Press "Enter" to skip to content

Author: Kevin Feasel

Reviewing Table-Level Statistics in Postgres

Muhammad Ali explains what information you can find regarding tables in Postgres:

Database monitoring, performance tuning and query optimization are critical operations for maintaining an efficient database system. A key component in PostgreSQL for this purpose is the pg_stat_all_tables view.

pg_stat_all_tables offers real-time statistics on table activities such as the number of sequential and index scans performed on a table, updates, deletes, inserts, and more. It also offers information on dead tuples along with vacuum and analyze stats, which enables DB admins to make data-driven decisions. Here’s a table detailing the columns in the pg_stat_all_tables view:

Click through for that table about tables.
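
If you want a quick feel for what the view exposes before clicking through, here is a minimal example query. The columns referenced (seq_scan, idx_scan, n_live_tup, n_dead_tup, and the autovacuum/autoanalyze timestamps) are standard pg_stat_all_tables columns; the filtering and ordering choices are just mine for illustration.

-- Scan activity and dead tuple counts for user tables,
-- ordered by the tables accumulating the most dead rows.
SELECT schemaname,
       relname,
       seq_scan,
       idx_scan,
       n_live_tup,
       n_dead_tup,
       last_autovacuum,
       last_autoanalyze
FROM pg_stat_all_tables
WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY n_dead_tup DESC
LIMIT 20;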


Leaving Good Comments in a Stored Procedure

Erik Darling comments on your comments:

Possibly the least helpful, but most humorous, way of leaving comments is a large block of green text up at the top of a module.

There are all sorts of helpful insights buried in those comments to help me as a consultant understand my audience.

But… 

I agree with a lot of where Erik is going with his thoughts. The area where we probably have some daylight is that I’d rather limit comments to statements of why rather than what. Sure, when I’m pseudo-coding out a procedure, I’ll have a bunch of little “do this thing here” types of comments, but I remove those as I build out the code. Instead, explain why you’re doing something if it isn’t patently obvious: you rewrote a query in a more complicated-to-read fashion because it performs much better, that kind of thing.

But in fairness, as long as your comments actually reflect the code, it’s really hard to say any code base is ever over-commented. It’s way easier to go the opposite direction.
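
To make the why-versus-what distinction concrete, here is a small, hypothetical T-SQL fragment. The tables, columns, and the performance claim in the second comment are all invented for illustration.

DECLARE @CustomerID int = 42;

-- A "what" comment just restates the code:
-- Get the customer's orders.
SELECT OrderID, OrderDate
FROM dbo.Orders
WHERE CustomerID = @CustomerID;

-- A "why" comment explains a decision the code can't explain on its own:
-- EXISTS instead of a join here because the join version duplicated rows
-- when a customer had multiple addresses and performed far worse at scale.
SELECT o.OrderID, o.OrderDate
FROM dbo.Orders AS o
WHERE o.CustomerID = @CustomerID
      AND EXISTS (SELECT 1
                  FROM dbo.CustomerAddresses AS ca
                  WHERE ca.CustomerID = o.CustomerID);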


Documenting a Tabular Model

Olivier Van Steenlandt builds the docs:

A few months ago, I chatted with colleagues about our Tabular Model. More specifically the lack of Tabular Model documentation. Since we were thinking about replacing our current model, I started to think about how to integrate documentation easily.

Having documentation is one thing; making sure it’s used is something completely different. And then we’re not even talking about keeping it up to date. My initial idea was to include the documentation task during the development phase. That said, it’s time to put those thoughts into practice.

Read on to see what Olivier did.


All about Lakehouses in Microsoft Fabric

Tomaz Kastrun gives us the skinny with multiple posts in his Advent of Microsoft Fabric. Day 3 introduces the lakehouse:

A lakehouse is cost-effective, optimised storage that supports all types of data and file formats, structured and unstructured, and helps you govern the data, giving you better data governance. With optimised, concurrent reads and writes, it delivers outstanding performance while also reducing data movement and minimising redundant copy operations. Furthermore, it gives you a user-friendly multitasking experience in the UI that retains your context, so you can work on multiple things without losing running operations or accidentally stopping others.

Day 4 covers Delta format:

Yesterday we looked into the lakehouse and learned that Delta tables are the storage format. So let’s explore how to understand and work with Delta tables. But first, we must understand Delta Lake.

Day 5 covers data ingest:

We have learned about Delta Lake and Delta tables. But since we have uploaded the file directly, let’s explore how we can also get the data into the lakehouse.

Click through for all three posts.
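
As a small illustration of the Delta point: in a Fabric lakehouse notebook, Spark SQL creates tables in Delta format by default, so even a plain CREATE TABLE plus INSERT lands Delta data in the lakehouse. The table name, columns, and values below are invented for the example, not taken from Tomaz's posts.

-- Spark SQL in a Fabric lakehouse notebook; tables default to Delta format.
CREATE TABLE IF NOT EXISTS sales_orders
(
    order_id   INT,
    order_date DATE,
    amount     DECIMAL(10, 2)
);

INSERT INTO sales_orders
VALUES (1, DATE '2023-12-01', 199.99);

SELECT * FROM sales_orders;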


Joining on Overlapping Date Ranges in T-SQL

Daniel Hutmacher crosses the streams:

You can get into a situation where you have two tables with values associated with date ranges. What’s worse, those date ranges don’t necessarily align, which can make joining them seem like a complex task. It is surprisingly simple, though, once you learn how to think about overlapping date ranges, along with this relatively simple T-SQL join pattern.

This problem gets even more challenging if you have the possibility of multiple overlaps and you want to find the combination with the biggest overlap for each individual item.
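
Daniel walks through the pattern properly, but the core overlap test is compact enough to sketch here: two ranges overlap when each one starts on or before the other one ends. The tables and columns below are hypothetical, and this is the general pattern rather than necessarily Daniel's exact formulation.

-- Two [StartDate, EndDate] ranges overlap when each starts on or before
-- the other ends. dbo.PriceHistory and dbo.PromotionHistory are made up.
SELECT a.ItemID,
       a.StartDate AS price_start, a.EndDate AS price_end,
       b.StartDate AS promo_start, b.EndDate AS promo_end,
       -- The overlapping window itself: the later start, the earlier end.
       CASE WHEN a.StartDate > b.StartDate THEN a.StartDate ELSE b.StartDate END AS overlap_start,
       CASE WHEN a.EndDate   < b.EndDate   THEN a.EndDate   ELSE b.EndDate   END AS overlap_end
FROM dbo.PriceHistory AS a
JOIN dbo.PromotionHistory AS b
    ON b.ItemID = a.ItemID
   AND a.StartDate <= b.EndDate
   AND b.StartDate <= a.EndDate;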


Producing Messages with librdkafka

Jakub Korab dives into a Kafka library:

In a previous blog post (How To Survive an Apache Kafka® Outage) I outlined the effects on applications during partial or total Kafka cluster outages and proposed some architectural strategies to handle these types of service interruptions. The applications most heavily impacted by this type of outage are external interfaces that receive data, do not control request flow, and possibly perform some form of business transaction with the outside world before producing to Kafka. These applications are most commonly found in finance and written in languages other than Java—mostly C and C++. 

librdkafka is the main underlying client library used in non-JVM environments and has wrapper libraries for Python, .Net, Go, and an ever-expanding list of clients. It has not been written about to the same extent as the Java client, and it is worth examining as its interface and underlying mechanics are fundamentally different. 

This library is quite useful and versatile.


Integrating Azure ML and Power BI

I have a new video:

In this video, I show off how easy it is to integrate Azure ML and Power BI, at least once you get past all of the trouble trying to integrate them.

I expected this to be easy. It turns out that the “make it look easy” part depends on having several things in place already and using the correct (by which I mean “old”) deployment type.


Finding the Local Port Number for Power BI Desktop

Soheil Bakhshi updates an older post:

In March 2018, I wrote a blogpost called Four Different Ways to Find Your Power BI Desktop Local Port Number. Last week, Zoe Doughlas from Microsoft left a comment reminding me of a fifth method to get the port which encouraged me to write this quick tip. Thanks to Zoe!

As the name suggests, the blog was about finding Power BI Desktop’s local port number. If you do not have any clue what I mean by local port number, I strongly suggest reading that blog.

Read on to see what that fifth method is.


Optimizing Sort Operators in Window Functions

Andy Brownsword talks about window function query tuning:

We’re on quite a roll with window functions these past few weeks. Last week we looked at the operators we’d see in execution plans when using a window function. This week I wanted to tackle one of the more troublesome ones specifically: the Sort operator.

We know that sort operators are expensive in our queries. To use a window function our data needs to be sorted. How about if we need multiple functions? What if we’d like the output sorted too? Can we optimise any of those out of the execution plan?

Read on for several tips.
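
One general idea worth keeping in mind while reading, offered as a sketch of the principle rather than a summary of Andy's specific tips: if every window function, and ideally the final ORDER BY, agrees on the same PARTITION BY and ORDER BY, the optimizer can often satisfy them with a single Sort, or with none at all when a supporting index already delivers rows in that order. The table below is made up.

-- Both window functions share one PARTITION BY / ORDER BY, and the final
-- ORDER BY matches it, so a single sort order can serve all three.
-- With a supporting index leading on (CustomerID, SaleDate), the Sort
-- operator can disappear entirely. dbo.Sales is a hypothetical table.
SELECT SaleID,
       CustomerID,
       SaleDate,
       ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY SaleDate) AS SaleSequence,
       SUM(Amount)  OVER (PARTITION BY CustomerID ORDER BY SaleDate) AS RunningTotal
FROM dbo.Sales
ORDER BY CustomerID, SaleDate;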
