Press "Enter" to skip to content

Curated SQL Posts

Explaining FIRST_VALUE() and LAST_VALUE() Execution Plans

Hugo Kornelis wraps up a mini-series on window functions:

In part twenty-eight of the plansplaining series, I’ll wrap up the six-part mini-series on window functions. After covering the basicsfast-track optimizationwindow frames ending at UNBOUNDED FOLLOWINGwindow frames specified with RANGE instead of ROWS, and LAG and LEAD, we will look at the LAST_VALUE and FIRST_VALUE analytical functions, and find that a function we would have expected to be available as an internal aggregate function does not exist at all! We’ll also find out how SQL Server works around that.

Click through to unravel that mystery.

Comments closed

Tips for Saving Money in the Cloud

David Klee offers up some advice:

You might be able to shave off thousands – or more – in your monthly cloud bills for your critical SQL Servers, all while maintaining or even improving performance.

Public cloud providers charge organizations for everything they deploy, and while a few items in the cloud are based on a pay-by-consumption model, most of the services that really stack up, namely SQL Server licensing, are charged on a pay-by-allocation model. If you provision a SQL Server VM with eight cores and promptly forget about it, you will be shocked at the end of the month when you receive your cloud subscription bill. If you provision a 32-core VM, and your workload only uses four cores, you are still paying for all 32 cores, regardless of the low utilization rates. The same goes for memory and storage. A large memory footprint and lots of high-speed attached managed disks all add up.

Read on for some advice around figuring out the right server size for your workload.

Comments closed

Fixing Non-SARGable Predicates

Erik Darling makes the donuts:

Okay, deep breath. Deep, deep breath.

Would you believe I still have to fix queries with a bunch of functions in joins and where clauses? Like every day? All day?

Where things get tough is when there’s some third party vendor app where code can’t be changed, but customers have become so fed up with performance that they’re willing to make other changes to help things along.

Read on for an example of one scenario in which reversing an index can help improve performance without touching the code.

Comments closed

New Limits for Maximum Connections per Data Source in Power BI

Chris Webb notes a change:

One of the most important properties you can set in a Power BI DirectQuery semantic model is the “Maximum connections per data source” property, which controls the number of connections that can be used to run queries against a data source. The good news is that the maximum value that you can set this property to has just been increased in Premium.

Read on to learn why this setting is important.

Comments closed

tidyAML Updates

Steven Sanderson has been busy. First up, a post on tidyAML updates:

One of the standout features in this release is the addition of extract_regression_residuals(). This function empowers users to delve deeper into regression models, providing a valuable tool for analyzing and understanding residuals. Whether you’re fine-tuning your models or gaining insights into data patterns, this enhancement adds a crucial layer to your analytical arsenal.

Then, Steven goes into detail on .drap_na:

In the newest release of tidyAML there has been an addition of a new parameter to the functions fast_classification() and fast_regression(). The parameter is .drop_na and it is a logical value that defaults to TRUE. This parameter is used to determine if the function should drop rows with missing values from the output if a model cannot be built for some reason. Let’s take a look at the function and it’s arguments.

After that, we get to see an updated function:

In response to user feedback, we’ve enhanced the internal_make_wflw_predictions() function to provide a comprehensive set of predictions. Now, when you make a call to this function, it includes:

  1. The Actual Data: This is the real-world data that your model aims to predict. Having access to this information helps you assess how well your model is performing on unseen instances.
  2. Training Predictions: Predictions made on the training dataset. This is essential for understanding how well your model generalizes to the data it was trained on.
  3. Testing Predictions: Predictions made on the testing dataset. This is crucial for evaluating the model’s performance on data it hasn’t seen during the training phase.

You can also check out the package’s GitHub repository and see more.

Comments closed

Data Reading and Writing with arrow

Colin Gillespie performs two of the three R’s:

Apache Arrow is a cross-language development platform for in-memory data. As it’s in-memory (as opposed to data stored on disk), it provides additional speed boosts. It’s designed for efficient analytic operations, and uses a standardised language-independent columnar memory format for flat and hierarchical data. The {arrow} R package provides an interface to the ‘Arrow C++’ library – an efficient package for analytic operations on modern hardware.

There are many great tutorials on using {arrow} (see the links at the bottom of the post for example). The purpose of this blog post isn’t to simply reproduce a few examples, but to understand some of what’s happening behind the scenes. In this particular post, we’re interested in understanding the reading/writing aspects of {arrow}.

Read on to see it in action in R.

Comments closed

Exporting to CSV in Azure ML Designer

Tom LaRock saves a file:

The most popular feature in any application is an easy-to-find button saying “Export to CSV.” If this button is not visibly available, a simple right-click of your mouse should present such an option. You really should not be forced to spend any additional time on this Earth looking for a way to export your data to a CSV file.

Well, in Azure ML Studio, exporting to a CSV file should be simple, but is not, unless you already know what you are doing and where to look. I was reminded of this recently, and decided to write a quick post in case a person new to ML Studio was wondering how to export data to a CSV file.

Click through for one false start and then the correct answer.

Comments closed

Globs of Tabs in SSMS

Warwick Rudd has cramped environs:

Working in SQL Server Management Studio is potentially an everyday occurrence for you! And having to work with many queries open at the same time is probably the norm.  Depending on the size of your screen that you may be working on, you are limited with the amount of screen real estate you can work in.

Personally, I get frustrated with having to continually go to the open query drop down window to see what queries I have open and be able to cycle through them to make my life easier and be more productive.

Warwick shows off one built-in way to solve this problem. When I was a database developer, I would have 40-50 tabs open at a time sometimes. I used Tabs Studio (commercial product but it’s not that expensive if you’re buying for yourself) to manage all of that.

Comments closed

Storing SQL Server Backups in Cloudflare R2

Daniel Hutmacher saves a buck:

R2 is Cloudflare’s own implementation of AWS S3 storage, with some big benefits – one of them being no egress fees, which is great if you want to publish or distribute a lot of data (like I did with this demo database). In this post, I thought I’d briefly document how to set up R2, and how to use it to back up and restore your SQL Server databases.

You’ll need a Cloudflare account to follow along. The account and a lot of their services are free, but R2 storage obviously comes with a small cost. For scale, I’m running an almost-terabyte bucket at just a couple of dollars per month.

Given the number of times I’ve pushed Daniel’s excellent Chicago parking tickets database (including right now—it’s a great database that I’ve used in several presentations and videos!), the lack of egress charges is pretty big.

Comments closed