Press "Enter" to skip to content

Curated SQL Posts

Creating Temp Staging Tables to Avoid Spooling

Bert Wagner shows how you can create your own tables in tempdb to avoid eager or lazy spools:

SQL Server Spool operators are a mixed bag. On one hand, they can negatively impact performance when writing data to disk in tempdb. On the other hand, they allow filtered and transformed result sets to be temporarily staged, making it easier for that data to be reused again during that query execution.

The problem with the latter scenario is that SQL Server doesn’t always decide to use a spool; often it’s happy to re-read (and re-process) the same data repeatedly. When this happens, one option you have is to explicitly create your own temporary staging table that will help SQL Server cache data it needs to reuse.

The other problem with spooling is that the spool doesn’t have indexes and so performance can be awful. When I look at an execution plan, one of my immediate red flags is spooling: if we have that, removing it is one of the first candidates for optimization after the trivial stuff (expected scan/seek behavior, “fat pipes” from excessive row counts, residual I/O, etc.).

Comments closed

Splitting Arrays with OPENJSON

Dave Mason continues a journey into parsing JSON with T-SQL:

Starting with SQL Server 2016, Microsoft provided a STRING_SPLIT function. It is a table-valued function that splits a string into rows of substrings, based on a specified separator character. It’s been a welcome addition that we waited a long time for. It has one shortcoming, though: the order of the output rows is not guaranteed to match the order of the substrings in the input string.

Microsoft also provided support for parsing JSON data starting with SQL Server 2016. I discovered the OPENJSON function can be used to split strings, and it can also return the ordinal position of each substring from the original input string.

There are some limitations which you’d expect, namely around requirements for valid JSON.

Comments closed

Horizontal Labels with ggplot

Michael Toth shows us how to ensure we use horizontal text labels in ggplot:

There are several things we could do to improve this graph, but in this guide let’s focus on rotating the y-axis label. This simple change will make your graph so much better. That way, people won’t have to tilt their heads like me to understand what’s going on in your graph:

It may not seem like much when you’re creating the visual, but it can make a difference for a viewer.

Comments closed

Repeated Cross-Validation in R

Ludvig Olsen walks us through a couple of nice R packages:

The benefits of using groupdata2 to create the folds are 1) that it allows us to balance the ratios of our output classes (or simply a categorical column, if we are working with linear regression instead of classification), and 2) that it allows us to keep all observations with a specific ID (e.g. participant/user ID) in the same fold to avoid leakage between the folds.

The benefit of cvms is that it trains all the models and outputs a tibble (data frame) with results, predictions, model coefficients, and other sweet stuff, which is easy to add to a report or do further analyses on. It even allows us to cross-validate multiple model formulas at once to quickly compare them and select the best model.

Ludvig also gives us some examples of how both packages can help you out. H/T R-Bloggers

Comments closed

Using Power Query to Expand Out Missing Dates

Matt Allington solves a problem in Power Query:

Suppose you have data in the form of dates (not consecutive) with a value for each of the dates (see the table below left side). You need to expand the rows of the table (create the missing rows) so that you will have all the consecutive dates in the given range and each of the dates has the previous updated value (see the table below right side).

The solution has a pretty large number of steps but is straightforward.

Comments closed

Rewriting Expensive Updates

Erik Darling takes us through an experiment:

Let’s also say that bad query is taking part in a modification.

UPDATE u2
SET u2.Reputation *= 2
FROM Users AS u
JOIN dbo.Users AS u2
ON CHARINDEX(u.DisplayName, u2.DisplayName) > 0
WHERE u2.Reputation >= 100000;
AND u.Id <> u2.Id;

This query will run for so long that we’ll get sick of waiting for it. It’s really holding up writing this blog post.

Erik rewrites this query a couple of times. Click through to learn what he does and why he does it.

Comments closed

Triggers and Multi-Record Changes

Brent Ozar points out a common problem with trigger design:

When you declare variables and set them using one row from the INSERTED or DELETED virtual table, you have no idea which row you’re going to get. Even worse, sometimes this trigger will update one row, and sometimes it won’t – because it might happen to grab a row with a reputation under 1,000!

It’s an easy mistake to make and one which can have a major impact.

Comments closed

Enabling Large Memory Pages in SQL Server

David Klee talks us through large memory pages:

SQL Server Enterprise Edition can leverage large memory pages to reduce the amount of memory pointers required for larger SQL Server deployments. Reducing the number of pointers makes the database engine more efficient, especially for SQL Servers with greater than 32GB of RAM. A normal memory block is 4KB, and many thousands of pointers are required to manage the memory underneath a larger SQL Server. Large memory pages can change the block size to 2MB, greatly reducing the number of pointers required for memory management.

Read on to see what effect this has, as well as when to use them and—more importantly—when not to use them.

Comments closed

Using Powershell Core in Containers

Anthony Nocentino shows us how we can run Powershell Core in containers:

Now, with that last technique, we’ve encapsulated the entire lifecycle of the execution of that script into one line of code. It’s like this script execution never happened…or did it 😉 All kidding aside, we effectively have a serverless computing platform now. Using this technique in our data centers, we can spin up a container, on any version of PowerShell on any platform, run some workload/script and when the workload finishes, the container just goes away. For this to work well, we will need something to drive that process. In an upcoming blog post, we’ll talk more about how we can automate the running of PowerShell containers in Kubernetes.
 
In this post, we covered a lot, we looked at how you can interactively run PowerShell Core in a container, how you can pass cmdlets into a container at runtime, running different versions of PowerShell Core and also how you can persistently store scripts outside of containers in volumes and run those scripts in your containers. We also looked at how you can encapsulate the whole execution of a script and the containers life cycle into one line of code. Really giving you the ability to run PowerShell Core anywhere on any platform.

Check it out for sure. Containers today are where VMs were about a decade ago: becoming more common but still a bit “out there” for administrators. It’s not a stretch to say that within a few years, containers will be as ubiquitous as VMs were by 2012, if not more so.

Comments closed