Press "Enter" to skip to content

Curated SQL Posts

Why .NET And Java Have StringBuilders

Randolph West walks us through a performance troubleshooting issue with a twist:

So we branch the the code in source control, and start writing a helper class to manage the data for us closer to the application. We throw in a SqlDataAdapter, use the Fill() method to bring back all the rows from the query in one go, and then use a caching layer to keep it in memory in case we need it again. SQL Server’s part in this story has now faded into the background. This narrow table consumes a tiny 8 MB of RAM, and having two or more copies in memory isn’t the end of the world for our testing. So far, so good again.

We run the new code, first stepping through to make sure that it still does what it used to, massaging here and there so that in the end, a grid is populated on the application with the results of the query. Success! We then compile it in Release mode, and run it without any breakpoints to do some performance testing.

And then we find that it runs at exactly the same speed to produce exactly the same report, using our caching and SqlDataAdapter, and we’ve wasted another hour of our time waiting for the grid and report. Where did we go wrong?

As people get better at tuning, we start to make assumptions based on prior experience.  That, on net, is a good thing, but as Randolph shows, those assumptions can still be wrong.

Comments closed

The Problems With NOLOCK

Rob Farley demonstrates the downside of the READ UNCOMMITTED isolation level:

I’m going to create a table and insert exactly 1 million rows. This particular table will be a clustered index, and will contain 1 million GUIDs.

Next I prove that there a million rows.

Now without inserting or deleting any rows, I’m going to shuffle them.

And if while this is happening, I count the rows in a different session, I have to wait for that query to finish.

Read on to see what happens when someone gets the idea of running the select query with NOLOCK.

Comments closed

Get-DbaDbCompression DBATools Cmdlet

Jess Pomfret walks us through her recent contribution to dbatools:

I’ve been using this at work recently and it also relates to the presentation I gave at the ONSSUG June meeting around data compression. The beginnings of this script originated online as I dug into learning about the DMVs that related to objects and compression and then customized for what I needed.

If you run the below as is it will provide basic information about all objects in your database, except those in the ‘sys’ schema, along with their current size and compression level.

Click through for the script or, if your version of dbatools is up to date, call Get-DbaDbCompression.

Comments closed

Building A Calendar Table

Louis Davidson has an example of a calendar table in SQL Server:

The solution is part of my calendar/date dimension code, and it is used to do relative positioning over date periods. For example, say you have the need to get data from the 10 days. You can definitely use a simple between to filter the rows, and a bunch of date functions to group by year, month, etc., generally all of the “normal” groupings. But using a calendar table allows you to prebuild a set of date calculations that make the standard values easier to get, and non-standard groupings possible. The technique I will cover makes moving around in the groupings more easily accessible. Like if you want data from the last 3 complete months. The query to do this isn’t rocket science, but it isn’t exactly straightforward either.

For the example, I will use the calendar table that I have on my website here: http://drsql.org/code in the download SimpleDateDimensionCreateAndLoad, and will load it with data up until 2020. Here is that structure:

Read on for examples of usage.  This is an example where thinking relationally differs from thinking procedurally—imagining date ranges as pre-calculated sets isn’t intuitive to procedural developers, but it can give a big performance boost.

Comments closed

Parallelism Strategies For Grouping Operations

Itzik Ben-Gan continues his series on grouping data in SQL Server by looking at how these operations can go parallel:

Besides needing to choose between various grouping and aggregation strategies (preordered Stream Aggregate, Sort + Stream Aggregate, Hash Aggregate), SQL Server also needs to choose whether to go with a serial or a parallel plan. In fact, it can choose between multiple different parallelism strategies. SQL Server uses costing logic that results in optimization thresholds that under different conditions make one strategy preferred to the others. We’ve already discussed in depth the costing logic that SQL Server uses in serial plans in the previous parts of the series. In this section I’ll introduce a number of parallelism strategies that SQL Server can use for handling grouping and aggregation. Initially, I won’t get into the details of the costing logic, rather just describe the available options. Later in the article I’ll explain how the costing formulas work, and an important factor in those formulas called DOP for costing.

As you will later learn, SQL Server takes into account the number of logical CPUs in the machine in its costing formulas for parallel plans. In my examples, unless I say otherwise, I assume the target system has 8 logical CPUs. If you want to try out the examples that I’ll provide, in order to get the same plans and costing values like I do, you need to run the code on a machine with 8 logical CPUs as well. If you’re machine happens to have a different number of CPUs, you can emulate a machine with 8 CPUs—for costing purposes—like so:

DBCC OPTIMIZER_WHATIF(CPUs, 8);

Even though this tool is not officially documented and supported, it’s quite convenient for research and learning purposes.

This is a fairly long article, but a great one.

Comments closed

Installing Confluent Platform On Windows

Niels Berglund shows how to install Confluent Platform (the Confluent branded version of Apache Kafka) on a Windows machine using the Windows Subsystem for Linux:

WSL is primarily aimed at developers, and it allows you to run Linux environments directly on Windows in a native format and without the overhead of a virtual machine. Let us retake a look at that statement: run Linux environments directly on Windows in a native format. Yes native format, WSL is not a UNIX-like environment like Cygwin, which wraps non-Windows functionality in Win32 system calls but it serves Linux programs as special, isolated minimal processes (pico-processes) attached to kernel-mode pico-providers. If you want to read all the “gory” details about WSLWindows Subsystem for Linux Overview gives you an excellent introduction.

Surprisingly, it’s pretty easy—I would have expected some strange compatibility issues.

Comments closed

Building Cone Plots In Plotly

The Plotly blog shows how to use Python to build 3D cone plots using Plotly:

This plot uses an explicitly defined vector field. A vector field refers to an assignment of a vector to each point in a subset of space.

In this plot, we visualize a collection of arrows that simply model the wind speed and direction at various levels of the atmosphere.

3-D weather plots can be useful to research scientists to gain a better understanding of the atmospheric profile, such as during the prediction of severe weather events like tornadoes and hurricanes.

Sometimes a 3D plot is the best answer.  When it is, this looks like a good solution.  H/T R-bloggers

Comments closed

Using APPLY To Aggregate Unpivoted Data

Dan Clemens gives us yet another practical use of the APPLY operator:

I had a situation last week where I needed to find the MIN() and MAX() values of some data. Normally this would be pretty straightforward, but the problem was I needed to identify the MIN() and MAX() values amongst multiple columns. So today I bring you a SQL tip using APPLY.

Looking at sys.procedures we can see I have a create_date and a modify_date column for each record. What if we needed to return the MIN() or MAX() value from those 2 (or more) columns?

SELECT [Name]
,create_date
,modify_date
FROM sys.procedures AS p
WHERE p.[name] = 'ChangePassword';

In this two-column example, it’s not too difficult.  As you add more and more columns, the solution remains the same, though the urge to ask why all of these dates are unpivoted might increase…

Comments closed

Sorting When Your Measure Is Not In The Visual

Kasper de Jonge shows us different ways of sorting a visual by some unrelated measure:

So lets start with the simple one, I want to sort a chart on a measure not part of the visual. Let’s take this visual:

Now instead of sorting by OrderQuantity I want to sort by the ListPrice. The trick here is to make the measure part of the query, and one way you can do that is by adding it to the tooltip

Read on for examples for charts as well as matrices.

Comments closed

Managing Central Management Server

Chrissy LeMaire shows how you can use dbatools to manage Central Management Server and registered servers:

It’s a super useful feature that not all DBAs know about. Since CMS data is stored in msdb and accessible via SMO, you can access it from SQL Server Management Studio or PowerShell modules like dbatools.

Central Management Server’s essential functionality includes:

Actions that are taken by using a central management server group act on all servers in the server group. This includes connecting to servers by using Object Explorer and executing Transact-SQL statements and Policy-Based Management policies on multiple servers at the same time.

I mostly use it as a visual repository of my SQL Servers. Prior to using dbatools and Invoke-DbaSqlQuery, however, I did use CMS to easily execute code against a number of different servers.

CMS is a great feature, and is a critical tool for scaling out a SQL Server infrastructure.

Comments closed