Press "Enter" to skip to content

Curated SQL Posts

Using DBCC OPTIMIZER_WHATIF To Mimic Production Hardware

Max Vernon has a technique for mimicking production hardware layouts when testing queries in development:

Attempting to debug production performance problems in your development environment can be problematic in many ways, leading to a frustrating troubleshooting experience. One very common situation is that the resources in the development environment are substantially less robust than on the production system; for instance, prod has 128 GB of RAM while dev has only 16 GB, and prod has 16 cores while dev has only 4. Unintuitively, this disparity can result in queries running faster in development than in production.

SQL Server has a little-known (and undocumented and unsupported) troubleshooting-related DBCC command that can be used to mimic production resource levels in your development environment. As with all undocumented features, do not try this in production.

Read on to learn how DBCC OPTIMIZER_WHATIF can lead the optimizer to choose different plans.  I almost never use this command, but it is helpful to have it in your back pocket.
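As a rough sketch of how the command typically appears in community write-ups (the property names and values here are assumptions; the command is undocumented and can change between versions), mimicking a 16-core, 128 GB production box in dev might look like this:

-- Undocumented and unsupported: dev/test only; property names (CPUs, MemoryMBs, ResetAll)
-- follow common community usage rather than any official documentation
DBCC OPTIMIZER_WHATIF(CPUs, 16);          -- pretend the optimizer sees 16 schedulers
DBCC OPTIMIZER_WHATIF(MemoryMBs, 131072); -- pretend the optimizer sees 128 GB of RAM

-- Compile the query under investigation (e.g. with OPTION (RECOMPILE)) and compare plans

DBCC OPTIMIZER_WHATIF(ResetAll);          -- revert the optimizer to the actual hardware

The settings only influence plan compilation in that session, not the resources available at execution time, so you need a fresh compilation to see a different plan.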

Comments closed

Modifying Data In Temporal Tables

Jeanne Combrinck shows us how we can insert, update, and delete data in temporal tables:

You delete data in the current table with a regular DELETE statement. The end period column for deleted rows will be populated with the begin time of the underlying transaction.
You cannot directly delete rows from the history table while SYSTEM_VERSIONING = ON.

Set SYSTEM_VERSIONING = OFF and delete rows from the current and history tables, but keep in mind that this way the system will not preserve the history of changes. TRUNCATE, SWITCH PARTITION OUT of the current table, and SWITCH PARTITION IN of the history table are not supported while SYSTEM_VERSIONING = ON.

Data modification is reasonably straightforward with temporal tables.  Read on for examples.
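As a quick sketch of the pattern Jeanne describes (table and column names here are hypothetical):

-- Deleting from the current table works as usual; the removed rows land in the history table
DELETE FROM dbo.Employee
WHERE EmployeeID = 42;

-- To purge history rows, temporarily turn system versioning off
-- (changes made while it is off are not tracked)
ALTER TABLE dbo.Employee SET (SYSTEM_VERSIONING = OFF);

DELETE FROM dbo.EmployeeHistory
WHERE ValidTo < '20150101';

-- Turn versioning back on, pointing at the same history table
ALTER TABLE dbo.Employee
    SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.EmployeeHistory));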

Comments closed

TANSTAAQRC (Query Result Cache)

Andy Mallon explains that a query result cache does not exist in SQL Server:

I was recently doing a training session when a developer commented that it was OK to run an expensive query twice because on the second execution, SQL Server would use the “results cache” and be “practically free”. It’s not the first time I’ve heard someone refer to a “results cache” in SQL Server. This is one of those myths that is almost true, which makes it that much more believable. If you don’t know better, you might think SQL Server has a “results cache” because the second execution of a query is often faster.

SQL Server does not have a “results cache” and the second execution is not “practically free.”
SQL Server does have a “buffer cache” and the second execution is “faster, but not free.”

The SQL Server buffer cache holds data pages in memory, in the exact form that they reside on disk. The second execution will not have to perform physical I/O operations to satisfy the query, because it can use the buffer cache. However, it does have to perform all other operations. Think of it like this: the second execution still executes the entire execution plan, including all the expensive operations. It is faster, but not “practically free.”

Read the comments for Erik Darling’s plot twist.
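If you want to see the "faster, but not free" behavior for yourself, comparing I/O and time statistics across two executions makes the point (the table name is just a placeholder):

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- First execution: expect physical (or read-ahead) reads alongside logical reads
SELECT COUNT(*) FROM dbo.BigTable;

-- Second execution: physical reads drop to zero because the pages are already in
-- the buffer cache, but logical reads and CPU time stay roughly the same, since
-- the full execution plan still runs
SELECT COUNT(*) FROM dbo.BigTable;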

Comments closed

Tracking Latency To Azure With PsPing

Arun Sirpal shows us how to use PsPing (part of the Sysinternals tool set) to determine latency between your computer and a VM in an Azure data center:

This is the tool of choice when you want to find out latency to your Azure SQL Server. In addition to standard ICMP ping functionality, it can report the latency of connecting to TCP ports and the latency of TCP round-trip communication.

I use this to find the latency from my location to various Azure SQL Servers which are in different Azure regions. I am based in the heart of England, so let's look at and compare a couple of locations (just out of curiosity). Once you have downloaded the tool, you will need to cd to the directory and call the following command.

Read on to see how to use PsPing.

Comments closed

Real-Time Data Visualization With R And SQL Server

Tomaz Kastrun shows how simple it can be to plot real(ish)-time data from SQL Server using R:

In the previous post, I showed how to visualize near real-time data using Python and the Dash module.  Now it is time to see one of the many ways to do it in R. This time, I will not use any additional frameworks for visualization, like shiny, plotly, or any others, but will simply use base R functions and the RODBC package to extract data from SQL Server.

Extracting data from SQL Server and simulating inserts into a SQL Server table will primarily simulate the near real-time data. If you have followed the previous post, you will notice that I am using the same T-SQL table and query to extract real-time data.

Tomaz is using the base plot library, but if you want something nicer, there are several good alternatives.
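If you want to follow along without a live source system, a simple T-SQL loop is enough to simulate the inserts Tomaz mentions (the table here is hypothetical, not the one from his post):

-- Hypothetical table receiving one row per second to mimic a live feed
CREATE TABLE dbo.LiveReadings
(
    ReadingTime  DATETIME2 NOT NULL DEFAULT SYSDATETIME(),
    ReadingValue FLOAT     NOT NULL
);

WHILE 1 = 1
BEGIN
    INSERT INTO dbo.LiveReadings (ReadingValue)
    VALUES (RAND() * 100.0);

    WAITFOR DELAY '00:00:01';
END;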

Comments closed

Using word2vec To Model User Behavior

Nishan Subedi walks us through an Etsy project to model user journeys via semantic embedding techniques:

We initially started training the embeddings as a Skip-gram model with negative sampling (NEG as outlined in the original word2vec paper) method. The Skip-gram model performs better than the Continuous Bag Of Words (CBOW) model for larger vocabularies. It models the context given a target token and attempts to maximize the average likelihood of seeing any of the context tokens given a target token. The negative sampling draws a negative token from the entire corpus with a frequency that is directly proportional to the frequency of the token appearing in the corpus.

Training a Skip-gram model on only randomly selected negatives, however, ignores implicit contextual signals that we have found to be indicative of user preference in other contexts. For example, if a user clicks on the second item for a search query, the user most likely saw, but did not like, the first item that showed up in the search results. We extend the Skip-gram loss function by appending these implicit negative signals to the Skip-gram loss directly.

Similarly, we consider the purchased item in a particular session to be a global contextual token that applies to the entire sequence of user interactions. The intuition behind this is that there are many touch points on the user’s journey that help them come to the final purchase decision, and so we want to share the purchase intent across all the different actions that they took. This is also referred to as the linear multi-touch attribution model.

This is a very interesting article, particularly their attempt at getting around the problem of unexpected explosive growth in demand.
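For reference, the skip-gram negative-sampling objective from the original word2vec paper, which Etsy's implicit negatives extend, maximizes (for a target token t with embedding v_t, a context token c with embedding v_c, and k negatives drawn from a noise distribution P_n based on corpus frequency):

\log \sigma\left(v_c^{\top} v_t\right) + \sum_{i=1}^{k} \mathbb{E}_{w_i \sim P_n(w)}\left[\log \sigma\left(-v_{w_i}^{\top} v_t\right)\right]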

Comments closed

When Wait Stats Aren’t Enough

Joe Obbish has an example of diagnosing performance problems when wait stats don’t indicate any problems:

In summary, page allocations and page free events rapidly occur, sometimes in an alternating pattern. SQL Server will often free a number of pages just to immediately request allocations for a similar number of pages. If all of the free page events result in returned memory to the OS, then the reason for the scalability bottleneck becomes clear. When running the full workload with 96 concurrent sessions, a total of 341,965 page free operations were performed. Those events freed about 71.3 million pages in total. That amounts to about 584 GB of memory returned to the OS in total, based on the previous assumptions.

This is a great investigation into the depths of debugging in SQL Server.  Joe wasn’t able to get a definitive solution to his problem, but he showed us a lot along the way.

Comments closed

Recommendations For Storage On Azure SQL DB Managed Instances

Dimitri Furman has some thoughts on database storage architecture for Azure SQL Database Managed Instances:

MI GP uses Azure Premium Storage to store database files for all databases, except for the tempdb database. From the perspective of the database engine, this storage type is remote, i.e. it is accessed over the network, using Azure network infrastructure. To use Azure Premium Storage, MI GP takes advantage of SQL Server's native capability to use database files directly in Azure Blob Storage. This means that there is not a disk or a network share that hosts database files; instead, the file path is an HTTPS URL, and each database file is a page blob in Azure Blob Storage.

Since Azure Premium Storage is used, its performance characteristics, limits, and scalability goals fully apply to MI GP. The High-performance Premium Storage and managed disks for VMs documentation article includes a section describing Premium Storage disk limits. While the topic is written in the context of VMs and Azure disks, which is the most common usage scenario for Azure Premium Storage, the documented limits are also applicable to blobs. As shown in the limits table in the documentation, the size of the blob determines the maximum IOPS and throughput that can be achieved against the blob. For MI GP, this means that the size of a database file determines the maximum IOPS and throughput that is achievable against the file.

The disk/blob size shown in the limits table is the maximum size for which the corresponding limit applies. For example, a blob that is > 64 GB and <= 128 GB (equivalent to a P10 disk) can achieve up to 500 IOPS and up to 100 MB/second throughput.

Read the whole thing if you're looking at Managed Instances, but there are also some tips for SQL Server in Azure IaaS.
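To see where your own database files fall against those blob-size thresholds, a quick look at the file metadata is enough (a minimal sketch):

-- size is reported in 8 KB pages; convert to GB to compare against the Premium Storage tiers
SELECT
    DB_NAME(database_id)     AS database_name,
    name                     AS logical_file_name,
    size * 8.0 / 1024 / 1024 AS size_gb
FROM sys.master_files
ORDER BY size_gb DESC;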

Comments closed

Data Lakes eBook

Melissa Coates has a free eBook available:

I wrote the updated content from a practical point of view, totally hype-free. The table of contents:

  • Modern Data Architecture
  • Business Needs Driving Data Architectures to Evolve and Adapt
  • Principles of a Modern Data Architecture
  • Data Lake + Data Warehouse: Complementary Solutions
  • Tips for Designing a Data Lake
  • Azure Technologies for Implementing a Data Lake
  • Considerations for a Successful Data Lake in the Cloud
  • Getting Started with a Data Lake

To download the ebook, BlueGranite will ask you to register your information. That's common for premium content like this. We take a low-key approach to sales, so I can assure you that registration only means you'll receive notifications of new content that you may find interesting.

It’s the length of a good-sized paper, so you won’t have to invest dozens of hours of time to get the story.

Comments closed

Don’t Set Max Size For Containers In In-Memory OLTP

Ned Otter recommends you not mess with the maximum container size when creating a memory-optimized filegroup:

I recently saw a thread on twitter, where the OP talked about setting the max size for an In-Memory OLTP container. I responded as I always do: it’s not possible to set a limit on anything having to do with storage for In-Memory OLTP.

Unfortunately, that’s not correct: through SSMS or TSQL, you can in fact set a max size for a container.

But you should not ever do that…

Why?

Because if you do, and your checkpoint files exceed the max size of the container, your database can go into the In Recovery, Suspect, or OFFLINE state.

Read on for a repro that you should not try in production.  Or anywhere, really.
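For contrast, this is the shape of a container added without any MAXSIZE, which matches Ned's advice not to cap it (database name and path are hypothetical):

-- Memory-optimized filegroup and container with no MAXSIZE specified
ALTER DATABASE SalesDb
    ADD FILEGROUP imoltp_fg CONTAINS MEMORY_OPTIMIZED_DATA;

ALTER DATABASE SalesDb
    ADD FILE (NAME = N'imoltp_container1', FILENAME = N'C:\IMOLTP\imoltp_container1')
    TO FILEGROUP imoltp_fg;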

Comments closed