Press "Enter" to skip to content

Month: May 2019

Constructing Virtual Tables with VALUES

Kenneth Fisher shows how to use the VALUES clause to construct a virtual table:

This has come up a few times recently, I find it rather fascinating and I can never seem to remember how to do it properly .. so in other words, it’s a perfect subject for a blog post.

Basically, you can use VALUES to create a table within a query. I’ve seen it done in a number of places. Mostly when demoing something and you want some data, but don’t want to actually create a table. I’ve also seen it used to create a numbers table, create test data, etc. Really, any case where you want a short list of values but don’t want to create an actual (or even temp) table to store them.

Click through for examples of how to construct and use this virtual table as a quick replacement for creating a temporary table or table variable.
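If you haven't run into the syntax before, here's a minimal sketch of the idea; the column names and values are just made up for illustration:

```sql
-- A derived table built entirely from VALUES: no real table, temp table, or table variable needed
SELECT v.Id, v.Name
FROM (VALUES
        (1, N'Alpha'),
        (2, N'Beta'),
        (3, N'Gamma')
     ) AS v (Id, Name)
ORDER BY v.Id;
```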

Customizing a Docker Container

Grant Fritchey shows how you can take a Docker container and save modifications:

There are much more sophisticated ways to get this done using Docker Files. However, this illustrates the point quite simply. You can customize your servers and then use those customizations. You don’t have to re-customize every time. Again, this is just a small slice of why containers are so powerful.

This method is great when you want to build out a sample data set, like when you’re running through automated testing and want to start from the same known point each time.
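As a rough sketch of what that can look like, docker commit captures a container's current state as a reusable image; the container and image names below are hypothetical:

```bash
# Assumes a container named sql-demo already exists and has been customized
# (databases restored, test data loaded, and so on)
docker commit sql-demo sql-demo-with-data:v1

# Start fresh containers from the customized image whenever you need
# the same known starting point
docker run -d --name sql-test-run -p 14330:1433 sql-demo-with-data:v1
```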

Max Dispatch Latency in Extended Events

Dave Bland takes us through the MAX_DISPATCH_LATENCY setting for an Extended Events session:

You would logically think that the minimum setting would be zero seconds. If you think that way, you are correct. However, 0 does not mean 0 seconds. When this is set to 0 it means the event will stay in the buffer until the buffer becomes full. This is the same as setting the “Unlimited” option you see below. Given this, the true minimum is one second.

Read on to see what it does and why it can be important.
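For reference, the option lives in the WITH clause of the event session definition. A minimal sketch, using an arbitrary session name and event:

```sql
-- Buffered events are dispatched to the target within 5 seconds;
-- MAX_DISPATCH_LATENCY = INFINITE (or 0) holds them until the buffer fills
CREATE EVENT SESSION [TrackStatements] ON SERVER
    ADD EVENT sqlserver.sql_statement_completed
    ADD TARGET package0.event_file (SET filename = N'TrackStatements.xel')
WITH (MAX_DISPATCH_LATENCY = 5 SECONDS);
```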

SQL Server Settings Blade in Azure

Dave Bermingham notes a recent change to the Azure Portal when creating a new VM with SQL Server pre-installed:

As you slide the IOPS slider to the right you will see the number of data disks increase, the Storage Size increase, and the Throughput increase. You will be limited to the max number of IOPS and disks supported by that instance size. You see in the screenshot below I am able to go as high as 80,000 IOPS when provisioning storage for a Standard E64-16s_v3 instance.

It sounds like they did a pretty good job of things there.

Dynamic Top N in Power BI

Gerhard Brueckl shows how to create a Top N slicer in Power BI, as well as some of the problems you might need to work through:

As I said, this pretty much depends on the business requirements and after discussing that at length with the users, the solution is usually to simply add an “Others” row that sums up all values which are not part of the TopN items. For regular business users this requirement sounds really trivial because in Excel they could just add a new row and subtract the values of the TopN items from the Grand Total.

However, they usually will not understand the complexity behind this requirement for Power BI. In Power BI we cannot simply add a new “Others” row on the fly. It has to be part of the data model, and as the TopN calculation is already dynamic, the calculation for “Others” has to be dynamic as well. As you probably expected, this has also been covered already:
Oraylis – Show TopN and rest in Power BI
Power BI community – Dynamic Top N and Others category

This is a pain point that ideally I’d like the Power BI team to address. Gerhard does a good job showing how to do it, but Tableau has that (and more) built in.

Memory-Optimized TempDB Metadata Tables

Ned Otter shows how to configure SQL Server to use memory-optimized metadata tables in TempDB:

Like other changes to TempDB, in order for the new memory-optimization to take effect a restart of the SQL Server service is required. Once the service is restarted, system tables in TempDB are now memory-optimized (it should be that way for RTM, but in CTP 3.0, it could be the case that not all system tables have been converted to Hekaton). You can reverse this setting with the following command, again followed by a restart of the SQL Server service:

There are a couple of important notes that Ned gives us around accessing these metadata tables in scripts.
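If you want to try this yourself, the switch is a server-level configuration option in SQL Server 2019; here's a quick sketch of turning it on, checking it, and turning it back off (each change requires a service restart):

```sql
-- Enable memory-optimized TempDB metadata, then restart the SQL Server service
ALTER SERVER CONFIGURATION SET MEMORY_OPTIMIZED TEMPDB_METADATA = ON;

-- After the restart, confirm whether the setting is active
SELECT SERVERPROPERTY('IsTempdbMetadataMemoryOptimized') AS IsTempdbMetadataMemoryOptimized;

-- Reverse the setting (again followed by a service restart)
ALTER SERVER CONFIGURATION SET MEMORY_OPTIMIZED TEMPDB_METADATA = OFF;
```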

The Cost of Ad Hoc Queries

Erin Stellato gives us a simple demonstration of why parameterization is important for performance:

From this screenshot you can see that we have about 3GB total dedicated to the plan cache, and of that 1.7GB is for the plans of over 158,000 adhoc queries. Of that 1.7GB, approximately 500MB is used for 125,000 plans that execute ONE time only. About 1GB of the plan cache is for prepared and procedure plans, and they only take up about 300MB worth of space. But note the average use count – well over 1 million for procedures. In looking at this output, I would categorize this workload as mixed – some parameterized queries, some adhoc.

Kimberly’s blog post discusses options for managing a plan cache filled with a lot of adhoc queries. Plan cache bloat is just one problem you have to contend with when you have an adhoc workload, and in this post I want to explore the effect it can have on CPU as a result of all the compilations that have to occur. When a query executes in SQL Server, it goes through compilation and optimization, and there is overhead associated with this process, which frequently manifests as CPU cost. Once a query plan is in cache, it can be re-used. Queries that are parameterized can end up re-using a plan that’s already in cache, because the query text is exactly the same. When an adhoc query executes it will only re-use the plan in cache if it has the exact same text and input value(s).

Read on to see an example of how long it takes a set of ad hoc queries to finish versus their parameterized equivalents. Erin’s test is at the behavioral extreme (100% parameterized versus 100% ad hoc) so real-world results won’t be quite this good.
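To make the distinction concrete, here's a rough sketch of the two patterns side by side; the table and customer ID are hypothetical, AdventureWorks-style names rather than anything from Erin's test:

```sql
-- Ad hoc: each distinct literal value compiles its own plan and takes its own slot in the plan cache
SELECT SalesOrderID
FROM Sales.SalesOrderHeader
WHERE CustomerID = 11091;

-- Parameterized: one plan is compiled and then re-used for every @CustomerID value
EXEC sys.sp_executesql
    N'SELECT SalesOrderID FROM Sales.SalesOrderHeader WHERE CustomerID = @CustomerID;',
    N'@CustomerID int',
    @CustomerID = 11091;
```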

Timing R Function Calls

Colin Gillespie shows off an R package for benchmarking:

Of course, it’s more likely that you’ll want to compare more than two things. You can compare as many function calls as you want with mark(), as we’ll demonstrate in the following example. It’s probably more likely that you’ll want to compare these function calls against more than one value. For example, in the digest package there are eight different algorithms, ranging from the standard md5 to the newer xxhash64 methods. To compare times, we’ll generate n = 20 random character strings of length N = 10,000. This can all be wrapped up in a single press() function call from the bench package:

Click through for an example involving hashing algorithms.
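If you want a feel for the shape of it, here's a minimal sketch of mark() inside press(); the algorithms and string lengths below are my own illustrative picks rather than the exact setup from the post:

```r
library(bench)
library(digest)

# Benchmark two hashing algorithms across two input sizes in a single press() call
results <- press(
  N = c(1000, 10000),
  {
    x <- paste(sample(letters, N, replace = TRUE), collapse = "")
    mark(
      md5      = digest(x, algo = "md5"),
      xxhash64 = digest(x, algo = "xxhash64"),
      check = FALSE  # the two algorithms return different hashes, so skip result comparison
    )
  }
)
results
```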
