Press "Enter" to skip to content

Curated SQL Posts

Dynamic Top N in Power BI

Gerhard Brueckl shows how to create a Top N slicer in Power BI, as well as some of the problems you might need to work through:

As I said, this pretty much depends on the business requirements and after discussing that in length with the users, the solution is usually to simply add an “Others” row that sums up all values which are not part of the TopN items. For regular business users this requirement sounds really trivial because in Excel the could just add a new row and subtract the values of the TopN items from the Grand Total.

However, they usually will not understand the complexity behind this requirement for Power BI. In Power BI we cannot simply add a new “Others” row on the fly. It has to be part of the data model and as the TopN calculations is already dynamic, also the calculation for “Others” has to be dynamic. As you probably expected, also this has been covered already:
Oraylis – Show TopN and rest in Power BI
Power BI community – Dynamic Top N and Others category

This is a pain point that ideally I’d like the Power BI team to address. Gerhard does a good job showing how to do it, but Tableau has that (and more) built in.

Comments closed

Memory-Optimized TempDB Metadata Tables

Ned Otter shows how to configure SQL Server to use memory-optimized metadata tables in TempDB:

Like other changes to TempDB, in order for the new memory-optimization to take effect a restart of the SQL Server service is required. Once the service is restarted, system tables in TempDB are now memory-optimized (it should be that way for RTM, but in CTP 3.0, it could be the case that not all system tables have been converted to Hekaton). You can reverse this setting with the following command, and again restarting the SQL Server service:

There are a couple of important notes that Ned gives us around accessing these metadata tables in scripts.

Comments closed

The Cost of Ad Hoc Queries

Erin Stellato gives us a simple demonstration of why parameterization is important for performance:

From this screenshot you can see that we have about 3GB total dedicated to the plan cache, and of that 1.7GB is for the plans of over 158,000 adhoc queries. Of that 1.7GB, approximately 500MB is used for 125,000 plans that execute ONE time only. About 1GB of the plan cache is for prepared and procedure plans, and they only take up about 300MB worth of space. But note the average use count – well over 1 million for procedures. In looking at this output, I would categorize this workload as mixed – some parameterized queries, some adhoc.

Kimberly’s blog post discusses options for managing a plan cache filled with a lot of adhoc queries. Plan cache bloat is just one problem you have to contend with when you have an adhoc workload, and in this post I want to explore the effect it can have on CPU as a result of all the compilations that have to occur. When a query executes in SQL Server, it goes through compilation and optimization, and there is overhead associated with this process, which frequently manifests as CPU cost. Once a query plan is in cache, it can be re-used. Queries that are parameterized can end up re-using a plan that’s already in cache, because the query text is exactly the same. When an adhoc query executes it will only re-use the plan in cache if it has the exact same text and input value(s).

Read on to see an example of how long it takes a set of ad hoc queries to finish versus their parameterized equivalents. Erin’s test is at the behavioral extreme (100% parameterized versus 100% ad hoc) so real-world results won’t be quite this good.

Comments closed

Timing R Function Calls

Colin Gillespie shows off an R package for benchmarking:

Of course, it’s more likely that you’ll want to compare more than two things. You can compare as many function calls as you want with mark(), as we’ll demonstrate in the following example. It’s probably more likely that you’ll want to compare these function calls against more than one value. For example, in the digest package there are eight different algorithms. Ranging from the standard md5 to the newer xxhash64 methods. To compare times, we’ll generate n = 20 random character strings of length N = 10,000. This can all be wrapped up in the single function press() function call from the bench package:

Click through for an example involving hashing algorithms.

Comments closed

Linear Programming in Python

Francisco Alvarez shows us an example of linear programming in Python:

The first two constraints, x1 ≥ 0 and x2 ≥ 0 are called nonnegativity constraints. The other constraints are then called the main constraints. The function to be maximized (or minimized) is called the objective function. Here, the objective function is x1 + x2.

Two classes of problems, called here the standard maximum problem and the standard minimum problem, play a special role. In these problems, all variables are constrained to be nonnegative, and all main constraints are inequalities.

That post spurred me on to look up LINGO and see that it’s actually still around.

Comments closed

From pandas to Spark with koalas

Achilleus tries out Koalas:

Python is widely used programming language when it comes to Data science workloads and Python has way too many different libraries to back this fact. Most of the data scientists are familiar with Python and pandas mostly. But the main issue with Pandas is it works great for small and medium datasets but not so good on big-data workloads. The challenge now becomes to convert the existing pandas code to pyspark code. This is just not straight forward and has a lot of performance hits if python UDFs are used without much care.

Koalas tries to address the first problem ie lessen the friction of learning different APIs to port their existing Pandas code to Pyspark. With Koalas, we can just directly replace the existing pandas code with Koalas. As far as the performance goes, there are no numbers yet as it is still in the initial phase of the project. But this definitely looks promising though.

Read on for some initial thoughts on the product, including a few gotchas.

Comments closed

Storing Large Images in Power BI

Chris Webb shows us how to store a large image in Power BI:

Jason Thomas and Gerhard Brueckl have both blogged on the subject of storing images as text inside a Power BI dataset:
http://sqljason.com/2018/01/embedding-images-in-power-bi-using-base64.html
https://blog.gbrueckl.at/2018/01/storing-images-powerbi-analysis-services-data-models/

Since they wrote those posts, however, Power BI has added the ability to set the Data Category property on measures as well as columns in tables. This means it is now possible to have the output of a DAX measure displayed as an image in a Power BI report and this in turn opens up a lot of new possibilities – including the ability to work around the maximum size of a text value that can be loaded into Power BI (see my previous blog post for more details) and therefore work with larger images.

I don’t understand why they make this so complicated. I have a Grafana dashboard widget that I show in a Power BI dashboard and have it scaled way down so it fits in under 32K. I appreciate Chris’s answer but that’s a lot of work to show an image.

Comments closed

Creating Containers and Volumes

Grant Fritchey continues a dive into containers. First up is running a Docker container:

Let’s break this down a bit so you know what you just did. The two ‘-e’ statements are setting environment variables. The first is accepting Microsoft’s end user license agreement, EULA. The second is setting the SA password. By default, we’re running a Developer Edition of SQL Server here. If you want to, you can change to a version that you have a specific license for using the MSSQL_PID environment variable. Documentation for that is located here.

Next is using volumes:

Now, let’s create a new container, but, let’s use the same volume:
docker run -e 'ACCEPT_EULA=Y' ` -e 'SA_PASSWORD=$cthulhu1988' ` -p 1450:1433 ` --name DockerDemo19 ` -v sqlvol:/var/opt/mssql ` -d mcr.microsoft.com/mssql/server:2019-CTP2.5-ubuntu

What happens next is marvelous. 

It’s an exciting time to get into containers and if you’re feeling a little trepidatious, they’re containers—the worst thing you can do is mess one up and then you just blow it away and start over.

Comments closed

Debugging DAX Calculations

Imke Feldmann has a debugger measure for DAX:

This is a measure that returns a text-value, showing the number of rows of the adjusted filter context table, the MIN and MAX value of the selected column as well as up to the first 10 values. Just place this measure beneath the CALCULATE-measure in question and try to find the error 

Just have in mind, that this only works for standalone CALCULATE-functions and not for those who are nested in other functions (who modify the evaluation context).

This looks to be quite useful.

Comments closed