Press "Enter" to skip to content

Category: Performance Tuning

Contrasting Power BI Import and DirectQuery Modes

Jason Cockington digs into the two modes:

Over the last few months, I have received a number of very similar questions, either at one of our Live Training courses or from a student attending one of our Ask Anything Power BI Q&A sessions. These questions all revolved around DirectQuery and were often specifically about performance issues or the complexity of DAX calculations being written on the Power BI reports. This prompted me to write this post, as it became abundantly clear that most students were opting for DirectQuery when they should have been selecting Import. If you are not sure whether DirectQuery is right for you, then read on, as we explore further.

Jason includes some Real Talk from Chris Webb, fairly eviscerating DirectQuery mode in the majority of cases.

Keeping VLF Counts Low (Enough)

Eitan Blumin has a script for us:

In this ultimate guide, learn how to fix high VLF counts and shrink transaction logs in SQL Server to improve performance. Discover best practices for managing transaction log sizes and VLFs, and get your hands on a robust script to help you out. Boost your SQL Server’s performance and optimize your database administration with these tips.

Click through for the guide, and for a script Eitan has put together.
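
As a quick sketch of the underlying check (this is not Eitan's script; the DMF here requires SQL Server 2016 SP2 or later), you can get a per-database VLF count from sys.dm_db_log_info:

```sql
-- Count VLFs per database via sys.dm_db_log_info
-- (available from SQL Server 2016 SP2 onward).
SELECT
    d.name AS database_name,
    COUNT(*) AS vlf_count
FROM sys.databases AS d
CROSS APPLY sys.dm_db_log_info(d.database_id) AS li
GROUP BY d.name
ORDER BY vlf_count DESC;
```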

Trying out Parameter Sensitive Plan Optimization

Jared Poche gives it a try:

Parameter Sensitive Plan Optimization attempts to solve the parameter sniffing issue by allowing a query to have different plans that are used based on the cardinality of one important parameter. The parent query has a plan that is really just a stub; there can be up to three variant queries that have full execution plans. When the query is executed, the optimizer chooses which variant query and plan to use based on the cardinality of the parameter’s value.
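
To make that concrete, here is a minimal sketch (the database name is a placeholder): PSP optimization requires compatibility level 160, and the dispatcher and variant plans are visible through Query Store.

```sql
-- PSP optimization needs compatibility level 160 (SQL Server 2022).
ALTER DATABASE [YourDatabase] SET COMPATIBILITY_LEVEL = 160;

-- With Query Store enabled, variant queries link back to their parent.
SELECT
    qv.query_variant_query_id,
    qv.parent_query_id,
    qv.dispatcher_plan_id
FROM sys.query_store_query_variant AS qv;
```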

Click through for examples. I share Jared’s thoughts about row counts, though I’m not a huge fan of just using those. The rule of thumb I like to apply (though I certainly didn’t make it up myself!) is that, if you’re retrieving 0.5% or less of a table, a seek is the best option, assuming there is an appropriate index. If you’re retrieving 20% or more of a table, a scan is the best option, regardless of index quality.

Going a little further, somewhere in that 0.5-20% range there's an in-between zone where you should be indifferent between seek and scan, as they'll both perform approximately as well. But if the optimizer chooses "seek" for the in-between zone and you nudge those returned row counts higher and higher, the seek becomes less and less viable, and there may be a zone somewhere between X% (that midpoint of indifference) and 20% where you haven't yet crossed the row count threshold for another plan but should switch over to the scan.

Coming up with the right solution to this problem would be pretty hard, and I’m not paid to solve problems. I’m not-paid to come up with problems, however.

Against Keys in Fact Tables

Marc Lelijveld searches for keys under the lamppost:

Another blog post based on recent client experiences. Last week, I visited a client where we had extensive discussions on data model optimization. As you might know, data modeling in Power BI is one of my favorite topics, so I had an excellent day. It's also not the first time that I've blogged about data modeling and optimization. If you haven't read it yet, I recommend reading my previous blog on this topic.

This blog will focus on the need for keys in your tables, primarily the fact tables in your data model. I keep running into data models at customers which are flooded with keys in all tables. For each of them, you should ask: do I really need this, and could I store it in a different data type for further optimization? In this blog, I will elaborate further on keys in your data model, typical use cases, and how these cases can be solved in different manners.

Read the whole thing. The really short version is classic Kimball-style advice: keys for dimensions, not for facts. And in Power BI, removing a unique column from a fact table can speed things up by shrinking the compressed fact table size.
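
As a sketch of that advice (table and column names here are illustrative, not Marc's): a Kimball-style fact table carries dimension keys and measures, with no unique key of its own.

```sql
-- Dimension keys and measures only; deliberately no FactSalesKey column.
-- A unique, high-cardinality column would bloat the dictionary and
-- hurt compression in VertiPaq.
CREATE TABLE dbo.FactSales
(
    DateKey     INT            NOT NULL,  -- references DimDate
    ProductKey  INT            NOT NULL,  -- references DimProduct
    Quantity    INT            NOT NULL,
    SalesAmount DECIMAL(19, 4) NOT NULL
);
```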

Quick Tips for SQL Server Performance Troubleshooting

Matthew McGiffen has a two-parter for us. Part 1 covers useful tools for performance troubleshooters:

When you’ve got the symptoms of a database issue, you can run a series of diagnostic queries to try and drill down on the problem and then start figuring out solutions. There are a number of free tools out there, though, that can speed up that process immensely. In this post we’re going to look at my favourites.

Part 2 takes us through the mechanics of measuring query performance:

Attendees have come up with a range of answers from “With a stopwatch” (which I like a lot) to the slightly more technical “Using Profiler”. I like the first answer just because that’s where we all start. We run something from an application and we literally time how long it takes, or we run something in SSMS and we use the counter near the bottom right of our query window to tell us how long it took. We may even run something 100 times in a loop and capture the overall time so we can take an average.

Incidentally, if you are using SET STATISTICS IO ON and are sick of the way it writes out results, Richie Rump’s Statistics Parser tool is great for converting the blob of text into something humans can easily parse.
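
If you want the bare-bones version of that measurement, here's a sketch (swap in your own query): SET STATISTICS TIME and IO cover both the stopwatch and the reads.

```sql
-- Emit elapsed/CPU time and logical reads for each statement.
SET STATISTICS TIME ON;
SET STATISTICS IO ON;

SELECT COUNT(*) FROM sys.objects;  -- stand-in for the query under test

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```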

Instant File Initialization for Log Files in SQL Server 2022

Aaron Bertrand tries something new:

When I first saw a bullet item stating SQL Server 2022 would support instant file initialization for log file growth, I was excited. When I later learned it only applies to automatic growths, and only those of 64 MB or less, I was a little less excited. With those limitations, I was skeptical this enhancement could supplant my long-standing practice of using 1 GB autogrowth for log files – at least ever since SSDs and other modern storage became more commonplace.

But after playing with it, I’m a believer.

I happened to be chatting with Thomas Grohser as he presented for our user group last week, and he also mentioned how good this feature is, as well as how the VLF rules changed a bit so you don't get as heavily penalized for growing in 64 MB increments. Aaron does some testing to give us an idea of how much of a benefit you might achieve in making this change.
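
If you want to check your own instance before testing, here is a hedged sketch (database and file names are placeholders): sys.dm_server_services reports whether IFI is enabled for the engine service, and the ALTER DATABASE shows the long-standing 1 GB autogrowth practice Aaron mentions.

```sql
-- Is instant file initialization enabled for the engine service?
-- (Column available since SQL Server 2012 SP4 / 2016 SP1.)
SELECT servicename, instant_file_initialization_enabled
FROM sys.dm_server_services;

-- The long-standing alternative: a fixed 1 GB log autogrowth.
ALTER DATABASE [YourDatabase]
MODIFY FILE (NAME = N'YourDatabase_log', FILEGROWTH = 1024MB);
```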

Performance Overhead of TDE

Matthew McGiffen answers an age-old question:

Microsoft states that enabling TDE usually has a performance overhead of 2–4%. That doesn’t sound like very much, and personally I wouldn’t let it bother me if I want to make sure my data is encrypted at rest. However, you may have heard other sources saying that it’s actually a lot more than that – and the performance impact is a high price to pay for the level of protection offered. So, what’s the truth?

It turns out the answer is a bit more complex than simply saying “x%,” though as a first approximation, I’d still say that 2–4% is a good starting point. For what would move you off of that 2–4%, read the whole thing.
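
For reference, here is the standard enablement sequence in sketch form (key, certificate, and database names are placeholders); the overhead numbers above apply once the background encryption scan completes.

```sql
USE master;
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'StrongPasswordHere!1';
CREATE CERTIFICATE TDECert WITH SUBJECT = 'TDE certificate';

USE YourDatabase;
CREATE DATABASE ENCRYPTION KEY
    WITH ALGORITHM = AES_256
    ENCRYPTION BY SERVER CERTIFICATE TDECert;

ALTER DATABASE YourDatabase SET ENCRYPTION ON;
-- Watch scan progress via sys.dm_database_encryption_keys (encryption_state).
```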

Optimizing Text Search in DAX

Marco Russo and Alberto Ferrari prime the pump:

When you import a table in Power BI, all the strings contained in a text column are stored in a dictionary, which improves the compression and provides excellent query performance when there is a filter with an exact match for the column value. However, reports that apply complex filters on a text column may have performance issues when the dictionary has a large number of values: depending on many other variables, a column with a few thousand unique values might already present a bottleneck, and this is definitely an issue when there are hundreds of thousands of unique strings in a column.

In October 2022, there was an internal optimization in Power BI that improved the performance of these searches by creating an internal index. Chris Webb described this optimization in his article, Text search performance in Power BI. In this article, we explore how to evaluate whether the optimization is applied and how to measure any performance improvements. As usual, everything comes at a price: creating the index has a cost that you will see applied to the first query hitting the column. We will also see how to detect this event and the existing limitations of this optimization.

Click through for their deep dive into the process. The final answer reminds me of the warehousing world, where you might pre-run some important queries to get those pages into the buffer pool and available for later reports.
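
That warm-up pattern is easy to sketch on the SQL Server side (table and column names are illustrative): run a representative aggregate off-hours so the pages are already in the buffer pool when the reports fire.

```sql
-- Pre-warm the buffer pool with a representative scan;
-- schedule via SQL Agent before reporting hours.
SELECT SUM(SalesAmount)
FROM dbo.FactSales
WHERE DateKey >= 20240101;
```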

Speeding Up a Slow Kafka Consumer with Parallelism

Paul Brebner continues a series on Kafka consumers:

In Part 1 of this series, we had a look at how Kafka concurrency and throughput work, recapped some earlier approaches I used to improve Kafka performance, and introduced the Kafka Parallel Consumer and its supported ordering options (Partition, Key, and Unordered). In this second part, we continue our investigations with some example code, a trace of a “slow consumer” example, how to achieve 1 million TPS in theory, some experimental results, what else we know about the Kafka Parallel Consumer, and finally, whether you should use it in production.

Read on to see what Paul has to say about the topic.

Trying Query Parameterization Settings in SQL Server

Tibor Karaszi builds a test:

You have probably seen the recommendation to turn on the “optimize for ad-hoc workloads” setting. You might even have seen a more recent recommendation to set the database-level parameterization setting to forced (instead of the default, which is simple). The aim of this post is to briefly describe each and then do some tests with various settings.

Click through for that test. This is a good example of how we need to temper guidance with context. In Tibor’s scenario, forced parameterization is a no-brainer, and optimize for ad hoc workloads gives a pretty nice reduction in plan cache utilization. But then, with optimize for ad hoc workloads on, you lose the ability to see the plan for the first run of a query in the plan cache and lose the opportunity to tune the different variations of a query which only ran once. Pretty much every setting in SQL Server exists because there is a scenario in which that is the most appropriate setting. Except auto-shrink. Auto-shrink delenda est.
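
For reference, both settings from Tibor’s test are quick to flip, sketched here with a placeholder database name:

```sql
-- Instance-level: cache only a stub on the first execution of ad hoc queries.
EXEC sys.sp_configure N'show advanced options', 1;
RECONFIGURE;
EXEC sys.sp_configure N'optimize for ad hoc workloads', 1;
RECONFIGURE;

-- Database-level: parameterize nearly all queries, not just simple ones.
ALTER DATABASE [YourDatabase] SET PARAMETERIZATION FORCED;
```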
