Press "Enter" to skip to content

Category: Performance Tuning

Solving Last Page Contention Problems

Erik Darling ponders several techniques for solving the last page contention problem in SQL Server:

If one dare go looking, a list of Microsoft Approved™ solutions for dealing with last page contention exists.

I’m going to walk through my experiences with them in different scenarios, so you get a better understanding of which ones work best, and which ones can be tough to implement.

Read the whole thing.
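As a taste of what is on that list, one of the documented options (SQL Server 2019 and later) is the OPTIMIZE_FOR_SEQUENTIAL_KEY index option, which adds flow control for sessions inserting into the last page of an ever-increasing index. A minimal sketch, using hypothetical table and column names:

-- Enable the option when creating the clustered primary key:
CREATE TABLE dbo.OrderEvents
(
    OrderEventID bigint IDENTITY(1, 1) NOT NULL,
    OrderID int NOT NULL,
    EventTime datetime2(3) NOT NULL,
    CONSTRAINT PK_OrderEvents
        PRIMARY KEY CLUSTERED (OrderEventID)
        WITH (OPTIMIZE_FOR_SEQUENTIAL_KEY = ON)
);

-- The option can also be enabled on an existing index via a rebuild:
ALTER INDEX PK_OrderEvents ON dbo.OrderEvents
    REBUILD WITH (OPTIMIZE_FOR_SEQUENTIAL_KEY = ON);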


Speeding Up Pivot Operations in Power Query

Imke Feldmann has a few tricks for making pivot operations in Power Query faster:

Pivot operations are a very handy feature in Power Query, but they can slow down refresh performance. So with some bittersweet pleasure I can tell that I found a trick to speed them up. The sweetness comes from the fact that the performance improvement is very significant. But the bitterness comes from the fact that I could have used this for almost 4 years now, but was too blind to realize it at the time when I first worked with the code.

Read the whole thing and learn why Imke is disappointed in not knowing this four years ago.


Dynamic File Pruning on Delta Lake

Ali Afroozeh, et al., take us through Dynamic File Pruning in Databricks Runtime 6.1:

In addition to eliminating data at partition granularity, Delta Lake on Databricks dynamically skips unnecessary files when possible. This can be achieved because Delta Lake automatically collects metadata about data files managed by Delta Lake and so, data can be skipped without data file access. Prior to Dynamic File Pruning, file pruning only took place when queries contained a literal value in the predicate but now this works for both literal filters as well as join filters. This means that Dynamic File Pruning now allows star schema queries to take advantage of data skipping at file granularity.

There are some interesting performance results here. I’d also be curious to see how robust the results are as queries get more complicated.
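To make the join-filter case concrete, here is a hedged sketch of the kind of star schema query that benefits; the table and column names are made up. Before Dynamic File Pruning, only a literal predicate on the fact table itself could skip files; with it, the dimension filter discovered at run time can also prune fact-table files whose column statistics cannot match the joined keys.

SELECT s.store_region,
       SUM(f.sales_amount) AS total_sales
FROM fact_sales AS f
JOIN dim_store AS s
  ON f.store_id = s.store_id
WHERE s.store_region = 'EMEA'   -- filter on the dimension, not the fact table
GROUP BY s.store_region;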


Optimizing Slow Card Visuals in Power BI

Marco Russo helps us tune Power BI reports containing a large number of card visuals:

Every visual element in a Power BI report must complete a number of tasks to provide the expected result. Visuals showing data must generate one or more DAX queries to retrieve the required measures applying the correct filters. The execution of these queries increases the waiting time for the end user, and increases the workload on the server, especially when multiple users access a published report at the same time. In order to improve the performance and the scalability of a report, the best practice is reducing the number of visuals consuming data published in a page of a report.

The focus is on a single page of the report. Power BI only gets data and builds the visualizations required for the active page of a report. When the user switches the focus to a different page, the waiting time only depends on the visuals of the new page. The content of other pages of the same report is not relevant for the performance. The goal is reducing the number of visuals in a single page of a report. This can be challenging when trying to obtain the same report layout, but we can look for the right visualization once we realize that the number of visuals in the same page is negatively affecting the user experience.

Less is more here.


Foreign Keys and Non-Changing Updates

Brent Ozar has a warning for us:

If you update a row without actually changing its contents, does it still hurt?

Paul White wrote in detail about the impact of non-updating updates, proving that SQL Server works hard to avoid doing extra work where it can. That’s a great post, and you should read it.

But foreign keys add another level of complexity. If you have foreign keys, and you tell SQL Server that you’re updating the contents of those related columns, SQL Server will check ’em even when the data isn’t changing.

Click through for the demonstration. I don’t think I agree with Brent’s dichotomy as laid out at the end of the post—the back-and-forth about removing keys would only make sense if you’re on the edge of the database equivalent of the production possibility frontier and expecting to move well beyond that point very soon. I’m not sure how well that describes the average company, but it’s a side quibble.
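A minimal sketch of the behavior Brent describes, with hypothetical tables: setting a foreign key column to the value it already holds still puts the reference check into the update plan.

CREATE TABLE dbo.Users
(
    UserID int NOT NULL CONSTRAINT PK_Users PRIMARY KEY
);

CREATE TABLE dbo.Posts
(
    PostID int NOT NULL CONSTRAINT PK_Posts PRIMARY KEY,
    UserID int NOT NULL
        CONSTRAINT FK_Posts_Users FOREIGN KEY REFERENCES dbo.Users (UserID)
);

-- A non-updating update: UserID keeps its current value, yet the plan
-- still validates the key against dbo.Users because the column appears
-- in the SET clause.
UPDATE dbo.Posts
SET UserID = UserID
WHERE PostID = 1;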


Aggregations in Power BI Desktop

Jeroen ter Heerdt tries out aggregations in Power BI Desktop:

Aggregations bring me back to the good old SSAS Multidimensional days. The days that I invariably built the aggregations tree the wrong way around, SSAS would complain, I would scream and eventually give in.

You can imagine that I was curious but skeptical when I tried aggregations in Power BI Desktop. I was afraid of ending up in the same hate-but-need relationship that I had with SSAS multidimensional when it came to aggregations.

The good news? It is not like that. At all. Once you have aggregations working, they are great.

The not so good news? It took me longer than I am willing to admit to get them working – primarily due to data types and the creation of the aggregated table. More details in this blog post.

Read on for the full story.


Tempdb Issues You Might Have

Fabiano Amorim walks through a few of the tricky issues around tempdb:

One of many performance improvements that came with SQL Server 2014 is that it doesn’t flush dirty pages created in a minimally logged operation on tempdb. This gives you the benefit of having faster (compared to prior versions) inserts, but it caused another problem, as those allocated pages may take a lot of time to be removed from the buffer pool data cache. Before discussing the problem, quickly look at the benefit and then understand some important concepts of flushing dirty pages on tempdb.

Click through to learn more about the fix, and then a fix to the fix…which opened up a new avenue to fix. This kind of thing is why operating at scale is so difficult: the solution to one problem often becomes the avenue to a new problem.
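For context, here is a hedged example of the kind of tempdb work in question (object names are illustrative): SELECT ... INTO a temp table is a minimally logged bulk load, and on SQL Server 2014 and later its dirty pages are not eagerly flushed, so they can linger in the buffer pool.

-- A minimally logged load into a temp table:
SELECT TOP (1000000)
       o.object_id,
       o.name,
       o.create_date
INTO #ObjectSnapshot
FROM sys.objects AS o
CROSS JOIN sys.all_columns AS ac;

-- Dirty tempdb pages left behind by loads like this are visible in the
-- buffer pool (tempdb is database_id = 2):
SELECT COUNT(*) AS dirty_tempdb_pages
FROM sys.dm_os_buffer_descriptors
WHERE database_id = 2
      AND is_modified = 1;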


Serialization in Apache Flink

Nico Kruber walks us through the viable set of serializers in Apache Flink:

Flink handles data types and serialization with its own type descriptors, generic type extraction, and type serialization framework. We recommend reading through the documentation first in order to be able to follow the arguments we present below. In essence, Flink tries to infer information about your job’s data types for wire and state serialization, and to be able to use grouping, joining, and aggregation operations by referring to individual field names, e.g. stream.keyBy("ruleId") or dataSet.join(another).where("name").equalTo("personName"). It also allows optimizations in the serialization format as well as reducing unnecessary de/serializations (mainly in certain Batch operations as well as in the SQL/Table APIs).

Click through for notes on each serializer and a graph which shows how the choice of a serializer can make a huge difference.


Execution Plan Training, in Video Form

Hugo Kornelis makes an announcement:

As those who have been to my full-day precon on execution plans know, I believe that learning to understand execution plans does not start with dozens of examples. It starts with an explanation of the basics, followed by an overview of operators. Just like learning Russian doesn’t start with reading Tolstoy’s Война и мир (War and Peace), but with learning the grammar rules and the vocabulary.

Once you know the grammar of a language, and enough of its vocabulary, you can then pick up any book. And the more you do that, the easier it becomes. Eventually, one day, you will be able to read Война и мир in its original language.

And once you know the basics of reading execution plans, and are familiar with most of the operators, you will be able to tackle any execution plan you find on your servers, no matter how complex.

And, at least for now, this is free. So check out what Hugo has up already and pass along a “thank you” if you like what you see there.


Parameterization and Enumerable.Contains() with EF Core 3

Erik Ejlskov Jensen explains how we can prevent Entity Framework Core 3 from polluting the plan cache if we use Enumerable.Contains():

One of the many advantages of using a tool like Entity Framework Core is that you are sure the framework will generate properly parameterized SQL for you. This helps avoid SQL injection issues and avoids plan cache pollution. Unfortunately, EF Core currently falls short on that promise when translating queries where you supply a list of values to be matched against a column via the Enumerable.Contains method – this is translated to a SQL Server IN operator.

Click through for a setup of the problem as well as the solution.
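To see why the plan cache suffers, here is roughly the shape of the SQL involved, with hypothetical table and column names. The first form inlines the list as literals, so every distinct list compiles a separate plan; the second shows one general SQL Server pattern for passing the whole list as a single parameter (not necessarily the exact fix from the post).

-- What an inlined Enumerable.Contains() translation looks like:
SELECT b.BlogId, b.Url
FROM Blogs AS b
WHERE b.BlogId IN (1, 2, 3);   -- a different list means a different plan

-- One parameterized alternative: send the list as a JSON array and
-- unpack it with OPENJSON, so a single plan is reused (@ids = N'[1,2,3]'):
SELECT b.BlogId, b.Url
FROM Blogs AS b
WHERE b.BlogId IN (SELECT CAST(j.value AS int)
                   FROM OPENJSON(@ids) AS j);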
