Aggregate Pushdown with GROUP BY

Paul White takes us through several performance improvements around aggregate pushdown:

SQL Server 2016 introduced serial batch mode processing and aggregate pushdown. When pushdown is successful, aggregation is performed within the Columnstore Scan operator itself, possibly operating directly on compressed data, and taking advantage of SIMD CPU instructions.

The performance improvements possible with aggregate pushdown can be very substantial. The documentation lists some of the conditions required to achieve pushdown, but there are cases where the lack of ‘locally aggregated rows’ cannot be fully explained from those details alone.

This article covers additional factors that affect aggregate pushdown for GROUP BY queries onlyScalar aggregate pushdown (aggregation without a GROUP BY clause), filter pushdown, and expression pushdown may be covered in a future post.

Read the whole thing.

Power Query Container Size and Performance

Chris Web looks into what changing the Power BI Dataflow container size does for us:

Currently there is no way to change this 256MB in Power BI Desktop or Excel although someone has already posted a suggestion on the Ideas site to allow us to change it. How much of an impact does this actually have on refresh performance though? Without the ability to change this setting it’s hard to say, but I suspect it could be significant and that a lot of Power Query performance problems could be explained by this behaviour.

The situation is different in the Power BI service, where I understand there is a limit on the overall amount of memory that a single Power Query query evaluation can use.

Read on to understand the differences here between running on Power BI Desktop and running in the Power BI service, as well as a bit of testing on Chris’s part.

Load Testing Tools For SQL Server

Brent Ozar shares a list of load testing tools for SQL Server:

One thing I need you to understand first: you have to provide the database and the queries. Almost all of the tools in this post, except the last one, are designed to help you run queries, but they don’t include the queries. The whole idea with load testing is that you’re trying to mimic your own workloads. If you’re just trying to test a server with generic workloads, start with my post, “How to Check Performance on a New SQL Server.”

Click through for a list of tools. I’d also throw in Pigdog from Mark Wilkinson (one of my co-workers). This helped replicate a few issues in SQL Server 2017 around tempdb performance.

Troubleshooting Spark Performance

Bikas Saha and Mridul Murlidharan explain some of the basics of performance tuning with Apache Spark:

Our objective was to build a system that would provide an intuitive insight into Spark jobs that not just provides visibility but also codifies the best practices and deep experience we have gained after years of debugging and optimizing Spark jobs. The main design objectives were to be
– Intuitive and easy – Big data practitioners should be able to navigate and ramp quickly
– Concise and focused – Hide the complexity and scale but present all necessary information in a way that does not overwhelm the end user
– Batteries included – Provide actionable recommendations for a self service experience, especially for users who are less familiar with Spark
– Extensible – To enable additions of deep dives for the most common and difficult scenarios as we come across them

The tool looks pretty interesting and I’m hoping it will be part of the open source suite at Cloudera.

Limitations on Inlining Functions

Erik Darling takes us through a limitation to SQL Server 2019 scalar function inlining:

There’s a lot of excitement (alright, maybe I’m sort of in a bubble with these things) about SQL Server 2019 being able to inline most scalar UDFs.

But there’s a sort of weird catch with them. It’s documented, but still.

If you use GETDATE in the function, it can’t be inlined.

GETDATE() and its bretheren are non-deterministic so I figured that would be an issue. Check out the documentation for the other limitations.

Understanding Key Lookups

Monica Rathbun explains what a key lookup is in SQL Server:

One of the easiest things to fix when performance tuning queries are Key Lookups or RID Lookups. The key lookup operator occurs when the query optimizer performs an index seek against a specific table and that index does not have all of the columns needed to fulfill the result set. SQL Server is forced to go back to the clustered index using the Primary Key and retrieve the remaining columns it needs to satisfy the request. A RID lookup is the same operation but is performed on a table with no clustered index, otherwise known as a heap.  It uses a row id instead of a primary key to do the lookup.

As you can see these can very expensive and can result in substantial performance hits in both I/O and CPU. Imagine a query that runs thousands of times per minute that includes one or more key lookups. This can result in tremendous overhead which is generated by these extra reads it effects the overall engine performance.

Monica’s absolutely right: key lookups can take a decent query and make it into a performance hog.

Power BI: Comparing Web.Contents and File.Contents Performance

Chris Webb compares the performance of File.Contents and Web.Contents:

In my last post I mentioned the Power Query engine’s persistent cache, which in some scenarios caches the data read from a data source when a query is refreshed. Another important nugget of information that Ehren von Lehe of the Power Query dev team mentioned in a post on the Power Query MSDN forum recently is the fact that if you use File.Contents to get data from a file then the persistent cache is not used, but if you use Web.Contents to get data from the same file then the persistent cache is used. I guess the thinking here is that there is no point creating an on-disk cache containing the contents of a file that is already on disk.

Chris takes us through a couple of unexpected twists, so check it out.

Getting the Last Actual Execution Plan for a Query

Pedro Lopes takes us through a new dynamic management function in SQL Server 2019 CTP 2.4:

In CTP 2.4 not all actual execution plans will be available, you can see more details on that here.

For an upcoming CTP version, all queries will be available with the equivalent of the actual execution plan. At least those where the plan was cached in the first place, or those where the plan has not been evicted from cache.

That caveat aside, I’m happy with this.

Important Considerations with Indexed Views

Jason Brimhall takes us through a few important considerations with indexed views:

The creation of this view has chewed up a bunch of storage. It has jumped right up to the number two spot on the biggest objects list within this database. You can see that differences by comparing the highlighted rows to the previous image. The vPerson view is highlighted in red in this second image to help point it out quickly.

Surely this must be a contrived example and people don’t really do this in the real world, right? The answer to that is simply: NO! It DOES happen. I see situations like this all too often. Far too often, large text fields are added to an indexed view to make retrieval faster. I have mimicked that by adding in two XML columns from the Person.Person table. This is definitely overkill because a simple join back to the table based on the BusinessEntityID would get me those two columns. All I have effectively done is duplicated data being stored and I have achieved that at the low low cost of increased storage of 25% for this small database. If you are curious, the column count between the Person.Person table and this new view is 13 columns each.

Jason takes us through a couple more gotchas and provides some important advice you should follow if you think indexed views might be a fit for you.

Creating Threadpools with ExecutorService in Kafka

Prasanth Nair shows how we can use Java’s ExecutorService to create threadpools for Kafka consumers:

Apache Kafka is one of today’s most commonly used event streaming platforms. While using the Kafka platform, quite often, we run into a scenario where we have to process a large number of events/messages that are placed on a broker. Traditional approaches, where a consumer is listening to a topic and then processes these message within the consumer itself, can become a performance bottleneck if the number of messages being placed on the topic is high. In such cases, the rate at which a consumer can process messages will be very low, as there are a large number of messages getting placed on the topic. A potential solution that can be applied in such a scenario is to offload message processing to the worker threads in a thread pool.

Click through for the Java code.


June 2019
« May