Performance Tuning – Page 29

Over the years, there’s been an extensive and continuous effort to improve Spark SQL’s query optimizer and planner in order to generate high-quality query execution plans. One of the biggest improvements is the cost-based optimization framework that collects and leverages a variety of data statistics (e.g., row count, number of distinct values, NULL values, max/min values, etc.) to help Spark choose better plans. Examples of these cost-based optimization techniques include choosing the right join type (broadcast hash join vs. sort merge join), selecting the correct build side in a hash-join, or adjusting the join order in a multi-way join. However, outdated statistics and imperfect cardinality estimates can lead to suboptimal query plans. Adaptive Query Execution, new in the upcoming Apache SparkTM 3.0 release and available in the Databricks Runtime 7.0 beta, now looks to tackle such issues by reoptimizing and adjusting query plans based on runtime statistics collected in the process of query execution.

One of the biggest advantages of SQL as a fourth-generation language is that the database engine (whether that be SQL Server, Oracle, or Spark) gets the opportunity to write and re-write the set of operations needed to solve a query to try to find the best path which returns the same result set. These optimizations aren’t perfect, as any query tuner can tell you, but they can go a long way.

Comments closed

dbatools Commands for Performance Tuning

Published 2020-05-22 by Kevin Feasel

John McCormack takes a look at dbatools with an eye on performance tuning:

DBATools is well known in the SQL Server community for the ease at which it allows you to automate certain tasks but did you know that DBATools can help with performance tuning your SQL Server. As my job recently changed to have more of a performance tilt, I wanted to investigate which commands would be useful in helping me with performance tuning. It turned out there are quite a few.

There are some good commands in here.

Comments closed

Tips to Improve Power BI Performance

Published 2020-05-20 by Kevin Feasel

Dan Szepesi has a few tips for improving Power BI performance:

Now that we have talked through the general Power BI system components, let’s talk performance! The scope of this blog will cover import models (where data is imported to Power BI Desktop and built into a data model) in the Power BI Pro service tier. Power BI Premium and Direct Query performance tuning will not be included in this blog post, but if there is interest in those areas, please let us know.
In part I of this performance series, we will look at improving performance in your model, the heart of an import Power BI report solution.

Read on for these tips.

Comments closed

Solving Last Page Contention Problems

Published 2020-05-18 by Kevin Feasel

Erik Darling ponders several techniques for solving the last page contention problem in SQL Server:

If one dare go looking, a list of Microsoft Approved™ solutions for dealing with last page contention exists.
I’m going to walk through my experiences with them in different scenarios, so you get a better understanding of which ones work best, and which ones can be tough to implement.

Read the whole thing.

Comments closed

Speeding Up Pivot Operations in Power Query

Published 2020-05-18 by Kevin Feasel

Imke Feldmann has a few tricks for making pivot operations in Power Query faster:

Pivot operations in are a very handy feature in Power Query but they can slow down refresh performance. So with some bittersweet pleasure I can tell that I found a trick to speed them up. The sweetness comes from the fact that the performance improvement is very significant. But the bitterness comes from the fact that I could have used this for almost 4 years now, but was too blind to realize at the time when I first worked with the code.

Read the whole thing and learn why Imke is disappointed in not knowing this four years ago.

Comments closed

Dynamic File Pruning on Delta Lake

Published 2020-05-06 by Kevin Feasel

Ali Afroozeh, et al, take us through Dynamic File Pruning in Databricks Runtime 6.1:

In addition to eliminating data at partition granularity, Delta Lake on Databricks dynamically skips unnecessary files when possible. This can be achieved because Delta Lake automatically collects metadata about data files managed by Delta Lake and so, data can be skipped without data file access. Prior to Dynamic File Pruning, file pruning only took place when queries contained a literal value in the predicate but now this works for both literal filters as well as join filters. This means that Dynamic File Pruning now allows star schema queries to take advantage of data skipping at file granularity.

There are some interesting performance results here. I’d also be curious to see how robust the results are as queries get more complicated

Comments closed

Optimizing Slow Card Visuals in Power BI

Published 2020-05-06 by Kevin Feasel

Marco Russo helps us tune Power BI reports containing a large number of card visuals:

Every visual element in a Power BI report must complete a number of tasks to provide the expected result. Visuals showing data must generate one or more DAX queries to retrieve the required measures applying the correct filters. The execution of these queries increases the waiting time for the end user, and increase the workload on the server, especially when multiple users access a published report at the same time. In order to improve the performance and the scalability of a report, the best practice is reducing the number of visuals consuming data published in a page of a report.
The focus is on a single page of the report. Power BI only gets data and build the visualizations required for the active page of a report. When the user switches the focus to a different page, the waiting time only depends on the visuals of the new page. The content of other pages of the same report is not relevant for the performance. The goal is reducing the number of visuals in a single page of a report. This could be challenging in order to obtain the same report layout, but we can look for the right visualization once we realize that the number of visuals in the same page is negatively affecting the user experience.

Less is more here.

Comments closed

Foreign Keys and Non-Changing Updates

Published 2020-04-30 by Kevin Feasel

Brent Ozar has a warning for us:

If you update a row without actually changing its contents, does it still hurt?
Paul White wrote in detail about the impact of non-updating updates, proving that SQL Server works hard to avoid doing extra work where it can. That’s a great post, and you should read it.
But foreign keys add another level of complexity. If you have foreign keys, and you tell SQL Server that you’re updating the contents of those related columns, SQL Server will check ’em even when the data isn’t changing.

Click through for the demonstration. I don’t think I agree with Brent’s dichotomy as laid out at the end of the post—the back-and-forth about removing keys would only make sense if you’re on the edge of the database equivalent of the production possibility frontier and expecting to move well beyond that point very soon. I’m not sure how well that describes the average company, but it’s a side quibble.

Comments closed

Aggregations in Power BI Desktop

Published 2020-04-27 by Kevin Feasel

Jeroen ter Heerdt tries out aggregations in Power BI Desktop:

Aggregations bring me back to the good old SSAS Multidimensional days. The days that I invariably built the aggregations tree the wrong way around, SSAS would complain, I would scream and eventually give in.
You can imagine that I was curious but skeptical when I tried aggregations in Power BI Desktop. I was afraid of ending up in the same hate-but-need relationship that I had with SSAS multidimensional when it came to aggregations.
The good news? It is not like that. At all. Once you have aggregations working, they are great.
The not so good news? It took me longer than I am willing to admit getting them working – primarily due to data types and creation of the aggregated table. More details in this blog post.

Read on for the full story.

Comments closed

Tempdb Issues You Might Have

Published 2020-04-22 by Kevin Feasel

Fabiano Amorim walks through a few of the tricky issues around tempdb:

One of many performance improvements that came with SQL Server 2014 is that it doesn’t flush dirty pages created in a minimally logged operation on tempdb. This gives you the benefit of having faster (compared to prior versions) inserts, but it caused another problem as those allocated pages may take a lot of time to be removed from the buffer pool data cache. Before discussing the problem, quickly look at the benefit and then understand some important concepts of flush dirty pages on tempdb.

Click through to learn more about the fix, and then a fix to the fix…which opened up a new avenue to fix. This kind of thing is why operating at scale is so difficult: the solution to one problem often becomes the avenue to a new problem.

Comments closed

M	T	W	T	F	S	S
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30	31

Category: Performance Tuning

Adaptive Query Execution with Spark SQL

dbatools Commands for Performance Tuning

Tips to Improve Power BI Performance

Solving Last Page Contention Problems

Speeding Up Pivot Operations in Power Query

Dynamic File Pruning on Delta Lake

Optimizing Slow Card Visuals in Power BI

Foreign Keys and Non-Changing Updates

Aggregations in Power BI Desktop

Tempdb Issues You Might Have