Press "Enter" to skip to content

Category: Performance Tuning

Performance Overhead of TDE

Matthew McGiffen answers an age-old question:

Microsoft states that enabling TDE usually has a performance overhead of 2–4%. That doesn’t sound like very much, and personally I wouldn’t let it bother me if I want to make sure my data is encrypted at rest. However, you may have heard other sources saying that it’s actually a lot more than that – and the performance impact is a high price to pay for the level of protection offered. So, what’s the truth?

It turns out the answer is a bit more complex than simply saying “x%,” though as a first approximation, I’d still say that the 2-4% is a good starting point. For what would move you off of that 2-4%, read the whole thing.

Comments closed

Optimizing Text Search in DAX

Marco Russo and Alberto Ferrari prime the pump:

When you import a table in Power BI, all the strings contained in a text column are stored in a dictionary, which improves the compression and provides excellent query performance when there is a filter with an exact match for the column value. However, reports that apply complex filters on a text column may have performance issues when the dictionary has a large number of values: depending on many other variables, a column with a few thousand unique values might already present a bottleneck, and this is definitely an issue when there are hundreds of thousands of unique strings in a column.

In October 2022, there was an internal optimization in Power BI that has improved the performance of these searches by creating an internal index. Chris Webb described this optimization in his article, Text search performance in Power BI. In this article, we explore how to evaluate whether the optimization is applied and how to measure any performance improvements. As usual, everything comes at a price: creating the index has a cost, that you will see applied to the first query hitting the column. We will also see how to detect this event and the existing limitations for this optimization.

Click through for their deep dive into the process. The final answer reminds me of the warehousing world, where you might pre-run some important queries to get those pages into the buffer pool and available for later reports.

Comments closed

Speeding Up a Slow Kafka Consumer with Parallelism

Paul Brebner continues a series on Kafka consumers:

In Part 1 of this series, we had a look at Kafka concurrency and throughput work, recapped some earlier approaches I used to improve Kafka performance, and introduced the Kafka Parallel Consumer and supported ordering options (Partition, Key, and Unordered). In this second part we continue our investigations with some example code, a trace of a “slow consumer” example, how to achieve 1 million TPS in theory, some experimental results, what else do we know about the Kafka Parallel Consumer, and finally, if you should use it in production. 

Read on to see what Paul has to say about the topic.

Comments closed

Trying Query Parameterization Settings in SQL Server

Tibor Karaszi builds a test:

You have probably seen the recommendation to turn on the “optimize for ad-hoc workloads” setting. You might even have seen a more recent recommendation to set the database setting parameterization to forced (instead of the default which is simple). The aim of this post is to briefly describe each and then do some test with various settings.

Click through for that test. This is a good example of how we need to temper guidance with context. In Tibor’s scenario, forced parameterization is a no-brainer and optimize for ad hoc workloads gives a pretty nice reduction in plan cache utilization. But then, with optimize for ad hoc workloads on, you lose the ability to see the first run of a query in Query Store and lose the opportunity to tune the different variations of a query which only ran once. Pretty much every setting in SQL Server exists because there is a scenario in which that is the most appropriate setting. Except auto-shrink. Auto-shrink delenda est.

Comments closed

Measuring Power BI Dataset Memory and CPU Utilization

Chris Webb checks resource utilization:

This post is a follow-up to my recent post on identifying CPU and memory-intensive Power Query queries in Power BI. In that post I pointed out that Profiler and Log Analytics now gives you information on the CPU and memory used by an individual Power Query query when importing data into Power BI. What I didn’t notice when I wrote that post is that there is also now information available in Profiler and Log Analytics that tells you about peak memory and CPU usage across all Power Query queries for a single refresh in the Power BI Service, as well as memory usage for the refresh as a whole.

Click through for a demonstration using Profiler.

Comments closed

Which Power Query Operations Are Most Resource-Intensive?

Chris Webb answers a question:

Last year I wrote a post about a change in the Power BI Service that meant the CPU Time associated with Power Query transformations was added to the total shown for a dataset refresh operation in Profiler traces and Log Analytics:

https://blog.crossjoin.co.uk/2022/07/03/measuring-power-query-cpu-usage-during-power-bi-dataset-refresh/

This was useful, but it didn’t tell you directly how much CPU Time was used by Power Query and it didn’t tell you which tables or partitions in a refresh were using the most CPU. It also didn’t tell you anything about Power Query memory usage. The good news that recently there has been another change that solves these problems.

Click through for the solution.

Comments closed

Troubleshooting and Fixing Grafana Performance Problems

Cameron Kerr tracks down dashboard problems:

Is your Prometheus performance causing your Grafana experience to suffer? Could you explain which of your tens of Grafana dashboards or users are causing performance problems in Prometheus? Let me show you how we unlocked insight into the performance of Grafana and Prometheus, allowing us to answer import service management questions such as:

  • Which dashboards are actually being used?
  • Who is actually using the dashboards?
  • Who has been issuing poorly-performing queries?
  • And which dashboard activity is causing my Prometheus instance(s) to perform poorly?
  • Is it slow because someone has a dashboard showing a year’s worth of data that auto-refreshes every 30 seconds?

Click through for a thorough article on finding performance bottlenecks in a platform which typically helps you find performance bottlenecks in other platforms.

2 Comments

Tips for Scaling Cassandra Clusters

Mario Tavares wants more zoom:

When the use case aligns with the architectural limitations, Cassandra excels at storing and accessing datasets up to petabytes in volume, delivering impressive throughput. As the data or workload volume grows, we expand the cluster linearly, ensuring consistent performance.

However, even when we adhere to the documentation and best practices and create an effective data model, we might encounter underperforming nodes or unexpected challenges with throughput scaling after a cluster expansion—and it’s not always clear what causes the imbalance. Linear scalability relies on the assumption that workload and data are evenly distributed across all nodes in a cluster, and the cluster capacity relates directly to the number of nodes. Sometimes, these conditions aren’t met, affecting linear scalability. So, we strive for scalability and balance and are willing to fulfill the necessary conditions.

Read on for a few common performance issues and what you can do about them.

Comments closed

Tips for Using a Data Lakehouse

James Serra shares some advice:

As I mentioned in my Data Mesh, Data Fabric, Data Lakehouse presentation, the data lakehouse architecture, where you use a data lake with delta lake as a software layer and skip using a relational data warehouse, is becoming more and more popular. For some customers, I will recommend “Use a data lake until you can’t”. What I mean by this is to take the following steps when building a new data architecture in Azure with Azure Synapse Analytics:

Click through for six notes.

Comments closed

Text Search Performance Optimization in Power BI

Chris Webb provides advice about a relatively new feature:

In the blog post for the October 2022 release of Power BI Desktop there’s a brief description of a new optimisation in Power BI for filters that perform searches in text columns. In this blog post I’m going to share more details about this optimisation and how you can make sure your reports can benefit from it.

Read on to learn a bit more about how it works, as well as a few tips on ensuring that you’re able to take full advantage of this optimization.

Comments closed