Press "Enter" to skip to content

Category: Performance Tuning

Incremental Refresh on Large Power BI Semantic Models

Soheil Bakhshi needs to refresh a lot of data:

Implementing incremental refresh on Power BI is usually straightforward if we carefully follow the implementation steps. However in some real-world scenarios, following the implementation steps is not enough. In different parts of my latest book, Expert Data Modeling with Power BI, 2’nd Edition, I emphasis the fact that understanding business requirements is the key to every single development project and data modelling is no different. Let me explain it more in the context of incremental data refresh implementation.

Read on for that explanation, as well as a few tips to make things work a bit more smoothly.

Leave a Comment

Relationship Columns and Power BI DirectQuery Mode

Chris Webb builds a relationship:

Many Power BI connectors for relational databases, such as the SQL Server connector, have an advanced option to control whether relationship columns are returned or not. By default this option is on. Returning these relationship columns adds a small overhead to the time taken to open a connection to a data source and so, for Power BI DirectQuery semantic models, turning this option off can improve report performance slightly.

Read on to learn what these relationship columns are and how you can remove them. Chris also provides a first-order approach to how you can estimate the performance pain involved with including these.

Leave a Comment

An Overview of Data Partitioning Strategies

thanhdoancong (there are spaces in there somewhere but I’d probably guess wrong) talks partitions:

Data partitioning is the magic wand that divides your massive dataset into smaller, organized subsets called partitions. These partitions are based on specific criteria, like date ranges, customer segments, or product categories.

It’s like organizing your overflowing closet by color, season, or type of clothing. Each section becomes easier to browse and manage, making life (and data analysis) much easier.

Read on for a few varieties of partitioning and how they could improve your data estate. There’s no guarantee that partitioning will definitely improve performance—and in SQL Server’s case, the partitioning feature often does not improve performance at all because that isn’t its intent—but this is a good read to get an idea of what strategies are available.

Leave a Comment

Measuring Query Times in Power BI DirectQuery Mode

Chris Webb breaks out the stopwatch:

If you’re tuning a DirectQuery semantic model in Power BI one of the most important things you need to measure is the total amount of time spent querying your data source(s). Now that the queries Power BI generates to get data from your source can be run in parallel it means you can’t just sum up the durations of the individual queries sent to get the end-to-end duration. The good news is that there are new traces event available in Log Analytics (though not in Profiler at the time of writing) which solves this problem.

Read on to learn more about this event.

Leave a Comment

Measuring Write Speeds in SQL Server

Vlad Drumea performs a test:

In this post I cover a script I’ve put together for measuring storage write speeds in SQL Server, namely against database data files.

This is meant to help get an idea of how the underlying storage performs when SQL Server is writing 1GB of data to a database.

At this point, you might be asking yourself: “Why not use CrystalDiskMark instead?”.
The answer is simple: you might not always be able to install/run additional software in an environment. Even more so if you work with external customers or you’re a consultant. It’s a lot simpler to ask a customer to run a script and send you the output, than it is to ask them to install and run some 3rd party software.

Click through for the script, what it does, and how to run it, as well as a note on limitations and example based on three drives.

Leave a Comment

dataConvergeDefinition and DirectQuery Partitions

Chris Webb talks about hybrid tables:

Hybrid tables – tables which contain both Import mode and DirectQuery mode partitions to hold data from different time periods – have been around for a while. They are useful in cases where your historic data doesn’t change but your most recent data changes very frequently and you need to reflect those changes in your reports; you can also have “reverse hybrid tables” where the latest data is in Import mode but your historic data (which may not be queried often but still needs to be available) is in DirectQuery mode. Up to now they had a problem though: even when you were querying data that was in the Import mode partition, Power BI still sent a SQL query to the DirectQuery partition and that could hurt performance. That problem is now solved with the new dataCoverageDefinition property on the DirectQuery partition.

Read on to see what dataCoverageDefinition does.

Comments closed

Improving Aurora Postgres Performance by Lowering random_page_cost

Shayon Mukherjee does some performance tuning:

Recently I have been working with some queries in Postgres where I noticed either it has decided not to use an index and perform a sequential scan, or it decided to use an alternative index over a composite partial index. This was quite puzzling, especially when you know there are indexes in the system that can perform these queries faster.

So what gives?

After some research, I stumbled upon random_page_cost (ref).

Read on to learn more about what this is, how you can change it, and a warning before you go wild in changing it.

Comments closed

Performance Costs of using Calculated Columns in Power BI Composite Models

Chris Webb share a warning:

I don’t have anything against the use of calculated columns in Power BI semantic models in general but you do need to be careful using them with DirectQuery mode. In particular when you have a DirectQuery connection to another Power BI semantic model – also known as a composite model on a Power BI semantic model – it’s very easy to cause serious performance problems with calculated columns. Let’s see a simple example of why this is.

Read on for Chris’s example.

Comments closed