SSAS And Power BI Performance Issue

Chris Webb describes an issue with SSAS Multidimensional and Power BI-generated DAX causing a performance problem:

This query has something in it – I don’t know what – that means that it cannot make use of the Analysis Services Storage Engine cache. Every time you run it SSAS will go to disk, read the data that it needs and then aggregate it, which means you’ll get cold-cache performance all the time. On a big cube this can be a big problem. This is very similar to problems I’ve seen with MDX queries on Multidimensional and which I blogged about here; it’s the first time I’ve seen this happen with a DAX query though. I suspect a lot of people using Power BI on SSAS Multidimensional will have this problem without realising it.

This problem does not occur for all tables – as far as I can see it only happens with tables that have a large number of rows and two or more hierarchies in. The easy way to check whether you have this problem is to refresh your report, run a Profiler trace that includes the Progress Report Begin/End and Query Subcube Verbose events (and any others you find useful) and then refresh the report again by pressing the Refresh button in Power BI Desktop without changing it at all. In your trace, if you see any of the Progress Report events appear when that second refresh happens, as well as Query Subcube Verbose events with an Event Subclass of Non-cache data, then you know that the Storage Engine cache is not being used.

This doesn’t look to be a quick fix, so do read the whole thing to help figure out how to avoid this issue.

Query Performance Insight

Arun Sirpal discusses Query Performance Insight in Azure SQL Databases:

Here you will be presented with the TOP X queries based on CPU, Duration or Execution count. You will have the ability to change the time period of analysis, return 5, 10 or 20 queries using aggregations SUM, MAX or AVG.

So let’s look at what information is provided based on queries with high AVG duration over the last 6 hours.

Looks like an interesting way to get information on the few most heavily used queries.

Chaos Sloth

Erik Darling has created a script to make your servers go slow:

It randomly generates values and changes some important configuration settings.

  • Max Degree of Parallelism
  • Cost Threshold
  • Max Memory
  • Database compatibility level

This was written for SQL Server 2016, on a box that had 384 GB of RAM. If your specs don’t line up, you may have to change the seed values here. I’m not putting any more development into this thing to automatically detect SQL version or memory in the server, because this was a one-off joke script to see how bad things could get.

How bad did they get? The server crashed multiple times.

Not for production purposes.  Or maybe any purposes…

Regular Expressions Against Large Data Sets

Liz Bennett explains types of regular expressions which do not scale:

With recursive backtracking based regex engines, it is possible to craft regular expressions that match in exponential time with respect to the length of the input, whereas the Thompson NFA algorithm will always match in linear time. As the name would imply, the slower performance of the recursive backtracking algorithm is caused by the backtracking involved in processing input. This backtracking has serious consequences when working with regexes at a high scale because an inefficient regex can take orders of magnitude longer to match than an efficient regex. The standard regex engines in most modern languages, such as Java, Python, Perl, PHP, and JavaScript, use this recursive backtracking algorithm, so almost any modern solution involving regexes will be vulnerable to poorly performing regexes. Fortunately, though, in almost all cases, an inefficient regex can be optimized to be an efficient regex, potentially resulting in enormous savings in terms of CPU cycles.

There’s a significant performance difference, so if you work frequently with regular expressions, check this out.


Erin Stellato hacks DBCC CLONEDATABASE and makes it that much more powerful:

Very often when I mention testing before an upgrade, I’m told that there is no environment in which to do the testing.  I know some of you have a Test environment. Some of you have Test, Dev, QA, UAT and who knows what else. You’re lucky.

For those of you that state you have no test environment at all in which to test, I give you DBCC CLONEDATABASE. With this command, you have no excuse to not run the most frequently-executed queries and the heavy-hitters against a clone of your database. Even if you don’t have a test environment, you have your own machine.  Backup the clone database from production, drop the clone, restore the backup to your local instance, and then test.  The clone database takes up very little space on disk and you won’t incur memory or I/O contention as there’s no data.  You will be able to validate query plans from the clone against those from your production database. Further, if you restore on SQL Server 2016 you can incorporate Query Store into your testing! Enable Query Store, run through your testing in the original compatibility mode, then upgrade the compatibility mode and test again. You can use Query Store to compare queries side by side! (Can you tell I’m dancing in my chair right now?)

Erin’s discovery makes CLONEDATABASE go from being an interesting tool to being outright powerful for handling upgrades.

Explaining RBAR

Kenneth Fisher explains RBAR with the help of an animated GIF:

So 23 milliseconds for the batch version and 850 milliseconds for RBAR. What a difference.

Now in this case the code for the RBAR is also a lot more complicated. But that isn’t always the case. It also isn’t always the case that RBAR is slower. But it’s almost always a lot slower than batch.

So, while the code for RBAR is often easier to write, even though it might be physically longer, it’s probably going to be slower too.

Well-written, set-based solutions aren’t always guaranteed to be faster, but that’s one of the safest bets to make with T-SQL.

Learning Performance Tuning

Kendra Little gives some tips on how to gather performance tuning skills:

Start writing queries that demonstrate TSQL anti patterns – and make them slow

You know how people say that the best way to learn something is to teach it?

The best way to learn to speed up queries is to write slow ones.

The best way to get a job speeding up queries is to write a blog about the queries you’ve sped up.

This is a long-term learning process, but is absolutely a worthwhile skill for any database professional.

Feed The CPUs

SQL Sasquatch is starting a new series on optimizing disk write to maximize CPU throughput:

When I work with SQL Server batch-controlled workflows, I use the theory “feed the CPUs”.  That’s the simplest positive adaptation I could come up with of Kevin Closson’s paradigm “Everything is a CPU problem” 🙂

What I mean by “Feed the CPUs” is that memory and disk response times are primary factors determining the maximum rate for the CPUs to process the data.  Nuts & bolts of such a model for SQL Server are slightly different than a similar model for Oracle.  SQL Server access to persistent data is always through database cache, while Oracle uses shared access to database cache in SGA and private access to persistent data through direct access in PGA.

Click through for more details.

Table-Valued Parameter Performance

Dan Guzman shows that setting TVP column sizes correctly can have a major performance impact:

LOB values are especially problematic when a trace captures the RPC completed event of a TVP query. Tracing uses memory from the OBJECTSTORE_LBSS memory pool to build trace records that contain TVP LOB values. From my observations of the sys.dm_os_memory_clerks DMV, each LOB cell of a TVP requires about 8K during tracing regardless of the actual value length. This memory adds up very quickly when many rows and lob columns are passed via a TVP with a trace running. For example, the 10,000 row TVP with 10 LOB columns used in the earlier test required over 800MB memory for a single trace record. Consider that a large number of TVP LOB cells and/or concurrent TVP queries can cause queries to fail with insufficient memory errors. In extreme cases, the entire instance can become unstable and even crash under due to tracing of TVP queries.

This is a must-read if you use TVPs in your environment.

Parallel Insert-Select

Arvind Shyamsundar looks at parallel insertion using the INSERT SELECT pattern:

For row store targets, it is important to note that the presence of a clustered index or any additional non-clustered indexes on the target table will disable the parallel INSERT behavior. For example, here is the query plan on the same table with an additional non-clustered index present. The same query takes 287 seconds without a TABLOCK hint and the execution plan is as follows

This post goes into detail on when you can expect parallelism in rowstore and columnstore insertions.  I highly recommend reading it.


June 2017
« May