Performance Tuning – Page 20

An Index for Change Tracking Cleanup

Published 2022-05-20 by Kevin Feasel

If the issue persists, we can stack up a lot of data that needs to be removed, which can be too much for autocleanup to manage. At that point, we recommend running (as is indicated in the error message) sp_flush_CT_internal_table_on_demand, which is discussed in detail in Amit’s blog post Change Tracking Cleanup–Part 1.
I will not discuss the internals of the process, as Amit Banerjee already touched on this in the referenced blog. However, if you are routinely encountering issues where autocleanup can’t keep up and/or manual cleanup is taking significant time, you can add the index and see if it helps; I have the process enumerated below. Keep in mind this is not guaranteed to fix all cleanup issues, and it may still be necessary to run manual cleanup regularly. If you continue to have issues with cleanup, feel free to create a ticket with us here at CSS to assist you and get data specific to your environment.

Read on for the index definition as well as some important notes about whether you might need it.

Optimizing Hive Performance with Tez

Published 2022-05-11 by Kevin Feasel

Jay Desai has some recommendations around tuning Tez queries:

Tuning Hive on Tez queries can never be done in a one-size-fits-all approach. The performance on queries depends on the size of the data, file types, query design, and query patterns. During performance testing, evaluate and validate configuration parameters and any SQL modifications. It is advisable to make one change at a time during performance testing of the workload, and would be best to assess the impact of tuning changes in your development and QA environments before using them in production environments. Cloudera WXM can assist in evaluating the benefits of query changes during performance testing.

Click through for several configuration and query considerations.

Slow File Open Times in Power BI

Published 2022-05-02 by Kevin Feasel

Marco Russo explains why opening some Power BI files might take so long:

There could be many reasons for that, but if you have calculated columns and/or calculated tables in your model, you should be aware that they could be the reasons why this happens. It could be, so I want to explain when this happens.
The short explanation is the following: when you open a PBIX file, Power BI Desktop automatically recalculates those calculated columns and calculated tables that depend on a volatile formula.

Read on for the longer explanation, which includes a (possibly incomplete) list of volatile formulas.

PyODBC vs C# ODBC Performance Differences

Published 2022-04-25 by Kevin Feasel

Jose Manuel Jurado Diaz explains a performance difference:

A customer asked today why, when using ODBC Driver 17 for SQL Server in Python with PYODBC, we see a slight difference in time taken compared with C# System.Data.Odbc. Below, I would like to share my lessons learned about it.

Read on for Jose’s explanation. My short version is, it seems particularly important when using the Python ODBC driver to write the exact query you want rather than a SELECT * or query which returns rows/columns you don’t need.
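As a rough illustration of that point, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for an ODBC connection, purely so that it runs anywhere. The table, column names, and row counts are made up for the example, and it measures only the volume of data the client has to materialize, not driver timing.

```python
import sqlite3

# In-memory SQLite database as a stand-in for a real ODBC data source.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, notes TEXT)"
)
# Hypothetical data: each row carries a wide 'notes' column we do not need.
conn.executemany(
    "INSERT INTO orders (id, customer, notes) VALUES (?, ?, ?)",
    [(i, f"customer-{i}", "x" * 1000) for i in range(1, 101)],
)

def fetched_chars(sql):
    """Run a query and return the total length of all values fetched."""
    rows = conn.execute(sql).fetchall()
    return sum(len(str(value)) for row in rows for value in row)

wide = fetched_chars("SELECT * FROM orders")
narrow = fetched_chars("SELECT id, customer FROM orders")
print(f"SELECT *           : {wide} characters fetched")
print(f"SELECT id, customer: {narrow} characters fetched")
```

The gap between the two totals is exactly the unused notes column, which is the part a narrower query never has to pull across the wire or convert into Python objects.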

Entity Framework and Include Operations

Published 2022-04-21 by Kevin Feasel

Josh Darnell has a warning:

I can imagine someone reading that and not seeing the gravity of the situation. “Hey, 500 rows isn’t that many – we have modern hardware!”
I thought it was worth writing about a real world situation where this can get seriously out of hand.

Read on for a scenario in which 64 rows turn into 100,000 rows pretty quickly.
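To see how this kind of growth can happen, here is a small sketch of the arithmetic, assuming a query that eagerly loads several sibling collections through one joined result set. The entity shape and counts are hypothetical and are not the numbers from Josh's post.

```python
from math import prod

def joined_row_count(parents, children_per_collection):
    """Rows returned when each parent is joined to several sibling
    collections in one result set: the collection sizes multiply."""
    return parents * prod(children_per_collection)

def separate_row_count(parents, children_per_collection):
    """Rows returned when the parents and each collection are loaded
    by their own queries: the collection sizes add."""
    return parents + parents * sum(children_per_collection)

# Hypothetical shape: 10 parents, each with three collections of 10 rows.
sizes = [10, 10, 10]
print(joined_row_count(10, sizes))    # 10 * (10 * 10 * 10) rows
print(separate_row_count(10, sizes))  # 10 + 10 * (10 + 10 + 10) rows
```

The multiplication is why a modest-looking set of includes can balloon, while loading the same collections separately grows only additively.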

The User Transaction Scope for Temporary Objects

Published 2022-04-21 by Kevin Feasel

Bob Dorr troubleshoots a performance problem:

When the temporary table is bound to the user transaction it is both created and destroyed as part of the transaction. The same logic in a procedure attempts to avoid the creation and destruction, for each execution, by using temporary table caching.
From the issue I was debugging, the user transaction scope mattered because creation and destruction of metadata may be an expensive operation.

This post ties into two separate things: first, how temp objects tie to specific sessions; and second, the cost of creating and destroying temporary objects. For the latter, a couple quick pieces of advice:

Reduce the number of temporary objects you create. If you can solve a problem with fewer temp tables or table variables while maintaining acceptable performance, that can help on busy systems.
Never explicitly drop temp tables. There’s no benefit to explicitly dropping temp tables, as they’ll go away as soon as the session ends. Also, not dropping temp tables is the first step to:
Embrace temp table reuse. There are specific rules around when you can re-use a temp table. Each re-use of a temp table means two fewer metadata operations (one delete and one create).
Use memory-optimized table variables instead of temp tables or table variables.
Turn on memory-optimized tempdb metadata. The biggest issue here is that you lose cross-database queries into tempdb views. That can end up being painful and is why I can’t recommend it as a general solution.

Azure Redis Tips

Published 2022-04-19 by Kevin Feasel

Arun Sirpal enumerates some advice:

My learnings on Redis thus far which you may find useful:
1. Location of Redis should be close to your app.
2. Data structures within Redis: larger key value sizes lead to fragmentation of memory space, and these larger memory requirements mean more network data transfer. Redis states to use 100KB maximum; this will affect the transfer time allocated from the app. It could time out if the data request is big.

Click through for the rest of Arun’s advice. My advice on the 100KB maximum is that it really should be closer to 100 bytes or 1KB max in practice, especially for storing data which differs by entity (user, customer, organization, whatever your domain uses).
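One way to keep an eye on this is to measure the serialized size of a value before it goes into the cache. The sketch below does not talk to Redis at all; it only shows the size check, and the 1 KB budget, function names, and sample payloads are my own assumptions for illustration.

```python
import json

MAX_VALUE_BYTES = 1024  # assumed budget for illustration, not a Redis limit

def serialized_size(value):
    """Size in bytes of the value as compact UTF-8 JSON."""
    return len(json.dumps(value, separators=(",", ":")).encode("utf-8"))

def fits_budget(value, limit=MAX_VALUE_BYTES):
    """True when the serialized value is within the size budget."""
    return serialized_size(value) <= limit

# Hypothetical per-user payloads.
small = {"user_id": 42, "plan": "basic"}
large = {"user_id": 42, "history": ["event"] * 500}

for name, payload in (("small", small), ("large", large)):
    print(name, serialized_size(payload), fits_budget(payload))
```

A check like this could log or reject oversized values at write time, which tends to be easier than chasing timeouts after the fact.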

Measure Explosion and Optimization with Tabular Editor 3

Published 2022-04-13 by Kevin Feasel

Matt Allington pulls out the dynamite:

OK, so what’s the problem? The problem is that sometimes you get such a deep, nested path of measures that the underlying formula is more complex than it needs to be, and it is virtually impossible to debug. If you simply explode/expand a final measure into the full, underlying DAX code, it makes it much easier to see what is going on. Returning to the DAX example above, this is what the third measure actually looks like under the hood, removing all references to the dependent measures

Click through for an article and a video.
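To make the idea of exploding a measure concrete, here is a toy sketch that recursively replaces measure references with their definitions. It is not how Tabular Editor works internally, the measure names and formulas are hypothetical, and it ignores real DAX parsing and semantics: it simply treats any bracketed name found in the dictionary as a measure reference.

```python
import re

# Hypothetical measures; each may reference other measures as [Name].
MEASURES = {
    "Total Sales": "SUM(Sales[Amount])",
    "Total Cost": "SUM(Sales[Cost])",
    "Margin": "[Total Sales] - [Total Cost]",
    "Margin %": "DIVIDE([Margin], [Total Sales])",
}

# Match [Name] only when it is not preceded by a table name, so that
# column references such as Sales[Amount] are left alone.
REFERENCE = re.compile(r"(?<![\w'])\[([^\]]+)\]")

def expand(name, measures=MEASURES):
    """Return the formula for a measure with all measure references inlined."""
    def replace(match):
        ref = match.group(1)
        if ref in measures:
            return "(" + expand(ref, measures) + ")"
        return match.group(0)  # not a known measure, keep as written
    return REFERENCE.sub(replace, measures[name])

print(expand("Margin %"))
```

Even in this tiny example, the expanded formula repeats the Total Sales expression, which hints at how a deep chain of measures can hide duplicated work.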

Using the Azure Synapse Analyzer Report

Published 2022-04-12 by Kevin Feasel

Sanjay Raut introduces an interesting report:

The Azure Synapse Analyzer Report was created to help you identify common issues that may be present in your database that can lead to performance issues. This report focuses on known best practices that Microsoft has identified with SQL Dedicated Pools. Following these best practices will help to get the best performance out of your solution.

One thing I appreciate about this is that it covers many items which people don’t know to think about when moving over from SQL Server or Azure SQL Database.

The Practical Costs of Index Fragmentation

Published 2022-03-29 by Kevin Feasel

Tibor Karaszi digs into index performance:

See numbers and diagrams at the end, or at the top. I measured a few cases: the difference between no external fragmentation and severe external fragmentation (over 99%). I have both a narrow index and a wide index, and I read one (1), 10,000 and 100,000 rows using index searches (“range scan”). There was obviously no difference reading 1 row, so I exclude that from my discussion below. For the other cases the extra time with an extreme level of external fragmentation is (from lowest impact to highest) 7%, 10%, 13% and 32%. The highest number (32%) is when reading many rows from a narrow index, i.e. many rows per page. Again, this is with an extreme level of fragmentation.

What’s interesting is that for the most part, there’s a negligible difference between ~0% external fragmentation and ~99% external fragmentation. The follow-on question is, how much are defrag operations costing you in performance and when is the benefit worth the cost?
