
Category: Performance Tuning

PyODBC vs C# ODBC Performance Differences

Jose Manuel Jurado Diaz explains a performance difference:

A customer asked today why, when using ODBC Driver 17 for SQL Server in Python with PYODBC, we see a slight difference in time taken compared with C# System.Data.Odbc. Below, I would like to share my lessons learned about it.

Read on for Jose’s explanation. My short version is that, when using the Python ODBC driver, it seems particularly important to write the exact query you want rather than a SELECT * or a query which returns rows and columns you don’t need.
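To make that concrete, here is a minimal pyodbc sketch (server, database, table, and column names are all placeholders) contrasting a SELECT * with a query that projects only the columns and rows the application needs:

```python
import pyodbc

# Placeholder connection details; adjust for your environment.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=your-server;DATABASE=your-db;UID=your-user;PWD=your-password"
)
cursor = conn.cursor()

# Avoid: pulls every column and every row across the wire, then filters client-side.
# cursor.execute("SELECT * FROM Sales.Orders")

# Prefer: project only the needed columns and push the filter to the server.
cursor.execute(
    "SELECT OrderID, OrderDate, TotalDue FROM Sales.Orders WHERE CustomerID = ?",
    (42,),
)
for order_id, order_date, total_due in cursor.fetchall():
    print(order_id, order_date, total_due)

conn.close()
```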

Entity Framework and Include Operations

Josh Darnell has a warning:

I can imagine someone reading that and not seeing the gravity of the situation. “Hey, 500 rows isn’t that many – we have modern hardware!”

I thought it was worth writing about a real world situation where this can get seriously out of hand.

Read on for a scenario in which 64 rows turns into 100,000 rows pretty quickly.

The User Transaction Scope for Temporary Objects

Bob Dorr troubleshoots a performance problem:

When the temporary table is bound to the user transaction it is both created and destroyed as part of the transaction.  The same logic in a procedure attempts to avoid the creation and destruction, for each execution, by using temporary table caching.

From the issue I was debugging, the user transaction scope mattered because creation and destruction of metadata may be an expensive operation.  

This post ties into two separate things: first, how temp objects tie to specific sessions; and second, the cost of creating and destroying temporary objects. For the latter, a couple quick pieces of advice:

  • Reduce the number of temporary objects you create. If you can solve a problem with fewer temp tables or table variables while maintaining acceptable performance, that can help on busy systems.
  • Never explicitly drop temp tables. There’s no benefit to explicitly dropping temp tables, as they’ll go away as soon as the session ends. Also, not dropping temp tables is the first step to:
  • Embrace temp table reuse. There are specific rules around when you can re-use a temp table. Each re-use of a temp table means two fewer metadata operations (one delete and one create); there’s a short sketch of the idea after this list.
  • Use memory-optimized table variables instead of temp tables or standard table variables.
  • Turn on memory-optimized tempdb metadata. The biggest issue here is that you lose cross-database queries into tempdb views. That can end up being painful and is why I can’t recommend it as a general solution.
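As a rough illustration of the reuse idea (connection details and the #staging table are made up, and this simplifies the procedure-level temp table caching Bob describes), the sketch below creates a temp table once per session and then clears and refills it per batch instead of dropping and recreating it each time:

```python
import pyodbc

# Placeholder connection string; autocommit keeps the temp table out of a user transaction.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=your-server;DATABASE=your-db;UID=your-user;PWD=your-password",
    autocommit=True,
)
cursor = conn.cursor()

# One create per session, not one per batch.
cursor.execute("CREATE TABLE #staging (id INT NOT NULL, payload VARCHAR(200));")

for batch in (["a", "b"], ["c", "d"]):
    # Reuse the same temp table: clear it rather than DROP + CREATE.
    cursor.execute("TRUNCATE TABLE #staging;")
    cursor.executemany(
        "INSERT INTO #staging (id, payload) VALUES (?, ?);",
        list(enumerate(batch)),
    )
    cursor.execute("SELECT COUNT(*) FROM #staging;")
    print(cursor.fetchone()[0])

# No explicit DROP: the temp table goes away when the session ends.
conn.close()
```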

Azure Redis Tips

Arun Sirpal enumerates some advice:

My learnings on Redis thus far which you may find useful:

1. Location of Redis should be close to your app.

2. Data structures within Redis: larger key-value sizes lead to fragmentation of memory space, and these larger memory requirements mean more network data transfer. Redis states to use 100KB maximum; this will affect the transfer time allocated from the app. It could time out if the data request is big.

Click through for the rest of Arun’s advice. My advice on the 100KB maximum is that it really should be closer to 100 bytes or 1KB max in practice, especially for storing data which differs by entity (user, customer, organization, whatever your domain uses).
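As a small illustration with redis-py (the cache host, access key, and key names are placeholders), caching a compact per-entity value keeps both memory fragmentation and transfer time down:

```python
import redis

# Azure Cache for Redis typically listens on 6380 with TLS; keep the cache in
# the same region as the app (point 1 above).
r = redis.Redis(
    host="your-cache.redis.cache.windows.net",
    port=6380,
    password="your-access-key",
    ssl=True,
)

# Prefer small, targeted values: a few hundred bytes per entity rather than a
# large serialized blob approaching the 100KB ceiling mentioned above.
r.set("user:42:display_name", "Ada Lovelace", ex=3600)

print(r.get("user:42:display_name"))
```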

Measure Explosion and Optimization with Tabular Editor 3

Matt Allington pulls out the dynamite:

OK, so what’s the problem? The problem is that sometimes you get such a deep, nested path of measures that the underlying formula is more complex than it needs to be, and it is virtually impossible to debug. If you simply explode/expand a final measure into the full, underlying DAX code, it makes it much easier to see what is going on. Returning to the DAX example above, this is what the third measure actually looks like under the hood, removing all references to the dependent measures

Click through for an article and a video.

Using the Azure Synapse Analyzer Report

Sanjay Raut introduces an interesting report:

The Azure Synapse Analyzer Report was created to help you identify common issues that may be present in your database that can lead to performance issues. This report focuses on known best practices that Microsoft has identified with SQL Dedicated Pools. Following these best practices will help to get the best performance out of your solution.

One thing I appreciate about this is that it covers many items which people don’t know to think about when moving over from SQL Server or Azure SQL Database.

The Practical Costs of Index Fragmentation

Tibor Karaszi digs into index performance:

See numbers and diagrams at the end, or at the top. I measured a few cases: the difference between no external fragmentation and severe external fragmentation (over 99%). I have both a narrow index and a wide index, and I read one (1), 10,000 and 100,000 rows using index searches (“range scan”). There was obviously no difference reading 1 row, so I exclude that from my discussion below. For the other cases, the extra time with an extreme level of external fragmentation is (from lowest impact to highest) 7%, 10%, 13% and 32%. The highest number (32%) is when reading many rows from a narrow index, i.e. many rows per page. Again, this is with an extreme level of fragmentation.

What’s interesting is that for the most part, there’s a negligible difference between ~0% internal fragmentation and ~99% internal fragmentation. The follow-on question is, how much are defrag operations costing you in performance and when is the benefit worth the cost?
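If you want to measure that trade-off on your own tables, a sketch like the one below (connection string and table name are placeholders) reads sys.dm_db_index_physical_stats so you can see both external fragmentation and page fullness before paying for a rebuild:

```python
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=your-server;DATABASE=your-db;UID=your-user;PWD=your-password"
)
cursor = conn.cursor()

# SAMPLED mode populates avg_page_space_used_in_percent (internal fragmentation).
cursor.execute("""
    SELECT i.name,
           ps.avg_fragmentation_in_percent,   -- external (logical) fragmentation
           ps.avg_page_space_used_in_percent, -- internal fragmentation (page fullness)
           ps.page_count
    FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.YourTable'),
                                        NULL, NULL, 'SAMPLED') AS ps
    JOIN sys.indexes AS i
      ON i.object_id = ps.object_id AND i.index_id = ps.index_id;
""")

for name, external_pct, page_fullness_pct, pages in cursor.fetchall():
    print(f"{name}: external {external_pct:.1f}%, "
          f"page fullness {page_fullness_pct:.1f}%, {pages} pages")

conn.close()
```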

Query Performance Insights on the Serverless SQL Pool

Jovan Popovic shows how you can use the QPI library on an Azure Synapse Analytics serverless SQL pool:

You can find more of the best practices here. These best practices are very important because some issues might cause performance degradation. You might be surprised how applying some of these best practices might improve the performance of your workload.

The last item that is related to schema optimization is sometimes hard to check. You would need to look at your schema, inspect all columns and find what to optimize. If you have a large schema, this might not be an easy task. But you can make your life easier if you use the QPI helper library that can detect schema issues for you.

Read on to see what it can find.

Bucketing Data in Hive

Chitra Sapkal explains why bucketing in Hive can be so powerful:

When a column has a high cardinality, we can’t perform partitioning on it. A very high number of partitions will generate too many Hadoop files, which would increase the load on the node. That’s because the node will have to keep the metadata of every partition, and that would affect the performance of that node.

In simple words, you can use bucketing if you need to run queries on columns that have huge data, which makes it difficult to create partitions.

Click through to see how bucketing works and examples of how you can use it to make queries faster.
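As a hedged sketch of what that looks like (the HiveServer2 host, table, and column names are made up, and PyHive is just one way to submit the DDL): partition on a low-cardinality column and bucket on the high-cardinality one instead of trying to partition on it.

```python
from pyhive import hive

conn = hive.connect(host="your-hiveserver2-host", port=10000, username="etl_user")
cursor = conn.cursor()

# On older Hive versions you may need SET hive.enforce.bucketing = true
# before inserting; Hive 2.x and later enforce bucketing by default.
cursor.execute("""
    CREATE TABLE IF NOT EXISTS user_events (
        user_id    BIGINT,
        event_type STRING,
        event_time TIMESTAMP
    )
    PARTITIONED BY (event_date STRING)        -- low cardinality: safe to partition
    CLUSTERED BY (user_id) INTO 32 BUCKETS    -- high cardinality: bucket instead
    STORED AS ORC
""")

# Queries that filter or join on user_id can now prune to the relevant buckets
# rather than scanning every file in the partition.
```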

Improving Managed Instance Load Performance

Niko Neugebauer provides some tips on improving data load performance when using SQL Managed Instances in the General Purpose tier:

In this blog post we shall consider some of the strategies for improving data loading performance in Azure SQL Managed Instance. These strategies apply to repeatable ETL processes, meaning that if there is a problem, the data loading process can be repeated without loss of any bit of data or its consistency. Of course, these strategies do require a design prepared for the possibility of repetition, but this is a basic requirement for any data loading process.

There are many great ways of ensuring high-performance data loading, and this blog post does not pretend to be an exhaustive and ultimate resource, but it will provide a couple of known paths that can drastically improve the performance of the log throughput in Azure SQL MI.

Click through for some tips, including some which make sense on-prem and a couple which are specific to platform-as-a-service offerings.
