Press "Enter" to skip to content

Category: Performance Tuning

Understanding Worktables in SQL Server

Steve Stedman takes a peek at tempdb:

A worktable in SQL Server is a temporary structure that the SQL Server Database Engine uses to process certain types of queries. These tables are not explicitly created by users but are generated by SQL Server internally to handle specific operations that cannot be managed directly within memory. Worktables are stored in the tempdb database and are crucial for facilitating complex query execution plans.

Read on for examples of when SQL Server will use worktables, as well as some good ideas for when you spot worktables in the wild. They’re not inherently bad, but there are some performance problems you could experience around them.
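
If you want to see one for yourself, here is a minimal sketch (the table and column names are invented stand-ins for your own data): turn on STATISTICS IO and run a query with a running total, which uses the default RANGE window frame and therefore spools to a worktable.

```sql
-- Minimal sketch: surface worktable activity for a query.
-- dbo.Orders and its columns are hypothetical stand-ins for your own table.
SET STATISTICS IO ON;

SELECT CustomerID,
       OrderDate,
       SUM(TotalDue) OVER (PARTITION BY CustomerID ORDER BY OrderDate) AS RunningTotal
FROM dbo.Orders;

-- The Messages tab will include a line such as:
--   Table 'Worktable'. Scan count ..., logical reads ...
-- Non-zero reads against 'Worktable' typically mean the engine pushed work into
-- tempdb (here, the on-disk window spool backing the running total).
SET STATISTICS IO OFF;
```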


TempDB Contention and SQL Server 2022

Simon Liew shows off a change in SQL Server 2022:

Tempdb often acts as a high-traffic repository in SQL Server, experiencing significant contention. This includes not only temp table usage but also processes such as triggers, worktables for storing intermediate results for spools, cursors, sorts, work files for hash join, and temporary large object (LOB) storage, just to name a few.

A prominent issue is Global Allocation Map (GAM) and Shared Global Allocation Map (SGAM) system page latch contention, which can be particularly problematic under specific high concurrency workloads.

Read on for a demonstration of this.
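
If you want to check your own servers for this kind of contention, here is a rough diagnostic sketch (not from Simon's post) that looks for sessions waiting on tempdb allocation pages:

```sql
-- Rough diagnostic sketch: sessions currently waiting on tempdb allocation pages.
SELECT wt.session_id,
       wt.wait_type,
       wt.wait_duration_ms,
       wt.resource_description
FROM sys.dm_os_waiting_tasks AS wt
WHERE wt.wait_type LIKE 'PAGELATCH%'
  AND wt.resource_description LIKE '2:%';   -- database_id 2 = tempdb

-- resource_description is db_id:file_id:page_id. Page 2 in each file is the GAM,
-- page 3 is the SGAM, and PFS pages repeat every 8,088 pages, so waits clustered
-- on those page IDs point at allocation contention rather than ordinary data pages.
```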


Comparing Microsoft Fabric Warehouse and Lakehouse Performance

Reitse Eskens busts out the stopwatch:

I just can’t seem to stop doing this, checking the limits of Microsoft Fabric. In this instalment I’ll try and find some limits on the data warehouse experience and compare them with the Lakehouse experience. The data warehouse is a bit different compared to the Lakehouse, so I’ll be digging into that one first. Then I’m going to load data into the warehouse with a copy data pipeline followed by some big queries to test performance. The Fabric Capacity App will be used to check out the capacity necessary (or used for that matter).

As usual, I’m using the F2 capacity as it’s the one that should break the easiest. It’s also the cheapest one to run tests against and, as the capacity calculation isn’t dependent on the SKU (Stock Keeping Unit), you can easily translate to find out which capacity SKU will fit the workload. Remember that your workload will differ from the one shown in this blog. These tests are a comparison between the different offerings, something you could do for yourself. These blogs are a bit of a happy place as every option will get a good chance. In your work, your skills (and those of your co-workers) will be a major driver towards an option. Even if this offers the chance to learn something new!

Reitse focuses on ingesting and transforming data, and the results are quite interesting.


Diagnosing DirectQuery Connection Limit Performance Problems

Chris Webb does a bit of sleuthing:

A few months ago I blogged about the new limits available for the Maximum Connections Per Data Source property in Premium and why the number of connections that a DirectQuery semantic model can open to a data source is so important for report performance. At that time, however, there was no way for you to know whether the performance of your reports was being affected by a lack of available connections. The good news is that, with the announcement this week of the new Execution Metrics event in Log Analytics and Profiler, you can now see when this is happening.

Read on for an illustration of the problem and how you can resolve it.


Third-Party Applications and Poor Database Design

Kevin Hill talks about a bugbear of mine:

As a SQL DBA, what do you do when a vendor application has performance problems that are code related?

Server settings don’t generally seem to be an issue.

Queries and vendor code…total hands off. I just point at code and say “There’s a great choice for optimizing in your next update!”

Indexes are the “Sticky Bits” in between client data and vendor code.

To an extent, I do feel for software vendors, who have to write software that works in a variety of environments with a variety of workloads. That can be a real challenge, especially because the people developing those databases don’t get to monitor them in real time and observe what’s going on until someone reports an issue.

That’s the empathy part. The other side of the coin is that there are a bunch of vendors with garbage-tier database designs and awful queries. For a user of that third-party application, or a consultant trying to help those users, that can be frustrating. The reason is that, as Kevin mentions, you really don’t want to go mucking with the queries or database design: with my luck, the change I make to improve a query’s performance would break a trigger nested three levels deep in some totally unrelated process that somehow manages the data integrity of the entire application.
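
In practice, indexes really are about the only lever you can safely pull. Something along these lines (the table, column, and index names are invented for illustration) is usually as far as I’d go on a vendor database:

```sql
-- Hypothetical example of the "sticky bits" approach: tune a vendor query by
-- adding an index, without touching vendor code or vendor schema definitions.
CREATE NONCLUSTERED INDEX IX_VendorOrders_CustomerID_Status
    ON dbo.VendorOrders (CustomerID, OrderStatus)
    INCLUDE (OrderDate, TotalDue)
    WITH (ONLINE = ON);   -- online builds are Enterprise/Azure; drop the option elsewhere

-- Keep a script of every index you add, so a vendor upgrade that rebuilds the
-- schema (and silently drops your indexes) is easy to recover from.
```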


Checking SSIS Package Performance

Andy Brownsword digs into SSISDB:

I’ve recently been reviewing SSIS packages to make some performance fixes and needed a way to validate the results of those changes. I thought I’d share the scripts as they may be useful for others.

Rather than relying on run times from the SQL Agent running the packages, I wanted to dive deeper into the packages, so I took another approach.

Read on for a query that gets the information. You can also eyeball it in the SSIS reports if you have a small or consistent set of packages to run.
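
As a rough sketch in the same spirit (this is not Andy’s script), you can pull package-level run times straight from the SSISDB catalog:

```sql
-- Rough sketch: most recent package executions with their durations.
SELECT TOP (50)
       e.execution_id,
       e.folder_name,
       e.project_name,
       e.package_name,
       e.start_time,
       e.end_time,
       DATEDIFF(SECOND, e.start_time, e.end_time) AS duration_seconds,
       e.status   -- 7 = succeeded, 4 = failed
FROM SSISDB.catalog.executions AS e
ORDER BY e.start_time DESC;
```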


Query Store Size-Based Cleanup Performance Issues

Kendra Little has a public service announcement:

I’m a huge fan of SQL Server’s Query Store feature. Query Store collects query execution plans and aggregate query performance metrics, including wait stats. Having Query Store enabled makes troubleshooting performance issues, such as bad parameter sniffing, much, much easier. Because Query Store is integrated into SQL Server itself, it also can catch query plans in a lightweight way that an external monitoring system will often miss.

When performance matters, it’s important to ensure that you’re managing Query Store so that Query Store cleanup does not run during high volume times. Query Store cleanup could slow your workload down significantly.

Read on for more information. I’d also like to plug qdstoolbox, an open-source solution some of my former colleagues worked on. This includes QDSCacheCleanup, which works considerably better than the built-in cleanup process.
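
As a minimal sketch of what that management looks like (the numbers here are illustrative, not recommendations), you can check how close Query Store is to the size limit that triggers size-based cleanup and give it more headroom:

```sql
-- How close is Query Store to its size limit, and how is cleanup configured?
SELECT actual_state_desc,
       current_storage_size_mb,
       max_storage_size_mb,
       size_based_cleanup_mode_desc,
       stale_query_threshold_days
FROM sys.database_query_store_options;

-- Illustrative settings: raise the cap and shorten retention so size-based
-- cleanup is unlikely to fire during busy periods.
ALTER DATABASE CURRENT
SET QUERY_STORE (
    MAX_STORAGE_SIZE_MB = 2048,
    CLEANUP_POLICY = (STALE_QUERY_THRESHOLD_DAYS = 14),
    SIZE_BASED_CLEANUP_MODE = AUTO
);
```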


Speeding up Databricks Lakehouse Queries with Redis

Drew Furgiuele has the need for speed:

Since compute and storage are now separated, this means that any time you want to work with your data, you need some form of compute engine that is capable of connecting to and reading your data from your storage locations. Compute engines vary, but one of the best is Apache Spark, which gives you a great distributed compute layer suitable for all sorts of workloads, whether they be analytical and ad-hoc queries, dashboard or BI workloads, data engineering related, or even data science or AI/ML use cases. It really can do it all, and it does it very well.

But what about operational use cases? For instance: let’s say your Lakehouse is hosting some data that is critical to customer-facing systems that demand low-latency response times, such as real-time user lookups, API interfaces, or event-driven systems. The overhead required to take a query, schedule it, and run it can be in the hundreds of milliseconds. For some workloads, that’s a lifetime.

Read on to see how you can build a caching layer on top of certain lakehouse operations when a lookup needs to be as fast as possible.


SQL Server Index Included Columns and Log Bloat

Forrest McDaniel explains that TANSTAAFL:

Let me share the conclusion now: included columns in an index always get included in the log record, even if they weren’t modified. The same is not true for the clustered index. This can really matter if log generation is important to your environment, e.g. using an AG.

Click through for the proof. We often think of included columns on indexes as being nearly free performance gains, but “nearly” is carrying a bit of water here. This shouldn’t dissuade you from using included columns on indexes, but it is a valid reason not to include all 50 non-key columns in a table “just in case.”
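
If you want a quick way to observe this yourself, here is a rough sketch (not Forrest’s exact repro; the table and index names are invented) for a throwaway test database:

```sql
-- Update a narrow included column and compare how much log each allocation unit
-- generates. The wide column is included in the index but never modified.
CREATE TABLE dbo.LogBloatTest (
    Id INT IDENTITY PRIMARY KEY,
    KeyCol INT NOT NULL,
    SmallCol INT NOT NULL,
    WidePayload CHAR(400) NOT NULL DEFAULT ('x')
);
CREATE INDEX IX_Test ON dbo.LogBloatTest (KeyCol) INCLUDE (SmallCol, WidePayload);

INSERT dbo.LogBloatTest (KeyCol, SmallCol) VALUES (1, 0);
CHECKPOINT;   -- in SIMPLE recovery this trims the noise fn_dblog would otherwise show

UPDATE dbo.LogBloatTest SET SmallCol = 1 WHERE Id = 1;

-- Bytes logged per allocation unit (clustered index vs. nonclustered index).
SELECT AllocUnitName, SUM([Log Record Length]) AS log_bytes
FROM sys.fn_dblog(NULL, NULL)
WHERE AllocUnitName LIKE '%LogBloatTest%'
GROUP BY AllocUnitName;
-- Per Forrest's finding, the IX_Test rows should log far more than the few bytes
-- actually changed, because the unmodified WidePayload column rides along.
```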
