Press "Enter" to skip to content

Category: Performance Tuning

Troubleshooting High I/O Usage on Azure SQL DB

Etienne Lopes troubleshoots a strange issue:

After the downsizing (to GeneralPurpose: Standard-series (Gen5), 2 vCores) occasionally there were timeouts in the application for a very specific task (the command timeout property in the application was set to 30 seconds). Other times the very same task would execute immediately, as it should always, since the underlying query was actually quite simple: a SELECT to a single, although large table (58 GB) but with a predicate that would always result in a perfect index seek to return never more than 300 rows. Furthermore each time there were timeouts, there were also momentary I/O spikes up to 100%:

Read on to learn more about what caused this problem and how Etienne was able to resolve it.

Comments closed

Executing Multiple Notebooks in one Spark Pool with Genie

Shalu Ganotra Chadra, et al, explain what Synapse Genie is:

The Genie framework is a metadata driven utility written in Python. It is implemented using threading (ThreadPoolExecutor module) and directed acyclic graph (Networkx library). It consists of a wrapper notebook, that reads metadata of notebooks and executes them within a single Spark session. Each notebook is invoked on a thread with MSSparkutils.run() command based on the available resources in the Spark pool. The dependencies between notebooks are understood and tracked through a directed acyclic graph.

Read on for more information about how you can use it and what the setup process looks like.

Comments closed

Troubleshooting High CPU via PAGELATCH Waits

Ajay Dwiveldi does some digging:

In the above dashboards, I could clearly notice PAGELATCH_** wait at the top along with SOS_SCHEDULER_YIELD. The presence of the above 2 waits is indicative of high CPU issues due to contention on the access of data file pages. I validated and found that this PAGELATCH_** wait is present almost all the time on the server. So decided to check the data of dbo.WhoIsActive that stores captured data of sp_WhoIsActive in SQLMonitor tool.

Read on for the outcome.

Comments closed

Apache Spark Performance Tuning Tips

Amit Kumar shares a few tips with us:

RDD does serialisation and de-serialisation of data whenever it distributes the data across clusters such as during repartition and shuffle, and we all know that serialisation and de-serialisation are very expensive operations in spark.
On the other hand, DataFrame stores the data as binary using off-heap storage, no need for deserialization and serialization of data when it distributes to clusters. We see a big performance improvement in DataFrame over RDD

Click through for several additional tips.

Comments closed

Breaking the World with auto_explain

Ryan Lambert gets a lot of explanation:

Postgres has a handy module called auto_explain. The auto_explain module lives up to its name: it runs EXPLAIN automatically for you. The intent for this module is to automatically provide information useful for troubleshooting about your slow queries as they happen. This post outlines a pitfall I recently discovered with auto_explain. Luckily for us, it’s an easy thing to avoid.

I discovered this by running CREATE EXTENSION postgis; and watching it run for quite a while before failing with an out of disk space error. That is not my typical experience with a simple CREATE EXTENSION command!

Read on to learn what happened and how you can prevent making a similar mistake.

Comments closed

Azure SQL Database Performance Roundup

Reitse Eskens shares the goods:

In the past 9 blogs, I’ve shown you all sorts of Azure SQL database solutions and gave them a little run for their money. I’ve tested a lot and written about them. This blog will be about the summation of the data and my views on the combined graphs. At the end I’ll wrap it up with my way of working when a new project starts.

But before I kick off, a little Christmas present. What I didn’t do, until now, is give you access to more raw data. Now is the moment to give you more raw number to play around with for yourself and do your own analysis. Fun as it might be, I’d highly encourage you to use my sheets as a jumping point and adapt them for your own workloads. You can find the two Excel files via the link for the scripts.

This is a post I’d been waiting for, as it covers the comparisons between tiers directly, rather than inferring it from the various posts.

Comments closed

Azure SQL Managed Instance Performance

Reitse Eskens wraps up a series on Azure SQL performance comparisons:

So far, the blogs were about the really SaaS databases; the database is deployed and you don’t have think about it anymore. This ease of use comes at a ‘price’. You’ve got no control whatsoever on files, you’ve lost the SQL Agent and a number of other features. The managed instance is a bit different. When I was testing you could see the TempDB files but not change them, since then a few changes have been made to this tier where you’re able to change settings and, Niko Neugebauer told the data community on twitter, there are more changes coming. With the managed instance, you get the agent back and you can run cross database query’s again. So you can safely say the managed instance is a hybrid between your trusty on-premises server and the fully managed Azure SQL database.

Click through for Reitse’s thoughts.

Comments closed

Performance-Killing Pre-Emptive Waits

Sean Gallardy finds the real killer:

If you haven’t already read up on cooperative and preemptive scheduling or aren’t sure what those are, please read the intro to that first, otherwise you’ll be lost.

Much as I’ve discussed before, SQL Server uses a cooperative scheduling model. This doesn’t mean that Windows does, nor does it mean Windows will scheduler whatever SQL Server schedules, in fact much of the time there are many other threads that run before the ones from SQL Server, that’s the job of the operating system to figure out. Due to SQL Server using cooperative scheduling there needs to be a mechanism that exists such that when a resource not under SQL Server’s control needs interaction the scheduler can keep on scheduling and threads can switch in and out (in SQL Server, Windows does what Windows wants). Enter preemptive status and associated waits.

Click through for a deep dive on the topic.

Comments closed

RCSI and Blocking

Michael J. Swart says don’t worry, be happy:

What’s the best way to avoid most blocking issues in SQL Server? Turn on Read Committed Snapshot Isolation (RCSI). That’s it.

Do check out Erik Darling’s comment as well for one thing to keep in mind if you turn on RCSI.

The other thing to keep in mind is that, if you have WITH(NOLOCK) hanging around everywhere in your code, you won’t get as much of a benefit with RCSI until you remove them.

Comments closed

New Query Tuning Book

Grant Fritchey has a book for us:

If you’re interested in getting a digital copy, my brand spanking new book is now available here.

It’s in the intro, but let me tell you a little bit about the new book. It’s really new. Some of the older versions of the book were simply updated, a bunch of changes to most chapters, a couple of new chapters, fixes for old mistakes, ta-da, new book. Not this time. This time, I rewrote it all. From scratch.

Looks like I’ll need to get a copy.

Comments closed