
Category: Performance Tuning

When repartition() Beats coalesce() in Spark

Janani Annur Thiruvengadam stands some common advice on its head:

If you’ve worked with Apache Spark, you’ve probably heard the conventional wisdom: “Use coalesce() instead of repartition() when reducing partitions — it’s faster because it avoids a shuffle.” This advice appears in documentation and blog posts and gets repeated across Stack Overflow threads. But what if I told you this isn’t always true?

In a recent production workload, I discovered that using repartition() instead of coalesce() resulted in a 33% performance improvement (16 minutes vs. 23 minutes) when writing data to fewer partitions. This counterintuitive result reveals an important lesson about Spark’s Catalyst optimizer that every Spark developer should understand.

Read on for the details on that scenario.
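
To make the mechanism concrete, here is a minimal PySpark sketch of my own (not Janani’s workload; the paths and sizes are made up). The key point: coalesce() avoids a shuffle but can collapse upstream stages down to the target partition count, while repartition() pays for a shuffle and keeps upstream parallelism intact.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("repartition-vs-coalesce").getOrCreate()

# A wide DataFrame with many input partitions (sizes are arbitrary).
df = spark.range(0, 100_000_000)

# coalesce(8) avoids a shuffle, but Spark may collapse the upstream
# computation into only 8 tasks, serializing the expensive part of the job.
df.coalesce(8).write.mode("overwrite").parquet("/tmp/out_coalesce")

# repartition(8) inserts a shuffle, so upstream stages keep their full
# parallelism; only the final write runs on 8 tasks.
df.repartition(8).write.mode("overwrite").parquet("/tmp/out_repartition")
```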


Tips for Running SQL Server on Hyper-V

Mike Walsh shares some advice:

If you’ve ever asked yourself, “Why does my SQL Server seem slower on Hyper-V than it should be?!”, this post might help. And if you asked me about Hyper-V ten years ago, I’d probably have laughed. Maybe even more recently than that. But here’s the thing: it scales, it works, and with the Broadcom/VMware “fun” squeezing the world for profit, we’re seeing more and more folks making the move to Hyper-V. You have to pay attention to some key configurations, just as with VMware.

None of this is radical advice, but it is good to make sure you have the basics covered, because these settings can make a difference.


Generating a DAXX File for Performance Tuning

Phil Seamark does some troubleshooting:

When troubleshooting slow DAX queries, sharing the right diagnostic information with an expert can make all the difference. That’s where a DAXX file comes in. This special file format is created using DAX Studio. It bundles essential metadata and performance details without exposing query results. It’s perfect for collaborative optimisation.

Read on to learn more about what a DAXX file is and how it can be useful in the performance tuning process.


Automating Power BI Load Testing via Fabric Notebook

Gilbert Quevauvilliers grabs a query:

Load testing is essential when working with Microsoft Fabric capacity. With limited resources, deploying a Power BI report without testing can lead to performance issues, downtime, and frustrated users. In this series, I’ll show you how to automate load testing using Fabric Notebooks, making the process faster, easier, and repeatable.

Inspired by Phil Seamark’s approach, this method eliminates manual complexity and allows you to capture real user queries for accurate testing.

Read on for the first part, in which Gilbert uses the Performance Analyzer to capture query details.
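
As a rough sketch of what replaying a captured query from a Fabric notebook can look like, here is an illustrative snippet using the semantic-link (sempy) library. The dataset name and DAX text below are placeholders; Gilbert’s series covers the real mechanics.

```python
import time
import sempy.fabric as fabric  # available in Fabric notebooks via semantic-link

# A DAX query captured with Performance Analyzer (placeholder text).
captured_query = """
EVALUATE SUMMARIZECOLUMNS('Date'[Year], "Sales", [Total Sales])
"""

# Replay the captured query repeatedly to simulate report load.
timings = []
for _ in range(20):
    start = time.perf_counter()
    fabric.evaluate_dax(dataset="SalesModel", dax_string=captured_query)
    timings.append(time.perf_counter() - start)

print(f"avg: {sum(timings) / len(timings):.3f}s  max: {max(timings):.3f}s")
```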


Using Community Tools to Troubleshoot SQL Server Performance

Kevin Hill wraps up a series on what to do when “SQL Server is slow”:

This post will lean heavily on the First Responder Kit from Brent Ozar, sp_WhoIsActive from Adam Machanic, and others. They took what Microsoft provides and made it better. Hundreds or thousands of hours of work to make FREE things for you to use.

This is the most complex blog I’ve ever written. Your experiences may differ, and my code samples might have some bugs. Test first.

Kevin is trying to sum up a lot of material in a blog post, but this serves as a pretty good starting point for people who don’t know where to begin. One of the key takeaways from the post, and the entire series, is to have a plan in place. Regardless of whether you use what Kevin has as-is or make your own modifications, knowing what to do and how to do it will save a lot of time and energy when you need to act.
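
If you want to drive these procedures from outside Management Studio, a hypothetical Python sketch might look like the following. It assumes pyodbc, a trusted connection, and that sp_WhoIsActive is installed on the server; the connection string is a placeholder.

```python
import pyodbc

# Connection string is a placeholder; adjust server and auth for your environment.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};SERVER=localhost;"
    "DATABASE=master;Trusted_Connection=yes;TrustServerCertificate=yes;"
)

cursor = conn.cursor()
# sp_WhoIsActive shows currently executing requests, waits, and blocking.
cursor.execute("EXEC sp_WhoIsActive")
columns = [col[0] for col in cursor.description]
for row in cursor.fetchall():
    print(dict(zip(columns, row)))
conn.close()
```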


Narrowing down Slowdown Causes in SQL Server

Kevin Hill continues a series on solving the age-old “The server is slow!” problem:

At this point you’ve:

  • Defined what “slow” means and built a timeline (Part 1).
  • Checked things outside SQL Server like network, storage, and VM noise (Part 2).

Now it’s time to open the hood on SQL Server itself.

I think Kevin’s checklist is a pretty solid one for the type of client he often deals with: one without an in-house DBA or the expertise to stay on top of server problems.


Splitting GUIDs into Multiple BIGINTs for Columnstore

Forrest McDaniel performs an experiment:

You may have run into issues with GUIDs as clustering keys, but another major problem is in columnstore. Smart people at Microsoft wrote columnstore in a way to take advantage of modern CPU features, but those CPU features don’t play well with datatypes larger than 8 bytes. Which includes GUIDs.

Read on for the demonstration of this, a clever workaround, and the ramifications of splitting GUIDs into two BIGINTs. Full points for cleverness, though like Forrest, I wouldn’t want to use this in production.
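
The arithmetic behind the workaround is easy to demonstrate outside SQL Server. Here is an illustrative Python sketch that splits a 128-bit GUID into two signed 64-bit halves and back; it ignores the uniqueidentifier byte-ordering details that a real implementation (and Forrest’s post) would need to respect.

```python
import uuid

SIGN_SHIFT = 1 << 63  # re-center an unsigned 64-bit half into signed BIGINT range

def guid_to_bigints(g: uuid.UUID) -> tuple[int, int]:
    """Split a 128-bit GUID into two signed 64-bit integers (BIGINT-compatible)."""
    hi = (g.int >> 64) - SIGN_SHIFT
    lo = (g.int & 0xFFFFFFFFFFFFFFFF) - SIGN_SHIFT
    return hi, lo

def bigints_to_guid(hi: int, lo: int) -> uuid.UUID:
    """Reassemble the original GUID from its two BIGINT halves."""
    return uuid.UUID(int=((hi + SIGN_SHIFT) << 64) | (lo + SIGN_SHIFT))

g = uuid.uuid4()
hi, lo = guid_to_bigints(g)
assert bigints_to_guid(hi, lo) == g  # round-trips losslessly
print(g, hi, lo)
```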


Scaling On-Prem Vector Search with Ollama and Nginx

Anthony Nocentino solves a problem:

When you call out to an external embedding service from T-SQL via REST over HTTPS, you’re limited by the throughput of that backend. If you’re running a single Ollama instance, you’ll quickly hit a ceiling on how fast you can generate embeddings, especially for large datasets. I recently attended an event and discussed this topic. My first attempt at generating embeddings was for a three-million-row table. I had access to some world-class hardware to generate the embeddings. When I arrived at the lab and initiated the embedding generation process for this dataset, I quickly realized it would take approximately 9 days to complete. Upon closer examination, I found that I was not utilizing the GPUs to their full potential; in fact, I was only using about 15% of one GPU’s capacity. So I started to cook up this concept in my head, and here we are, load balancing embedding generation across multiple instances of ollama to more fully utilize the resources.

Click through for the solution.
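
Anthony covers the Nginx side; for flavor, here is a hypothetical client-side sketch in Python. It assumes Nginx is listening on port 8080 in front of several Ollama backends, and the model name and row data are placeholders. The concurrency is what gives the load balancer something to spread across instances.

```python
import requests
from concurrent.futures import ThreadPoolExecutor

# Hypothetical Nginx front end load-balancing several Ollama backends.
ENDPOINT = "http://localhost:8080/api/embeddings"

def embed(text: str) -> list[float]:
    # Ollama's embeddings API takes a model name and a prompt.
    resp = requests.post(
        ENDPOINT,
        json={"model": "nomic-embed-text", "prompt": text},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["embedding"]

rows = [f"sample document {i}" for i in range(1000)]  # stand-in for table rows

# Parallel requests let Nginx fan the work out across backends.
with ThreadPoolExecutor(max_workers=16) as pool:
    embeddings = list(pool.map(embed, rows))

print(len(embeddings), "embeddings generated")
```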


Memory Grant Feedback Woes

Rebecca Lewis explains an issue with memory grant feedback in SQL Server:

Before SQL Server runs a query, it estimates how much memory it needs for sorting and joining. But what if it gets it wrong?

  • Too little memory → Spills to tempdb (slow)
  • Too much memory → Starves other queries

SQL Server 2017+ tries to fix bad estimates based on previous calls with ‘Memory Grant Feedback’. Kinda like: ‘Last time I gave you 2GB but you only used 50MB. Next time I’m giving you less.’

This is great in theory, though Rebecca shows a case in which the end result might not be great.
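
If you want to watch grants versus actual usage on your own system, one illustrative approach (assuming pyodbc and VIEW SERVER STATE permission; the connection details are placeholders) is to poll the memory grants DMV while the workload runs:

```python
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};SERVER=localhost;"
    "DATABASE=master;Trusted_Connection=yes;TrustServerCertificate=yes;"
)

# Compare what each running query asked for against what it was granted and used.
query = """
SELECT session_id,
       requested_memory_kb,
       granted_memory_kb,
       used_memory_kb,
       ideal_memory_kb
FROM sys.dm_exec_query_memory_grants
ORDER BY granted_memory_kb DESC;
"""

for row in conn.cursor().execute(query):
    print(f"session {row.session_id}: requested={row.requested_memory_kb} KB, "
          f"granted={row.granted_memory_kb} KB, used={row.used_memory_kb} KB")
conn.close()
```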
