Press "Enter" to skip to content

Category: Performance Tuning

Tips for Scaling Cassandra Clusters

Mario Tavares wants more zoom:

When the use case aligns with the architectural limitations, Cassandra excels at storing and accessing datasets up to petabytes in volume, delivering impressive throughput. As the data or workload volume grows, we expand the cluster linearly, ensuring consistent performance.

However, even when we adhere to the documentation and best practices and create an effective data model, we might encounter underperforming nodes or unexpected challenges with throughput scaling after a cluster expansion—and it’s not always clear what causes the imbalance. Linear scalability relies on the assumption that workload and data are evenly distributed across all nodes in a cluster, and the cluster capacity relates directly to the number of nodes. Sometimes, these conditions aren’t met, affecting linear scalability. So, we strive for scalability and balance and are willing to fulfill the necessary conditions.

Read on for a few common performance issues and what you can do about them.

Leave a Comment

Tips for Using a Data Lakehouse

James Serra shares some advice:

As I mentioned in my Data Mesh, Data Fabric, Data Lakehouse presentation, the data lakehouse architecture, where you use a data lake with delta lake as a software layer and skip using a relational data warehouse, is becoming more and more popular. For some customers, I will recommend “Use a data lake until you can’t”. What I mean by this is to take the following steps when building a new data architecture in Azure with Azure Synapse Analytics:

Click through for six notes.

Leave a Comment

Text Search Performance Optimization in Power BI

Chris Webb provides advice about a relatively new feature:

In the blog post for the October 2022 release of Power BI Desktop there’s a brief description of a new optimisation in Power BI for filters that perform searches in text columns. In this blog post I’m going to share more details about this optimisation and how you can make sure your reports can benefit from it.

Read on to learn a bit more about how it works, as well as a few tips on ensuring that you’re able to take full advantage of this optimization.

Leave a Comment

Diagnosing a Resource Semaphore Wait Issue

Jose Manuel Jurado Diaz finds excessive resource semaphore waits:

Today, we got a service request that our customer reported that they query are taking too much for their execution. The main wait stats found was RESOURCE_SEMAPHORE and I would like to share with you my lessons learned here. 

We executed this query to find out the queries and check the resource semaphore wait type. 

Click through for the queries and diagnosis.

Comments closed

Matrix Multiplication in R with DuckDB and SQLite

Karsten Weinert compares two databases:

On my laptop with 16 GB RAM, I would like to perform a matrix-vector multiplication with a sparse matrix of around 10 million columns and 2500 rows. The matrix has approximately only 2% non-zero entries, but this are still 500 million numbers and the column/row information, too large to work comfortably in-memory.

A while ago, I tried using sqlite for this task. It kind of worked, but was too slow to be useful. This weekend, I revisited the problem and tried using duckdb.

Read on for the results. I’ve heard enough positives about DuckDB over the past few weeks that it makes me want to try it out. H/T R-Bloggers.

Comments closed

Degree of Parallelism Feedback

Deborah Melkin looks at a nice feature in SQL Server 2022:

A couple of months ago, I wrote about my initial thoughts to SQL Server 2022. I think a lot of what I wrote still holds true. It feels like it’s building on the functionality from previous versions, especially when it comes to Intelligent Query Processing and its feedback capabilities.

The feature I’m most curious to find out more about is the Degree of Parallelism Feedback. (You can read more about it here.) One of the main reasons for this is I know very little about the settings.

Read on for Deb’s thoughts and a situation where it would have helped.

Comments closed

TempDB Improvements in SQL Server Versions

Jared Poche looks at the recent past:

Tempdb contention has long been an issue in SQL Server, and there are many blogs on the issue already. But I wanted to add one more mainly to highlight the improvements in recent versions of SQL Server

Tempdb contention is most often discussed in as relating to the creation of temp tables (and other objects) in tempdb. If you are experiencing this you will see PAGELATCH_EX or PAGELATCH_SH waits, frequently with wait resources like 2:1:1 or 2:1:3. This indicates contention in database 2 (tempdb), page 1 (the first data file in tempdb), and one of the PFS, GAM, or SGAM pages (which are pages 1, 2, and 3 respectively). Tempdb files of sufficient size will have additional PFS, GAM, and SGAM pages at higher page numbers, but 1 and 3 are the pages most often referenced.

Read on to see what the SQL Server team has done over three of the past four versions of SQL Server to limit tempdb contention.

Comments closed

Finding a Scalar Function Caller

Matthew McGiffen searches for the root of the problem:

In this post we look at a method using Extended Events (XE) to identify what parent objects are calling a given SQL function and how often.

The background is that I was working with a team where we identified that a certain scalar function was being executed billions of time a day and – although lightweight for a single execution – overall it was consuming significant CPU on the server. We discussed a way of improving things but it required changing the code that called it. The problem was that the function was used in about 700 different places across the database code – both in stored procedures and views – though the views themselves would then be referenced by other stored procedures. Rather than update all the code they’d like to target the objects first that execute the function the most times.

Read on to see how Matthew did it, as well as some caveats along the way.

Comments closed

A Story of Implicit Conversions

Tracy Boggiano tells a tale:

Recently I had a situation where I was looking at a SQL Instance due to a contractor who controls the system wanting to drop some indexes and they couldn’t because they couldn’t get a lock request.  Sadly, they were using the GUI to try to accomplish this.  Always makes me sad when we aren’t using scripts.  So, I took a deep drive into what was actually running on this system.  I had them send over scripts for what they were trying to do.  Meanwhile, I looked at the system and discovered some things.  First, we had really had RESOURCE_SEMAPHORE wait types for two days which indicates it waiting on memory.  So, I took a look at the memory clerks.  The top memory clerk was for lock manager at 32 GBs and the system only has 64 GBs, which is abnormal. 

Read on for the full story. This particular fix worked, I presume, because the ORM was sending all parameters as Unicode, whereas none of the strings in the database were Unicode—they were all VARCHAR rather than NVARCHAR.

Comments closed