Press "Enter" to skip to content

Category: Performance Tuning

Cache Management and Semantic Link in Fabric Notebooks

Marc Lelijveld warms up the cache:

In the previous blog, I wrote about data temperature as part of Fabric when you’re using Direct Lake storage mode. In that blog, I explained how you can get insights into the temperature of a column, what that temperature means, and what the impact of the temperature is on columns that are queried more often.

In this blog, I will continue this story by elaborating on a process called framing and how you can influence data eviction to drop data from memory. Finally, this blog goes into more detail on how you could use Semantic Link in Fabric Notebooks to warm up the data for optimal end-user performance.

The SQL Server analog here is having some automated queries which keep specific pages in the buffer pool, like a warm-up script for an instance with plenty of memory but slow disks.
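A minimal sketch of that buffer pool warm-up idea, assuming a hypothetical dbo.FactSales table and columns: a cheap aggregate over the hot columns, run on a schedule (for example from a SQL Agent job after a restart or index maintenance), pulls their pages into memory before the first report lands.

```sql
-- Hypothetical warm-up script: scan the columns users hit most so their
-- pages are already in the buffer pool before the first query arrives.
-- dbo.FactSales and its columns are placeholders for your own hot table.
SELECT MAX(OrderDate)   AS warm_orderdate,
       MAX(CustomerKey) AS warm_customerkey,
       COUNT_BIG(*)     AS warm_rowcount
FROM dbo.FactSales;
```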


The Risk of Changing MaxDOP

Erik Darling recommends caution:

Like in yesterday’s post about Cost Threshold For Parallelism, changing MAXDOP settings will have a universal effect on the workload.

This is true whether you change it at the server level for all databases, or at the database level using a database scoped configuration for a single database.

It is a guardrail to prevent unwanted conditions as a whole, like excessive concurrent parallel queries causing worker thread starvation (THREADPOOL waits), or just pushing CPU to 100% for extended periods of time.

Read on to see what Erik recommends you think about after any MaxDOP change.
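For reference, the two scopes Erik mentions look like this; the value 8 below is purely illustrative, not a recommendation.

```sql
-- Server-wide: affects every database on the instance.
EXEC sys.sp_configure N'show advanced options', 1;
RECONFIGURE;
EXEC sys.sp_configure N'max degree of parallelism', 8;  -- example value only
RECONFIGURE;

-- Single database: a database scoped configuration overrides the server setting.
ALTER DATABASE SCOPED CONFIGURATION SET MAXDOP = 8;     -- example value only
```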


Thoughts on Cost Threshold for Parallelism

Erik Darling has some thoughts:

First, I’m not suggesting that anyone should be using the default value for Cost Threshold For Parallelism. It’s old and moldy and not a good fit for most workloads functioning on modern hardware.

My apologies to Azure SQLDB users who can’t change this setting and leave it up to Microsoft to maybe manage it for them based on ???

Some people out there really like fiddling with settings in a usually ill-informed reaction to Some Script They Found On The Internet, without reading the fine print.

Erik’s thoughts are reasonable overall. My recommendation is to use Michael J. Swart’s technique for tuning cost threshold for parallelism as a starting point, as it gives you a basis for judging the net effect of your changes.
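If you do move off the default of 5, the change itself is a one-liner at the instance level; the hard part is picking the number, which is where the plan cache analysis comes in. The value below is illustrative only.

```sql
-- Cost threshold for parallelism is an instance-level setting.
EXEC sys.sp_configure N'show advanced options', 1;
RECONFIGURE;
EXEC sys.sp_configure N'cost threshold for parallelism', 50;  -- illustrative value
RECONFIGURE;
```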


Heap-Only Tuples in Postgres

Umair Shahid explains the benefit of Heap-Only Tuples in PostgreSQL:

Heap-only tuples, also known as HOT, are PostgreSQL’s answer to the update query performance issues caused by MVCC. These tuples allow PostgreSQL to mark a row as “dead” and physically reuse the space it occupies in the table. This process eliminates the need to keep multiple versions of the same row, reducing I/O and improving query performance.

Read on to see how these compare to the normal MVCC process in Postgres, as well as cases when you should and should not use them.
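Two conditions largely determine whether an update can stay heap-only: there must be free space on the same page, and the update must not touch an indexed column. A rough sketch, with table and column names made up for illustration:

```sql
-- Leave headroom on each page so updated row versions can stay on the
-- same page (a prerequisite for HOT updates). Applies to newly written pages.
ALTER TABLE orders SET (fillfactor = 90);

-- Updates to non-indexed columns (e.g. a status flag) can be HOT;
-- updating an indexed column forces new index entries and defeats HOT.
UPDATE orders SET status = 'shipped' WHERE order_id = 42;

-- Check how many updates were heap-only.
SELECT relname, n_tup_upd, n_tup_hot_upd
FROM pg_stat_user_tables
WHERE relname = 'orders';
```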


Row-Level Security Performance and Troubleshooting

Ben Johnston digs into row-level security:

There are two main areas where RLS can impact performance. The first is the user or authentication lookup. Some kind of lookup must be performed in the access predicate to determine either the user name, group membership, or specific values in the session context. Considering that RLS is non-prescriptive, the lookup isn’t confined to these methods, but they are very easy methods to use and implement and are standard based on implementations I’ve seen.

The second area is the authorization lookup. The authorization lookup, checking if a user has access to particular rows, can have a much bigger impact on performance. This is also in the access predicate. Following the basic rules for performance and keeping lookups simple goes a long way to minimizing the impact of RLS on performance. The goal is to keep performance levels as close as possible to a table without RLS. If indexes and predicates are correct, RLS can improve performance in some situations due to the automatic filtering that happens.

Read on for Ben’s thoughts on the topic.
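As a point of reference for the “keep lookups simple” advice, a typical filter predicate is an inline table-valued function over a value in SESSION_CONTEXT, with the predicate column indexed. Object and column names here are hypothetical.

```sql
-- Inline TVF used as the access predicate; keep it cheap and sargable.
CREATE FUNCTION dbo.fn_TenantFilter (@TenantId int)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
    SELECT 1 AS allowed
    WHERE @TenantId = CAST(SESSION_CONTEXT(N'TenantId') AS int);
GO

-- Bind it to the table; an index on Orders.TenantId keeps the filtering cheap.
CREATE SECURITY POLICY dbo.TenantSecurityPolicy
ADD FILTER PREDICATE dbo.fn_TenantFilter(TenantId) ON dbo.Orders
WITH (STATE = ON);
```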


Reviewing Postgres SQL Statement History

Muhammad Ali does some performance testing:

pg_stat_statements is a built-in PostgreSQL extension that keeps track of SQL statements executed by the database server. It records details about query execution counts, total execution time, and I/O-related info. By capturing this information, database administrators can identify bottlenecks and optimize queries for better overall performance. In this blog, we’ll take a look at how the pg_stat_statements module can be utilized to enhance the performance of your PostgreSQL database.

Read on to see how you can enable this extension, configure it, and use it to find the worst performers in the bunch.
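The typical setup and a “worst offenders” query look roughly like this; the column names assume PostgreSQL 13 or later, where total time was split into planning and execution time.

```sql
-- Requires shared_preload_libraries = 'pg_stat_statements' in postgresql.conf
-- (and a server restart), then enable it in the database:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Top 10 statements by total execution time.
SELECT query,
       calls,
       round(total_exec_time::numeric, 2) AS total_ms,
       round(mean_exec_time::numeric, 2)  AS mean_ms,
       shared_blks_read
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
```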


Manual Stats Updates in SQL Server

Matthew McGiffen can’t wait for SQL Server to update those stats:

Having up to date statistics is vital for getting the best performance out of your queries. Even though SQL Server automatically updates statistics in the background for you (When do statistics get updated?), you may find there are times when you want to manage updating them yourself.

Click through to learn several techniques for stats updates.
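The manual options range from a single statistic to the whole database, and the sampling choice is where most of the judgment lives. Table and index names below are placeholders.

```sql
-- All statistics on one table, full scan (most accurate, most expensive).
UPDATE STATISTICS dbo.Orders WITH FULLSCAN;

-- A single statistic with a sampling rate.
UPDATE STATISTICS dbo.Orders IX_Orders_OrderDate WITH SAMPLE 25 PERCENT;

-- Every table in the database, using the default sampling behavior.
EXEC sys.sp_updatestats;
```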


Value and Hash Encoding in VertiPaq

Kristyna Hughes looks at column encodings:

Power BI encoding is a powerful optimization option that is often overlooked because it’s not visible in either Power BI Desktop or the Power BI Service. Natively, the VertiPaq engine in Power BI investigates all columns in the data model and determines how it can store that data most efficiently. To achieve maximum compression, the VertiPaq engine starts by encoding each column, which determines the method of compression applied to that column. There are two types of encoding: value and hash.

Read on to learn the difference, as well as how to push your columns to use a specific type of encoding.
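If you want to see which encoding each column actually ended up with, one option is the storage DMVs, queried from DAX Studio or Management Studio connected to the model; this is only a sketch, and the exact DMV columns available can vary by engine version.

```sql
-- Analysis Services / Power BI storage DMV: reports per-column encoding
-- (hash vs. value) for the in-memory model.
SELECT DIMENSION_NAME, ATTRIBUTE_NAME, COLUMN_ENCODING, DICTIONARY_SIZE
FROM $SYSTEM.DISCOVER_STORAGE_TABLE_COLUMNS
WHERE COLUMN_TYPE = 'BASIC_DATA';
```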


CPU Threads in SQL Server Backups

Andy Yun dives in:

Welcome back to Part 3 of my SQL Server Backup Internals Series.

In Part 1, I introduced the “parts” of a BACKUP Operation and in Part 2, we delved into Backup Buffers. Today, we’re going to talk about what manipulates those Backup Buffers: CPU threads. This’ll be a longer blog, so go refill your coffee now.

Andy did an outstanding job explaining what reader and writer threads do and how SQL Server picks the number of each.
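The knobs that most directly shape that thread and buffer picture are the number of backup devices (striping adds writer threads) and the buffer settings. A hedged example, with paths and values that are illustrative only:

```sql
-- Striping to two files gives the backup an extra writer thread (one per device);
-- BUFFERCOUNT and MAXTRANSFERSIZE size the backup buffers those threads use.
BACKUP DATABASE [YourDatabase]
TO DISK = N'D:\Backups\YourDatabase_1.bak',
   DISK = N'E:\Backups\YourDatabase_2.bak'
WITH COMPRESSION,
     BUFFERCOUNT = 64,            -- illustrative, not a recommendation
     MAXTRANSFERSIZE = 4194304,   -- 4 MB, illustrative
     STATS = 10;
```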
