Press "Enter" to skip to content

Category: Internals

Proportional Fill Algorithm

Paul Randal discusses the proportional fill algorithm that SQL Server uses for extent allocation:

Proportional fill works by assigning a number to each file in the filegroup, called a ‘skip target’. You can think of this as an inverse weighting, where the higher the value is above 1, the more times that file will be skipped when going round the round robin loop. During the round robin, the skip target for a file is examined, and if it’s equal to 1, an allocation takes place. If the skip target is higher than 1, it’s decremented by 1 (to a minimum value of 1), no allocation takes place, and consideration moves to the next file in the filegroup.

(Note that there’s a further twist to this: when the -E startup parameter is used, each file with a skip target of 1 will be used for 64 consecutive extent allocations before the round robin loop progresses. This is documented in Books Online here and is useful for increasing the contiguity of index leaf levels for very large scans – think data warehouses.)

Read on for some implementation details as well as a good scenario for why it’s important to know about this.
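
The description above is easy to turn into a toy model. Here's a hedged T-SQL sketch of the skip-target round robin; the file count and skip-target values are invented for illustration, and real skip targets are recalculated by the engine from the free space in each file rather than fixed:

    -- Toy simulation of the skip-target round robin (illustrative values only).
    -- In the real engine, skip targets are periodically recalculated from free
    -- space; here they simply count down to their minimum of 1.
    DECLARE @files TABLE (file_id int PRIMARY KEY, skip_target int, allocations int);
    INSERT INTO @files VALUES (1, 1, 0), (2, 3, 0), (3, 6, 0);

    DECLARE @extents int = 0, @fid int = 1;
    WHILE @extents < 30
    BEGIN
        IF (SELECT skip_target FROM @files WHERE file_id = @fid) = 1
        BEGIN
            -- Skip target of 1: this file receives the extent allocation.
            UPDATE @files SET allocations += 1 WHERE file_id = @fid;
            SET @extents += 1;
        END
        ELSE
            -- Skip target above 1: decrement it, skip the file, move on.
            UPDATE @files SET skip_target -= 1 WHERE file_id = @fid;

        SET @fid = CASE WHEN @fid = 3 THEN 1 ELSE @fid + 1 END; -- next file
    END

    SELECT file_id, skip_target, allocations FROM @files;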

Comments closed

Context Switching

Ewald Cress has finally snapped:

You start with a blank sheet,
three nuts and a bolt,
a strong sense of fairness,
a large can of Jolt.

And you try to imagine,
as best as you’re able
a rulerless kingdom
that won’t grow unstable.

That’s what happens when you dig into internals for too long.

Comments closed

LOB On Memory-Optimized Tables

Dmitri Korotkevitch digs into LOB data when you build a memory-optimized table:

There is also considerable overhead in terms of memory usage. Every non-empty off-row value adds 50+ bytes of overhead regardless of its size. Those 50+ bytes consist of three artificial ID values (in-row, off-row in data row and leaf-level of the range index) and the off-row data row structure. It is even larger in the case of LOB columns, where data is stored in the LOB Page Allocator.

One of the key points to remember is that the decision about which columns go off-row is made based on the table schema. This is very different from on-disk tables, where that decision is made on a per-row basis and depends on the data row size. With on-disk tables, data is stored in-row when it fits on the data page.

In-Memory OLTP works in a different way. (Max) columns are always stored off-row. For other columns, if the data row size in the table definition can exceed 8,060 bytes, SQL Server pushes the largest variable-length column(s) off-row. Again, this does not depend on the amount of data you store there.

This is a great article getting into the internals of how memory-optimized tables work in SQL Server 2016, as well as a solid reason to avoid LOB types and very large VARCHAR values on memory-optimized tables if you can.  Absolutely worth a read.
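
To see the schema-based rule in DDL form, here's a hedged sketch; it assumes a database that already has a MEMORY_OPTIMIZED_DATA filegroup, and the table and column names are invented:

    -- Assumes the database already has a MEMORY_OPTIMIZED_DATA filegroup.
    CREATE TABLE dbo.OffRowDemo
    (
        Id   int NOT NULL PRIMARY KEY NONCLUSTERED,
        Blob varchar(max) NOT NULL,   -- (max) columns are always stored off-row
        Col1 varchar(8000) NOT NULL,  -- defined row size can exceed 8,060 bytes,
        Col2 varchar(8000) NOT NULL   -- so the largest column(s) go off-row,
                                      -- no matter how little data you insert
    )
    WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);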

Comments closed

The SQLOS Scheduler

Ewald Cress builds an extended metaphor for the SQLOS scheduler:

When the time comes for a bunny to pass the battery, it may be out of free choice, or it might be because its script went down a path where passing it along is The Thing To Do. At this juncture, the team’s collective memory and playbook comes to the fore, and agreed rules dictate who the battery goes to. It doesn’t really matter what those rules are for the moment. The important point is that control is transferred by the players themselves using shared rules and a team whiteboard tracking who is ready to go, which team member might be most deserving, who has been waiting the longest etc. This code of conduct and state, this bushido of bunny bonhomie, is what we call a scheduler.

This is building up to something big…

Comments closed

Counting Without Counts

Ewald Cress discusses the breferences member:

I’ll spare you my false starts, but I think I finally have it. The first observation is that, on the occasions breferences increments, it does not increment linearly, but instead has an exponential growth pattern. These increments take it through the sequence 0, 1, 3, 7, 15, 31, 63, 127, 255 etc. Or in binary: 0, 1, 11, 111, 1111, 11111, 111111, 1111111, 11111111…

Those numbers can be seen as off-by-one variations of powers of two. Forget the offset, and think of the number as simply doubling on each increment if it keeps your head clearer – instead of accuracy, we have an order-of-magnitude reference count.

I’d never heard of an algorithm like this, although that could be due to my having dealt with relatively little low-level structural code.  I’m glad Ewald sussed out the mechanics driving breferences.
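
If you want to convince yourself the decimal and binary views line up, the arithmetic is trivial to reproduce (this only restates the observed pattern; when and why breferences advances is the subject of Ewald's post):

    -- The observed breferences values are off-by-one powers of two: 2^n - 1.
    WITH n AS
    (
        SELECT TOP (9) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1 AS i
        FROM sys.all_objects
    )
    SELECT i AS increment_number, POWER(2, i) - 1 AS breferences_value
    FROM n
    ORDER BY i;
    -- Returns 0, 1, 3, 7, 15, 31, 63, 127, 255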

Comments closed

Superlatches

Ewald Cress discusses superlatch promotion:

There are quite a few pieces of machinery that are involved in our little drama. First, I’ll introduce some instance-global settings:

  • A flag that controls whether latch promotion is enabled at all. Although I don’t have any information about this, let’s assume that it will be enabled on any system that “warrants it”.
  • A flag that controls whether cycle-based promotion is enabled. Again, I can’t currently tell you what determines this setting.
  • sm_promotionThreshold, the current calculated cycle-based promotion threshold described in Part 3.
  • sm_promotionUpperBoundCpuTicks, used as a ceiling value to prevent outliers from skewing stats. As described in Part 3, this is simply sm_promotionThreshold * 5.
  • Trace flag 844, which lowers the threshold for non-cycle-based promotions.
  • Trace flag 827, which causes each latch promotion to be noted in the SQL Server log (“Latch promotion, page %u:%u in database %u, objid %u.”)

Assume that the first flag is set on our system of interest, otherwise promotions won’t happen and we have nothing to talk about.

Read the whole thing.
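
The two trace flags in that list are easy to inspect if you're curious about a given instance (treat them as undocumented, as Ewald does, and experiment outside production):

    -- Check whether the promotion-related trace flags are enabled instance-wide.
    DBCC TRACESTATUS (844, 827);

    -- Uncomment to log each latch promotion to the error log, per the post above.
    -- DBCC TRACEON (827, -1);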

Comments closed

Latch Promotion

Ewald Cress discusses latch promotion threshold calculations:

Now I wish I could use the phrase “cycle-based promotion threshold” in a tone that suggests we were all born knowing the context, but to be honest, I don’t yet have all the pieces. Here is the low-down as it stands in SQL Server 2014:

  • Everything I’m describing applies only to page latches.

  • A cycle-based promotion simply means one that is triggered by the observation that the average acquire time for a given page latch (i.e. the latch for a given page) has exceeded a threshold.

  • Because the times involved are so short, they are measured not in time units but in CPU ticks.

  • There exists a global flag that can enable cycle-based promotions, although I do not know what controls that flag.

  • If cycle-based promotion is disabled, there is another path to promotion; this will be discussed in Part 4.

I don’t think I’d ever seen the informational message Ewald mentions, so this was a brand new topic to me.

Comments closed

Filtering Data

Slava Murygin shows various ways to filter data, in particular data from fn_dblog():

Let’s say we need to extract the information associated with an “UPDATE” for LSNs starting at “0000004f:00000087:0001”. You can just specify Starting and Ending LSNs as “fn_dblog” parameters:

[…]

That portion of code returns ONLY the log records between LSNs “0000004f:00000087:0001” and “0000004f:00000088:0001”.

Slava’s post uses fn_dblog() as an example, but the techniques are applicable across the board, and in practice they sum up to “fetch only the rows and columns you need to solve the problem at hand.”
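
A minimal form of what Slava describes, using the LSNs from the excerpt (the LOP_MODIFY_ROW filter is my assumption for what “associated with an UPDATE” maps to):

    -- Return only the log records between the two LSNs, and only row updates.
    SELECT [Current LSN], Operation, Context, [Transaction ID]
    FROM fn_dblog('0000004f:00000087:0001', '0000004f:00000088:0001')
    WHERE Operation = N'LOP_MODIFY_ROW';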

Comments closed

Identifying Object Names Using fn_dblog()

Frank Gill digs into the transaction log to find object names used in CREATE and ALTER statements:

This is the object id of the view that was created.  So, Jes’s question was answered.  But this led me to one of my other favorite SQL Server topics: string manipulation.  The following script will identify all transactions for a particular Transaction Name and return the object name affected.  The comments provide additional information about the functionality.

Click through to check out Frank’s script.
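
Frank's script does the real work of parsing out the object name, but the first step looks roughly like this; the Transaction Name literal below is a guess, so check the distinct values in your own log before relying on it:

    -- Find DDL transactions by name (inspect your log for the exact names):
    -- SELECT DISTINCT [Transaction Name] FROM fn_dblog(NULL, NULL);
    SELECT [Transaction ID], [Transaction Name], [Begin Time]
    FROM fn_dblog(NULL, NULL)
    WHERE [Transaction Name] = N'CREATE/ALTER VIEW';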

Comments closed

Checkpoint Behavior Changes

Mike Ruthruff discusses changes in checkpoint behavior in SQL Server 2016:

The following are the primary changes which will impact behavior of checkpoint in SQL Server 2016.

  1. Indirect checkpoint is the default behavior for new databases created in SQL Server 2016. Databases which were upgraded in place or restored from a previous version of SQL Server will use the previous automatic checkpoint behavior unless explicitly altered to use indirect checkpoint.

  2. When performing a checkpoint, SQL Server considers the response time of the I/Os and adjusts the amount of outstanding I/O when response times exceed a certain threshold. In versions prior to SQL Server 2016 this threshold was 20ms. In SQL Server 2016 the threshold is now 50ms. This means that SQL Server 2016 will wait longer before backing off the amount of outstanding I/O it is issuing.

  3. The SQL Server engine will consolidate modified pages into a single physical transfer if the data pages are contiguous at the physical level. In prior versions, the max size for a transfer was 256KB. Starting with SQL Server 2016 the max size of a physical transfer has been increased to 1MB potentially making the physical transfers more efficient. Keep in mind these are based on continuity of the pages and hence workload dependent.

Definitely read the whole thing.
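
Point 1 has a practical follow-up: a database upgraded or restored from an older version can be moved to indirect checkpoint explicitly. A minimal example, with a placeholder database name:

    -- Enable indirect checkpoint on an upgraded/restored database.
    ALTER DATABASE [YourDatabase] SET TARGET_RECOVERY_TIME = 60 SECONDS;

    -- Setting it back to 0 reverts to the classic automatic checkpoint behavior.
    -- ALTER DATABASE [YourDatabase] SET TARGET_RECOVERY_TIME = 0 SECONDS;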

Comments closed