Press "Enter" to skip to content

Month: September 2016

Sort Spill To Level 15K

Paul White explains spill levels:

At this point, you might be wondering what combination of tiny memory grant and enormous data size could possibly result in a level 15,000 sort spill. Trying to sort the entire Internet in 1MB of memory? Possibly, but that is way too hard to demo. To be honest, I have no idea if such a genuinely high spill level is even possible in SQL Server. The goal here (a cheat, for sure) is to get SQL Server to report a level 15,000 spill.

The key ingredient is partitioning. Since SQL Server 2012, we have been allowed a (convenient) maximum of 15,000 partitions per object (support for 15,000 partitions is also available on 2008 SP2 and 2008 R2 SP1, but you have to enable it manually per database, and be aware of all the caveats).

Read the whole thing.
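For reference, the partition setup itself is easy to script. This is not Paul White’s demo (his post has the full repro); it is just a rough sketch, with made-up object names, of one way to generate a partition function and scheme with the maximum 15,000 partitions:

--Build a comma-separated list of 14,999 boundary values (15,000 partitions).
DECLARE @boundaries nvarchar(max) =
    STUFF((SELECT N',' + CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS nvarchar(10))
           FROM (SELECT TOP (14999) 0 AS z
                 FROM sys.all_columns AS c1
                      CROSS JOIN sys.all_columns AS c2) AS x
           FOR XML PATH('')), 1, 1, N'');

DECLARE @sql nvarchar(max) = N'CREATE PARTITION FUNCTION PF_SpillDemo (integer)
AS RANGE LEFT FOR VALUES (' + @boundaries + N');';
EXEC sys.sp_executesql @sql;

--Map everything to PRIMARY just to keep the sketch simple.
SET @sql = N'CREATE PARTITION SCHEME PS_SpillDemo
AS PARTITION PF_SpillDemo ALL TO ([PRIMARY]);';
EXEC sys.sp_executesql @sql;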


Context Switching

Ewald Cress has finally snapped:

You start with a blank sheet,
three nuts and a bolt,
a strong sense of fairness,
a large can of Jolt.

And you try to imagine,
as best as you’re able
a rulerless kingdom
that won’t grow unstable.

That’s what happens when you dig into internals for too long.


SSMS Query Shortcuts

Andrew Pruski shows where to set Management Studio query shortcuts:

Following on from my last post, Changing connection colours in SSMS, I thought I’d write another quick post about this cool but often unused feature in SSMS.

These shortcuts allow you to run pre-determined queries by assigning a hot key within SSMS. To set them up, go to Tools > Options > Environment > Keyboard.

I love query shortcuts.  I have three dedicated to different sp_whoisactive commands (one to get everything going on; one to get everything related to my username; and one to get everything going on plus query plans, which I don’t always use because of the additional overhead).
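For anyone wanting to try something similar, here is a rough sketch of the three kinds of calls I mean (exact parameters will vary, and the login name below is a placeholder), which you can then wire up under Tools > Options > Environment > Keyboard > Query Shortcuts:

--Everything currently running:
EXEC sp_whoisactive;

--Only activity for a particular login:
EXEC sp_whoisactive @filter_type = 'login', @filter = 'SomeLogin';

--Everything currently running, plus query plans (heavier, so use sparingly):
EXEC sp_whoisactive @get_plans = 1;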


Understanding Query Durations

Kendra Little explains some of the intricacies behind query durations:

I typically look at the ‘CPU time’ metric when tuning instead of ‘elapsed time’ (duration). This can work well for tuning because you’re measuring how much more efficient you made the query in terms of CPU cycles.

But ‘CPU time’ isn’t perfect, and it can get a little weird for reporting results to users, because:

  • If the query uses parallelism, CPU time can be higher than the duration — which may make the query seem “slower” than it actually is to anyone reading a report

  • ‘elapsed time’ includes all the time that it takes to display the results in Management Studio, which is probably a different duration than it would take to return the results to an application server. If you’re just returning a few rows, this may be negligible, but once it gets into the thousands of rows, it can be very noticeable.

Moral of the story:  also use SQL Sentry Plan Explorer…
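If you want to see the two numbers side by side, STATISTICS TIME is the quickest way. This is just a trivial illustration of the distinction, not something from Kendra’s post:

SET STATISTICS TIME ON;

--On a parallel plan, CPU time can exceed elapsed time because several
--workers accumulate CPU at once; a serial query that spends its life
--waiting (on I/O or on the client) shows the opposite pattern.
SELECT COUNT(*)
FROM sys.all_objects AS o1
    CROSS JOIN sys.all_objects AS o2;

SET STATISTICS TIME OFF;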


Stream Graphs

Devin Knight continues his visualization series with the Stream Graph:

Key Takeaways

  • Works and looks similar to a Stacked Area Chart but with a wiggle feature that gives it a more fluid look and feel

  • Great for displaying data that changes over time

At first, I read this as “Steam Graph,” which made it sound like a steampunk visualization with unnecessary pipes and mechanical accouterments, but alas, it was not meant to be.  I do like the stream graph visual, though.


LOB On Memory-Optimized Tables

Dmitri Korotkevitch digs into LOB data when you build a memory-optimized table:

There is also considerable overhead in terms of memory usage. Every non-empty off-row value adds 50+ bytes of overhead regardless of its size. Those 50+ bytes consist of three artificial ID values (in-row, off-row in the data row, and the leaf level of the range index) and the off-row data row structure. It is even larger in the case of LOB columns, where data is stored in the LOB Page Allocator.

One of the key points to remember is that the decision about which columns go off-row is made based on the table schema. This is very different from on-disk tables, where such a decision is made on a per-row basis and depends on the data row size. With on-disk tables, data is stored in-row when it fits on the data page.

In-Memory OLTP works in a different way. (Max) columns are always stored off-row. For other columns, if the data row size in the table definition can exceed 8,060 bytes, SQL Server pushes the largest variable-length column(s) off-row. Again, it does not depend on the amount of data you store there.

This is a great article getting into the internals of how memory-optimized tables work in SQL Server 2016, as well as a solid reason to avoid LOB types and very large VARCHAR values on memory-optimized tables if you can.  Absolutely worth a read.
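To make the schema-based decision concrete, here is a hedged example (my own, with made-up names; it assumes a database that already has a memory-optimized filegroup). Per Dmitri’s description, the (MAX) column always goes off-row, and because the defined row size can exceed 8,060 bytes, the largest variable-length column gets pushed off-row as well, no matter how little data any given row holds:

CREATE TABLE dbo.LobDemo
(
    Id int NOT NULL PRIMARY KEY NONCLUSTERED,
    BigText varchar(max) NULL,     --always stored off-row
    WideValue varchar(8000) NULL,  --pushed off-row by the schema-level decision
    SmallValue varchar(100) NULL   --stays in-row
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);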


Refactoring In SSDT

Ed Elliott has an introductory-level article on refactoring code within SQL Server Data Tools:

SSDT helps us to refactor code by automating the actions of:

  • Expanding wildcards
  • Fully qualifying object names
  • Moving objects to a different schema
  • Renaming objects

Aside from this list, SSDT also, of course, helps us to refactor code manually with its general editing facilities.

If you aren’t familiar with what SSDT can do, check out this article.
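As a quick, hypothetical illustration of the first two items, starting from something like this:

SELECT *
FROM Orders;

the wildcard-expansion and name-qualifying refactorings would produce something along these lines (the exact column list and formatting depend on the table definition and SSDT’s settings):

SELECT [dbo].[Orders].[OrderId],
       [dbo].[Orders].[OrderDate],
       [dbo].[Orders].[CustomerId]
FROM [dbo].[Orders];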


Apache Kudu

Greg Rahn discusses Apache Kudu:

At a high level, Kudu is a new storage manager that enables durable single-record inserts, updates, and deletes, as well as fast and efficient columnar scans due to its in-memory row format and on-disk columnar format.  This architecture makes Kudu very attractive for data that arrives as a single record at a time or that may need to be modified at a later time.

Today, many users try to solve this challenge via a Lambda architecture, which presents inherent challenges by requiring different code bases and storage for the necessary batch and real-time components. Using Kudu and Impala together completely avoids this problematic complexity by easily and immediately making data inserted into Kudu available for querying and analytics via Impala. (For more technical details on how Impala and Kudu work together for analytical workloads, see this post.)

I’d jokingly say “Someday, somebody’s going to reinvent the relational database inside of Hadoop.”  But it seems like that’s less of a joke than a medium-term prediction.


TDE With Database Mirroring

I have a post on setting up database mirroring when the underlying database uses Transparent Data Encryption:

 Now it’s time to take some backups. First, let’s back up the various keys and certificates:

USE [master]
GO
--Back up the service master key
--Note that the password here is the FILE password and not the KEY password!
BACKUP SERVICE MASTER KEY TO FILE = 'C:\Temp\ServiceMasterKey.key' ENCRYPTION BY PASSWORD = 'Service Master Key Password';
GO
--Back up the database master key
--Again, the password here is the FILE password and not the KEY password.
BACKUP MASTER KEY TO FILE = 'C:\Temp\DatabaseMasterKey.key' ENCRYPTION BY PASSWORD = 'Database Master Key Password';
GO
--Back up the TDE certificate we created.
--We could create a private key with password here as well.
BACKUP CERTIFICATE [TDECertificate] TO FILE = 'C:\Temp\TDECertificate.cert'
    WITH PRIVATE KEY (FILE = 'C:\Temp\TDECertificatePrivateKey.key', ENCRYPTION BY PASSWORD = 'Some Private Key Password');
GO

Click through for the details.
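As a taste of where it goes from there, here is a rough sketch (paths and passwords are placeholders, and the full sequence is in the post) of re-creating the certificate on the mirror server so it can work with the TDE-protected database:

USE [master]
GO
--Create a database master key on the mirror instance if it does not already have one.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'Mirror Database Master Key Password';
GO
--Re-create the TDE certificate from the files we backed up above.
CREATE CERTIFICATE [TDECertificate]
    FROM FILE = 'C:\Temp\TDECertificate.cert'
    WITH PRIVATE KEY (FILE = 'C:\Temp\TDECertificatePrivateKey.key',
        DECRYPTION BY PASSWORD = 'Some Private Key Password');
GO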


SSAS And Power BI Performance Issue

Chris Webb describes an issue with SSAS Multidimensional and Power BI-generated DAX causing a performance problem:

This query has something in it – I don’t know what – that means that it cannot make use of the Analysis Services Storage Engine cache. Every time you run it SSAS will go to disk, read the data that it needs and then aggregate it, which means you’ll get cold-cache performance all the time. On a big cube this can be a big problem. This is very similar to problems I’ve seen with MDX queries on Multidimensional and which I blogged about here; it’s the first time I’ve seen this happen with a DAX query though. I suspect a lot of people using Power BI on SSAS Multidimensional will have this problem without realising it.

This problem does not occur for all tables – as far as I can see it only happens with tables that have a large number of rows and two or more hierarchies in. The easy way to check whether you have this problem is to refresh your report, run a Profiler trace that includes the Progress Report Begin/End and Query Subcube Verbose events (and any others you find useful) and then refresh the report again by pressing the Refresh button in Power BI Desktop without changing it at all. In your trace, if you see any of the Progress Report events appear when that second refresh happens, as well as Query Subcube Verbose events with an Event Subclass of Non-cache data, then you know that the Storage Engine cache is not being used.

This doesn’t look to be a quick fix, so do read the whole thing to help figure out how to avoid this issue.
