Press "Enter" to skip to content

Curated SQL Posts

Read-Scale Availability Groups

Ryan Adams explains how to create a Read-Scale Availability Group:

A Read-Scale Availability Group is a clusterless Availability Group. Its sole purpose and design is to scale out a read workload. More important is what it is not: it is NOT a High Availability or Disaster Recovery solution. Since this design has no cluster under it, you lose things like automatic failover and database-level health detection. For example, you have reports that run for customers in your DMZ, which is firewalled off from your internal network. Opening up ports for Active Directory so that you can have a cluster means opening a ton of ephemeral ports and ports with high attack vectors. Remember the Slammer worm? This solution removes those dependencies.

Click through for the setup scripts as well as a video Ryan created of him putting it all together.  As long as you recognize the trade-offs involved, this can be a nice solution to certain problems.
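For a sense of what the clusterless piece looks like, here is a minimal sketch (server names, endpoint URLs, and the database name are all hypothetical); CLUSTER_TYPE = NONE is what removes the cluster dependency, and it requires SQL Server 2017 or later:

-- Sketch only: names and URLs are placeholders.
CREATE AVAILABILITY GROUP [ReadScaleAG]
WITH (CLUSTER_TYPE = NONE)
FOR DATABASE [SalesDB]
REPLICA ON
    N'SQLPRIMARY' WITH (
        ENDPOINT_URL = N'TCP://sqlprimary.example.com:5022',
        AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
        FAILOVER_MODE = MANUAL,
        SEEDING_MODE = AUTOMATIC,
        SECONDARY_ROLE (ALLOW_CONNECTIONS = READ_ONLY)
    ),
    N'SQLREADER' WITH (
        ENDPOINT_URL = N'TCP://sqlreader.example.com:5022',
        AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
        FAILOVER_MODE = MANUAL,
        SEEDING_MODE = AUTOMATIC,
        SECONDARY_ROLE (ALLOW_CONNECTIONS = READ_ONLY)
    );
-- On the secondary, the replica joins without a cluster as well:
-- ALTER AVAILABILITY GROUP [ReadScaleAG] JOIN WITH (CLUSTER_TYPE = NONE);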


Resumable Online Index Creation In Azure SQL Database

Niko Neugebauer looks at a feature coming in SQL Server vNext:

It is about time to create our first Clustered Online Resumable Index:

CREATE CLUSTERED INDEX CI_SampleDataTable
    ON dbo.SampleDataTable (c1)
    WITH (ONLINE = ON, RESUMABLE = ON);

But all we shall get is an error message:

Msg 155, Level 15, State 1, Line 25
'RESUMABLE' is not a recognized CREATE CLUSTERED INDEX option.

I was shocked and disappointed, but I understood that it was my own mind’s fault. Nobody, I repeat – NOBODY – told me that it would work for CLUSTERED indexes, but when I saw the announcement that indexes are supported, I fully believed that the traditional (not XML, not CLR, no LOBs) rowstore indexes would be totally supported. Oh yes, I know that it is crazy difficult. I know that this is a pretty forward-facing feature, but come on – my mind played a trick on me, telling me a story that does not exist, for now at least.

After realising my mind’s mistake I took a deeper breath and decided to try out the Resumable Nonclustered Index Creation with the following command:

CREATE NONCLUSTERED INDEX NCI_SampleDataTable
    ON dbo.SampleDataTable (c1)
    WITH (ONLINE = ON, RESUMABLE = ON);

Hopefully we get a bit more support as SQL Server vNext is developed and eventually released.  In the meantime, Niko hits some limitations but his timings for the feature look good.
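If you want to try the pause-and-resume behavior itself, a short sketch using the documented resumable-index syntax and the table name from Niko’s example:

-- From another session, pause the in-flight index creation:
ALTER INDEX NCI_SampleDataTable ON dbo.SampleDataTable PAUSE;

-- Check how far the paused operation got:
SELECT name, percent_complete, state_desc
FROM sys.index_resumable_operations;

-- Pick it back up where it left off:
ALTER INDEX NCI_SampleDataTable ON dbo.SampleDataTable RESUME;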


Finding Scalar Functions In Execution Plans

Kendra Little points out that scalar user-defined functions can hide in the most unassuming of places:

After we find matches based on the customer id, we have more work “left over” — that’s the “residual” bit.

For every row that matches, SQL Server is plugging values into the Website.CalculateCustomerPrice() function and comparing the result to the Unit price column, just like we asked for in the where clause.

In other words, this is happening for every row in Sales.InvoiceLines that has a matching row in Sales.Invoices.

Which is every single invoice & invoice line, as it turns out.

It’s a shame there’s no “this is why your query is slow” plan operator for scalar UDFs.
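To see the pattern Kendra describes for yourself, here is a rough sketch against WideWorldImporters, assuming the Website.CalculateCustomerPrice function her post references; the UDF call lands in the join’s residual predicate rather than in any operator that advertises it:

-- The scalar UDF hides in the residual predicate of the join
-- and executes once per matching row.
SELECT il.InvoiceLineID
FROM Sales.Invoices AS i
JOIN Sales.InvoiceLines AS il
    ON i.InvoiceID = il.InvoiceID
WHERE il.UnitPrice <> Website.CalculateCustomerPrice(i.CustomerID, il.StockItemID, i.InvoiceDate);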


Faking Arrays In T-SQL With Custom Types

Jovan Popovic shows how to use custom types as pseudo-arrays in SQL Server:

One of the missing language features in T-SQL is array support. In some cases, you can use custom types to work with arrays that are passed as parameters to your stored procedures.

Custom types in T-SQL enable you to create an alias for a table type, a .NET type, or a built-in type. Once you create a custom type, you can use it for local variables and for parameters of functions.

I go back and forth on whether I’d like full array support in T-SQL. On the plus side, it simplifies interactions with external tools; on the other hand, it can promote bad habits like violating first normal form.
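A minimal sketch of the technique, with illustrative names (dbo.Orders is a stand-in table):

-- A table type acting as an integer "array":
CREATE TYPE dbo.IntList AS TABLE (Value int NOT NULL);
GO
-- Table-valued parameters must be declared READONLY:
CREATE PROCEDURE dbo.GetOrdersByIds
    @Ids dbo.IntList READONLY
AS
BEGIN
    SELECT o.OrderID, o.OrderDate
    FROM dbo.Orders AS o
    JOIN @Ids AS i ON i.Value = o.OrderID;
END
GO
-- The caller fills the "array" and passes it in:
DECLARE @MyIds dbo.IntList;
INSERT INTO @MyIds (Value) VALUES (1), (2), (3);
EXEC dbo.GetOrdersByIds @Ids = @MyIds;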


How Qubole Optimizes Apache Spark Clusters

Mikhail Stolpner gives us some tips on how to optimize Apache Spark clusters:

There are four major resources: memory, compute (CPU), disk, and network. Memory and compute are by far the most expensive. Understanding how much compute and memory your application requires is crucial for optimization.

You can configure how much memory and how many CPUs each executor gets. While the number of CPUs for each task is fixed, executor memory is shared between the tasks processed by a single executor.

A few key parameters provide the most impact on how Spark is executed in terms of resources: spark.executor.memory, spark.executor.cores, spark.task.cpus, spark.executor.instances, and spark.qubole.max.executors.

This article gives us some idea of the levers we have available as well as when to pull them.  Though the article itself is vendor-specific, a lot of the advice is general.
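As a purely illustrative sketch of where those knobs live, here is a spark-defaults.conf fragment; the values are made up, and the right numbers depend entirely on the workload:

# Heap shared by all tasks running in one executor
spark.executor.memory 8g
# Concurrent task slots per executor
spark.executor.cores 4
# CPUs reserved by each task (fixed per task)
spark.task.cpus 1
# Executors requested for the application
spark.executor.instances 20
# Qubole-specific ceiling for autoscaling
spark.qubole.max.executors 40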


Area Under The ROC Is Not Accuracy

Stephen Chen debunks bad journalistic summaries of a Google research paper:

Journalists latched onto Google’s NN 0.95 score vs. the comparison 0.86 (see EWS Strawman below), as the accuracy of determining mortality. However the actual metric the researchers used is AUROC (Area Under Receiver Operating Characteristic Curve) and not a measure of predictive accuracy that indexes the difference between the predicted vs. actual like RMSE (Root Mean Squared Error) or MAPE (Mean Absolute Percentage Error). Some articles even erroneously try to explain the 0.95 as the odds ratio.

Just as the concept of significance has different meanings to statisticians and laypersons, AUROC as a measure of model accuracy does not mean the probability of Google’s NN predicting mortality accurately, as journalists/laypersons have taken it to mean. The ROC (see sample above) is a plot of a model’s False Positive Rate (i.e. predicting mortality where there is none) vs. the True Positive Rate (i.e. correctly predicting mortality). A larger area under the curve (AUROC) means the model produces fewer False Positives, not the certainty of mortality as journalists erroneously suggest.

The researchers themselves made no claim to soothsayer abilities; what they said in the paper was:

… (their) deep learning model would fire half the number of alerts of a traditional predictive model, resulting in many fewer false positives.

It’s an interesting article and a reminder of the importance of terminological precision (something I personally am not particularly good at).
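For reference, the two rates that define the ROC curve, in standard confusion-matrix terms:

TPR = TP / (TP + FN)   (share of actual positives the model catches)
FPR = FP / (FP + TN)   (share of actual negatives the model flags anyway)

The area under that curve has a tidy interpretation: it is the probability that the model ranks a randomly chosen positive case above a randomly chosen negative one, which is not the same thing as accuracy.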


How Perfmon Memory Counters Fit Together

Lonny Niederstadt takes us through a tour of how various Perfmon memory counters relate:

Wading through all of the SQL Server memory-related perfmon counters to understand how they related to each other took me a really long time.  Time-series graphs that show the relationship help me tremendously, and when I started trying to account for SQL Server memory years ago I couldn’t find any.  So I started to blog some time-series graphs, under the theory that either my understanding was correct and my graphs would be helpful to someone… or they’d be wrong and someone would correct me.
Well… it’s been about 5 years and my graphs haven’t generated too much discussion, but they’ve really helped me 😀😀😀

Perfmon: SQL Server Database pages + Stolen pages + Free pages = Total pages
http://sql-sasquatch.blogspot.com/2013/09/perfmon-database-pages-stolen-pages.html

Working with SQL Server 2016 and some demanding ColumnStore batch mode workloads, I began to see suspicious numbers, and graphs that didn’t make sense to me.  Today I got pretty close to figuring it out so I wanted to share what I’ve learned.

The following graphs are from a 4×10 physical server running Windows and SQL Server.  Four sockets, 4 NUMA nodes.

For bonus points, Lonny traces down a problem where expectations aren’t meeting reality.
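If you’d rather follow along from a query window than from Perfmon, the same counters are exposed through a DMV. A sketch for recent versions, where the page accounting Lonny graphs surfaces through the Memory Manager object (counter names vary by version, so adjust as needed):

-- Database Cache + Stolen + Free should sum to Total Server Memory.
SELECT counter_name, cntr_value AS kb
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%Memory Manager%'
  AND counter_name IN (N'Database Cache Memory (KB)',
                       N'Stolen Server Memory (KB)',
                       N'Free Memory (KB)',
                       N'Total Server Memory (KB)');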


Alleviating tempdb Contention

Pam Lahoud has some advice for those with tempdb-heavy workloads:

TL;DR – Update to the latest CU; create multiple tempdb files; if you’re on SQL 2014 or earlier, enable TF 1117 and 1118; if you’re on SQL 2016, enable TF 3427.

And now it’s time for everyone’s favorite SQL Server topic – tempdb! In this article, I’d like to cover some recent changes that you may not be aware of that can help alleviate some common performance issues for systems that have a very heavy tempdb workload. We’re going to cover three different scenarios here:

  1. Object allocation contention

  2. Metadata contention

  3. Auditing overhead (even if you don’t use auditing)

There’s some good information in here so don’t just say tl;dr.
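A quick sketch of the multiple-files piece of that advice; paths and sizes here are hypothetical, and the point is that the files stay equally sized so allocations round-robin evenly:

-- Add data files to spread allocation contention:
ALTER DATABASE tempdb
ADD FILE (NAME = tempdev2, FILENAME = 'T:\TempDB\tempdev2.ndf',
          SIZE = 1024MB, FILEGROWTH = 256MB);
ALTER DATABASE tempdb
ADD FILE (NAME = tempdev3, FILENAME = 'T:\TempDB\tempdev3.ndf',
          SIZE = 1024MB, FILEGROWTH = 256MB);

-- On SQL Server 2014 and earlier, also add -T1117 and -T1118 as startup
-- parameters; on 2016+ that behavior is already the default for tempdb.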


Reading The Transaction Log

Nesha Maric shows us a couple of methods and a third-party tool for reading the SQL Server transaction log:

Update operations in SQL Server are not fully logged in the transaction log. Full before-and-after values, unfortunately, don’t exist – only the delta of the change for that record. For example, SQL Server may show a change from “H” to “M” when the actual change was from “House” to “Mouse”. To piece together the full picture, a process must be devised to manually reconstruct the history of changes, including the state of the record prior to the update. This requires painstakingly reconstructing every record from the original insert to the final update, and everything in between.

BLOBs are another challenge when trying to use fn_dblog to read transaction history. BLOBs, when deleted, are never inserted into the transaction log, so examining the log won’t provide information about a BLOB’s existence unless the original insert can be located. Only by combining these two pieces of data will you be able to recover a deleted BLOB. This obviously requires that the original insert still exists in the active/online portion of the transaction log – the only part accessible to fn_dblog – which may be problematic if the insert was done weeks, months, or years earlier and the log has since been backed up or truncated.

I’ve tried to avoid messing directly with the transaction log whenever possible, but there are scenarios where it’s the only place that has the information you need.
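For the fn_dblog route, a minimal sketch; the function is undocumented and unsupported, so treat the output as best-effort:

-- NULL, NULL means no start/end LSN bounds: read the whole active log.
SELECT [Current LSN], Operation, Context, AllocUnitName, [Transaction ID]
FROM fn_dblog(NULL, NULL)
WHERE Operation IN (N'LOP_INSERT_ROWS', N'LOP_MODIFY_ROW', N'LOP_DELETE_ROWS');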


When Multiple Missing Indexes Exist

Brent Ozar shows what happens when there are multiple missing indexes for a query:

SQL Server Management Studio only shows you the first missing index recommendation in a plan.

Not the best one. Not all of them. Just whichever one happens to show up first.

Using the public Stack Overflow database, I’ll run a simple query:

But that behavior isn’t the case for all tools; SQL Operations Studio is a bit different.
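If you want every recommendation rather than whichever one the plan shows first, one workaround is to pull them from the missing-index DMVs; a rough sketch, with the usual caveat that these suggestions are hints rather than designs:

-- All missing-index suggestions recorded since the last restart,
-- roughly ordered by potential payoff.
SELECT d.statement AS table_name,
       d.equality_columns, d.inequality_columns, d.included_columns,
       s.user_seeks, s.avg_user_impact
FROM sys.dm_db_missing_index_details AS d
JOIN sys.dm_db_missing_index_groups AS g
    ON g.index_handle = d.index_handle
JOIN sys.dm_db_missing_index_group_stats AS s
    ON s.group_handle = g.index_group_handle
ORDER BY s.user_seeks * s.avg_user_impact DESC;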
