Curated SQL Posts

Introduction To Amazon Kinesis

Jen Underwood describes Amazon Kinesis:

Amazon Kinesis is a fully managed service for real-time processing of streaming data at massive scale. Amazon Kinesis is ideal for Internet of Things (IoT) use cases. It can collect and process hundreds of terabytes of data per hour from hundreds of thousands of sources, allowing you to easily write applications that process information in real-time, from sources such as web site click-streams, Raspberry Pi gadgets, devices, social media, operational logs, metering data and more.

With Amazon Kinesis, you can build real-time dashboards, capture exceptions, execute algorithms, and generate alerts. With point-and-click menus, you can ingest data, query it and then send output to a variety of destinations including but not limited to Amazon S3, Amazon EMR, Amazon DynamoDB, or Amazon Redshift.

Kinesis is powerful, especially if you’re already locked into the AWS platform.  My preference is Apache Kafka, but Kinesis is definitely worth learning about.

Create Index With Drop_Existing Bug

Kendra Little describes a bug that she encountered in discussions with a reader:

My first thought was that perhaps there is some process that runs against the production system and the test system that goes to sleep with an open transaction, holding an X or an IX lock against this table. If the index create can’t get its shared lock, then it could be part of a blocking chain.

So I asked first if the index create was the head of the blocking chain, or if it was perhaps blocked by something else. The answer came back that no, the index create was NOT blocked. It was holding the shared lock for a long time.

My new friend even sent a screenshot of the index create running against the test instance in sp_WhoIsActive with blocking_session_id null.
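
To picture the scenario (the index, table, and session below are hypothetical, not from the reader’s system), the rebuild in question looks something like this, and sp_WhoIsActive can confirm whether the session is blocked or simply holding its shared lock:

-- Rebuild an existing index in place; with ONLINE = OFF, the build holds
-- a shared (S) lock on the table for its duration.
CREATE NONCLUSTERED INDEX ix_Orders_CustomerID
    ON dbo.Orders (CustomerID)
    WITH (DROP_EXISTING = ON, ONLINE = OFF);

-- From another session: if blocking_session_id is NULL for the index create,
-- it is holding the lock itself rather than waiting on anything else.
EXEC sp_WhoIsActive @find_block_leaders = 1;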

Read on for the full story and keep those systems patched.

Aggregate Predicate Pushdown And Data Types

Niko Neugebauer shows an example of how a slightly different data type can cause columnstore queries to be much faster:

Even though they are estimated to cost the same (50% each, with an estimated cost of 0.275286, to be precise), in reality you will notice Aggregate Predicate Pushdown taking place in the first query, while the second query uses the Storage Engine to read out all 2 million rows from the table and filters them in the Hash Match iterator.

“Actual Number of Locally Aggregated Rows” is the property on the Columnstore Index Scan iterator that gives you insight into what happened within the scan, since Aggregate Predicate Pushdown is not shown as a filter in the properties. This is not the most fortunate solution as far as I am concerned, but the 0 rows flowing out of the Columnstore Index Scan serve as a good indication that Aggregate Predicate Pushdown took place; if you want to be sure of all the details, you will need to check the properties of the involved iterators.
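
A minimal sketch of the pattern (dbo.FactSales and SaleAmount are hypothetical stand-ins, not Niko’s actual objects): aggregate pushdown generally requires the aggregated column to fit in 8 bytes or less, so a type conversion alone can disqualify it.

-- The BIGINT column qualifies for aggregate pushdown, so the MIN can be
-- computed locally inside the Columnstore Index Scan.
SELECT MIN(SaleAmount)
FROM dbo.FactSales;

-- Converting to DECIMAL(19, 4) (9 bytes of storage) can disqualify pushdown:
-- all rows flow out of the scan and are aggregated in a Hash Match instead.
SELECT MIN(CONVERT(DECIMAL(19, 4), SaleAmount))
FROM dbo.FactSales;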

Definitely worth reading.

Understanding DBCC TEC

Ewald Cress explains (but does not document!) an undocumented DBCC command:

Boring old disclaimer: What I am describing here is undocumented, unsupported, likely to change between versions, and will probably make you go blind. In fact, the depth of detail exposed illustrates one reason why Microsoft would not want to document it: if end users of SQL Server found a way to start relying on this not changing, it would hamstring ongoing SQL Server improvement and refactoring.

With that out of the way, let’s dive right into DBCC TEC, a command which can dump a significant chunk of the object tree supporting a SQL Server session. This result is the same thing that shows up within a dump file, namely the output of the CSession::Dump() function – it’s just that you can invoke this through DBCC without taking a dump (cue staring match with Kendra). Until corrected, I shall imagine that TEC stands for Thread Execution Context.
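
If you want to experiment on a throwaway instance, the invocation follows the usual pattern for undocumented DBCC commands. The parameter list here is my assumption, not confirmed syntax; see Ewald’s post for the real details.

DBCC TRACEON(3604);   -- send DBCC output to the client instead of the error log
DBCC TEC(53);         -- hypothetical call: dump the execution context for session 53
DBCC TRACEOFF(3604);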

I appreciate Ewald’s ability to make sense out of the madness of database internals.

Using OUTPUT To Get Change Counts

Manoj Pandey shows how to use the OUTPUT clause to determine the number of records inserted, updated, or deleted after a DML statement:

–> Question:

How can I get the number of records affected by the MERGE statement (INSERT, UPDATE, and DELETE separately) and store them in variables so I can get them on the application side?

Thanks!

–> My Answer:

You need to use the OUTPUT clause with the MERGE statement.

Click through for a code sample.  The OUTPUT clause also works for non-MERGE statements like INSERT, UPDATE, and DELETE, though the “get changes by type” problem is really limited to the MERGE statement.
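
As a minimal sketch of the pattern (the tables here are hypothetical), $action tells you which branch of the MERGE produced each output row, and a quick GROUP BY turns that into per-action counts:

DECLARE @Changes TABLE (Change NVARCHAR(10));

MERGE dbo.TargetTable AS t
USING dbo.SourceTable AS s
    ON t.ID = s.ID
WHEN MATCHED THEN
    UPDATE SET t.Value = s.Value
WHEN NOT MATCHED BY TARGET THEN
    INSERT (ID, Value) VALUES (s.ID, s.Value)
WHEN NOT MATCHED BY SOURCE THEN
    DELETE
OUTPUT $action INTO @Changes;

-- One row per action type: INSERT, UPDATE, or DELETE.
SELECT Change, COUNT(*) AS ChangeCount
FROM @Changes
GROUP BY Change;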

doAzureParallel

JS Tan announces a new R package:

For users of the R language, scaling up their work to take advantage of cloud-based computing has generally been a complex undertaking. We are therefore excited to announce doAzureParallel, a lightweight R package built on Azure Batch that allows you to easily use Azure’s flexible compute resources right from your R session. The doAzureParallel package complements Microsoft R Server and provides the infrastructure you need to run massively parallel simulations on Azure directly from R.

The doAzureParallel package is a parallel backend for the popular foreach package, making it possible to execute multiple processes across a cluster of Azure virtual machines with just a few lines of R code. The package helps you create and manage the cluster in Azure, and register it as a parallel backend to be used with foreach.

It’s an interesting alternative to building beefy R servers.

Operating System Error 3

Stacy Brown provides common reasons for why you might get Operating System Error 3:

Sometimes users of SQL Backup Master may face the following error while executing a database backup job:

Msg 3201, Level 16, State 1, Line 1
Job Execution Error: Cannot open backup device ‘’ Operating System error 3 (The system cannot find the path specified.)

Now, there can be various possible reasons behind the occurrence of this error. Therefore, in the following sections, all possible reasons with their respective solutions are discussed. A user can refer to them to solve this SQL Server Operating System Error 3 (the system cannot find the path specified).
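
As a hedged illustration (the path and database name are made up), the error typically surfaces when the directory named in the backup command does not exist or is invisible to the SQL Server service account:

-- Fails with Msg 3201 / Operating System error 3 if X:\Backups does not
-- exist or cannot be reached by the service account.
BACKUP DATABASE AdventureWorks
TO DISK = N'X:\Backups\AdventureWorks.bak'
WITH INIT;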

Click through for solutions to several potential causes of this error.

Explaining DTUs

Andy Mallon explains what a Database Transaction Unit is:

I’d like to point out that the definition of a DTU is that it’s “a blended measure of CPU, memory, and data I/O and transaction log I/O…” None of the perfmon counters used by the DTU Calculator take memory into account, but it is clearly listed in the definition as being part of the calculation. This isn’t necessarily a problem, but it is evidence that the DTU Calculator isn’t going to be perfect.

I’ll upload some synthetic load into the DTU Calculator, and see if I can figure out how that black box works. In fact, I’ll fabricate the CSVs completely so that I can totally control the perfmon numbers that we load into the DTU Calculator. Let’s step through one metric at a time. For each metric, we’ll upload 25 minutes (1500 seconds–I like round numbers) worth of fabricated data, and see how that perfmon data is converted to DTUs.

Andy then goes on to show how the DTU Calculator estimates DTU usage given different resource patterns.  It’s a very interesting process and Andy clarified it considerably.

Benchmarking

Lukas Eder notes that the best way to compare performance is to…compare performance:

To bust a myth, if you have good reasons to think that a differently written, but semantically equivalent query might be faster (on your database), you should measure. Don’t even trust any execution plan, because ultimately, what really counts is the wall clock time in your production system.

If you can measure your queries in production, that’s perfect. But often, you cannot – but you don’t always have to. One way to compare two queries with each other is to benchmark them by executing each query hundreds or even thousands of times in a row.
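
In T-SQL terms, the idea is a simple timing loop (the table names are hypothetical); assigning the result to a variable keeps client round-trips from skewing the measurement:

DECLARE @i INT = 0,
        @dummy INT,
        @start DATETIME2(7) = SYSDATETIME();

WHILE @i < 1000
BEGIN
    -- Candidate query under test; swap in the semantically equivalent
    -- rewrite and compare elapsed times.
    SELECT @dummy = COUNT(*)
    FROM dbo.Customers AS c
    LEFT JOIN dbo.Orders AS o
        ON o.CustomerID = c.CustomerID;

    SET @i += 1;
END;

SELECT DATEDIFF(MILLISECOND, @start, SYSDATETIME()) AS ElapsedMs;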

Lukas goes on to compare a left join to a correlated subquery in three separate database products and the results are very interesting although absolutely not comparable across products because the DeWitt Clause is still a thing.  Great read.

Unavailable Azure VM Sizes

Melissa Coates gives a few of the major reasons why a particular Azure VM size may not be available when you go to resize your VM:

Just a quick tip about why you might notice some sizes are not available when you are attempting to change the size/scale level of an Azure virtual machine in the portal.

I wanted to change one of my Development VMs to a DS12_v2, but that choice wasn’t available.

It didn’t immediately dawn on me why it wasn’t available, so I thought I’d try PowerShell.

Read on for the solution, as well as a few other common causes.
