Press "Enter" to skip to content

Curated SQL Posts

Blocking Merge Statement

Kendra Little walks through the MERGE command and potential blocking issues with it:

The holdlock hint is a way to get serializable isolation level in SQL Server for a specific table, without having to change the isolation level for your entire session. Serializable is the highest isolation level in SQL Server using pessimistic locking.

When you “HOLDLOCK”, you tell SQL Server to protect any rows you read with a range lock– just in case someone comes along and tries to change one or sneak  one in.

That means that even when you’re just reading ParentTable and not inserting a row, you’re taking out a key range lock. You’re willing to fight other users over those rows to protect your statement.

Kendra’s final advice is to avoid the MERGE command in most cases; read on to find out why.

Comments closed

ALTER DATABASE

Dave Mason notes that ALTER DATABASE needs to be in an autocommitted transaction and will not roll back:

If you create that trigger and subsequently change the recovery model, you’ll get an error. But the recovery model doesn’t get changed back to its original setting. As the TSql comment notes, ROLLBACK doesn’t work in this context. And that’s a shame too. Sometimes the recovery model gets changed on accident, by a third-party vendor’s software installation, or by someone that isn’t familiar with the ramifications. I was trying to prevent that. <Grumble> At least I knew it wasn’t allowed. I just didn’t know why. After seeing this Connect item, “why” didn’t matter.

Dave goes on to point out that under certain circumstances, ALTER DATABASE will simply fail, so there are preventative checks; we just can’t build our own.

Comments closed

Getting Current File Name In SSIS

Bill Fellows shows how to get the currently processing file name in SSIS:

So, as much as I like the built in solution, my pattern is to use a Derived Column to inject the file name into the Data Flow. I have a variable called CurrentFileName in all my packages. That contains the design-time path for my Flat File Connection Manager (or Excel). My Connection Manager will then have the ConnectionString/ExcelFilePath property assigned to be @[User::CurrentFileName]. This positions me for success because all I need to do is ensure that whatever mechanism I am using to determine my source file correctly populates that variable. In this post, a ForEach File Enumerator will handle that.

Within my Data Flow Task, I will add a Derived Column Transformation that adds my package variable into the data flow as a new column. Here, I am specifying it will be of data type DT_STR with a length of 130.

He follows up with some Biml to drive home the point.

Comments closed

Pester For Configuration Checks

Rob Sewell talks about SQL Server configuration testing using Pester, a Powershell test framework:

Occasionally, for reasons I haven’t explored I had to test against the value property of the returned object

          It "The Full User Database Backup should be scheduled Weekly $OlaUserFullSchedule" {
$Return.OlaUserFullSchedule.value | Should Be $OlaUserFullSchedule
}

I wanted to be able to run the tests against environments or groups of servers with different default values so I parameterised the Test Results as well and then the logical step was to turn it into a function and then I could do some parameter splatting. This also gives me the opportunity to show all of the things that I am currently giving parameters to the test for

This is a nice walkthrough of Rob’s contribution, available on GitHub.

Comments closed

NUMA Node Count

Denny Cherry shows us how to find the number of NUMA nodes a SQL Server has:

The next way to see what your NUMA configuration is, is to look at the ERRORLOG file. Every time SQL Server is started it will log the NUMA configuration towards the beginning of the ERRORLOG file. That line will look something like this example.

Server Node configuration: node 0: CPU mask: 0x00000000000000ff:0 Active CPU mask: 0x00000000000000ff:0. This message provides a description of the NUMA configuration for this computer. This is an informational message only. No user action is required.

Denny shows four different ways of getting this answer, and then provides some general advice on the maximum number of nodes you probably want.

Comments closed

Designing For Usability In Power BI

Avi Singh walks through a usability scenario in Power Pivot and Power BI:

Here is how we can go about combining the Products and the newly created SubCategory table for a more usable and elegant solution. I have highlighted the key steps, you can examine the solution file to go step by step in the Power BI Query.
Note: Don’t be scared of the M code, 99% of these steps were generated using the Query ribbon – I have included those screenshots as well.

When developing BI models, it’s important to keep things as simple as possible.  Your desired end user likely does not have the necessary skill level to wade through normalized table designs, so make it easy for them to get their jobs done.

Comments closed

Colleen Morrow shows how to automate Service Broker message handling:

The other way to automate this stored procedure is by attaching it to the queue itself so that it Service Broker directly executes the procedure in a process called activation.  With activation Service Broker starts an application whenever there is work to do (e.g. when messages are in the queue).  When we enable activation for a queue, SQL Server spawns a queue monitor process to keep an eye on the queue and check every few seconds to see whether there are messages available for processing.  There are 2 types of activation:

  • Internal activation – A stored procedure like the one above is attached to the queue to process messages automatically as they arrive.

  • External activation – An external program is notified that messages need to be processed.

Colleen focuses on internal activation, which calls a stored procedure for each message received.  Check her post out.

Comments closed

Spark Optimizations

Over at the DZone blog, we learn how to use Distribute By and Cluster By to optimize Spark performance:

Your DataFrame is skewed if most of its rows are located on a small number of partitions, while the majority of the partitions remain empty. You really should avoid such a situation. Why? This makes your application virtually not parallel – most of the time you will be waiting for a single task to finish. Even worse, in some cases you can run out of memory on some executors or cause an excessive spill of data to a disk. All of this can happen if your data is not evenly distributed.

To deal with the skew, you can repartition your data using distribute by. For the expression to partition by, choose something that you know will evenly distribute the data. You can even use the primary key of the DataFrame!

It’s interesting to see how cluster by, distribute by, and sort by can have such different performance consequences.

Comments closed

Kafka 0.10

Kafka 0.10 is now available:

Kafka Streams: Kafka Streams was introduced as part of thetech preview release of the Confluent Platform few months ago and is now available through Apache Kafka 0.10.0.0. Kafka Streams is a library that turns Apache Kafka into a full featured, modern stream processing system. Kafka Streams includes a high level language for describing common stream operations (such as joining, filtering, and aggregating records), allowing developers to quickly develop powerful streaming applications. Kafka Streams offers a true event-at-a-time processing model, handles out-of-order data, allows stateful and stateless processing and can easily be deployed on many different systems— Kafka Streams applications can run on YARN, be deployed on Mesos, run in Docker containers, or just embedded into existing Java applications.

There are some nice improvements in this latest version of Kafka.

Comments closed