Press "Enter" to skip to content

Curated SQL Posts

R Services Internals

Niels Berglund has an excellent series on R Services internals.  Here’s the latest post:

This post is the ninth post about Microsoft SQL Server R Services, and the eighth post that drills down into the internals of how it works.

So far in this series, we have been looking at what happens in SQL Server as well as in the launchpad service when we execute sp_execute_external_script, and we still have no real “clue” as to where the R engine comes into play.

Well, hopefully that will change (at least a little bit) with this post, as here we will look at what happens when we leave the launchpad service.

This series is like candy to me.  It’s the best write-up I’ve seen so far about what’s really happening when you run SQL Server R Services.

PowerShell Parameter Validation

Adam Bertram shows how to write safer code by validating parameters:

Simply adding [Parameter()] to the parameter block makes this function “advanced.” Once we have this in place, we can then add in some parameter validation code. In our case, we need to ensure that the file provided with FilePath actually exists before proceeding. To do this, we’ll use the ValidateScript attribute which allows us to run any code we want. As long as it returns $true, it will allow the function to run.

function Set-File {
    param(
        [Parameter()]
        # Reject any value that is not the path of an existing file
        [ValidateScript({ Test-Path -Path $_ -PathType Leaf })]
        [string]$FilePath
    )
}

Notice that I used the pipeline variable $_ in this example. It represents whatever value is passed to the parameter. If the user provides a file path that does not exist, they will be notified.
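Calling the function with a bad path then fails during parameter binding, before the function body ever runs. A quick, hypothetical call to see it:

# Throws a validation error; the (empty) function body never executes
Set-File -FilePath 'C:\no\such\file.txt'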

Read the whole thing.

Multiple Data Sets In External Scripts

Tomaz Kastrun shows a workaround to the “one data set” limit in sp_execute_external_script:

Some of the arguments of the procedure sp_execute_external_script are enumerated. This is true of the input data set: as the name of the argument @input_data_1 suggests, one can easily (and quite understandably) think there might also be an @input_data_2 argument, and so on. Unfortunately, this is not the case. The external procedure can hold only one T-SQL data set, inserted through this parameter.

There are many reasons for that; one would be the cost of sending several data sets to the external process and back. So, in effect, this forces the user to rethink and pre-prepare the data set (meaning, do all the data munging beforehand) prior to sending it into the external procedure.

But there are workarounds for passing an additional query (or queries) to sp_execute_external_script. I am not advocating this, and I strongly disagree with such usage, but here it is.

It does feel like a hinky solution, but sometimes you just need to get two data sets in.
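Tomaz demonstrates his own technique in the post. For a flavor of the general idea, here is a minimal sketch (not his exact code) that smuggles a second, small data set in as a delimited string through @params and rebuilds it inside R:

DECLARE @second NVARCHAR(MAX) = N'10,20,30';  -- second "data set", flattened to a string

EXEC sp_execute_external_script
    @language = N'R',
    @script = N'
# rebuild the second data set from the delimited string
d2 <- as.integer(strsplit(second, ",")[[1]])
# combine it with the real input data set
OutputDataSet <- data.frame(InputDataSet, SecondSetTotal = sum(d2))',
    @input_data_1 = N'SELECT name FROM sys.databases',
    @params = N'@second NVARCHAR(MAX)',
    @second = @second;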

Using Watchtower To Refresh SQL Server Containers

Andrew Pruski explains how to use Watchtower to keep your SQL Server Docker containers up to date:

The databases that I store in my container image are updated on a weekly basis, and currently the process to update our containers is manual. Once the updated image has been created, the existing running containers are dropped and new ones created from the updated image.

But what if we could automatically refresh our containers with the updated image? If we could do that then the only process that’s manual is updating the image. We would no longer have to worry about any containers running SQL instances with databases that are out of date.

Luckily, there’s a way to do this, and it’s accessible via an image on the Docker Hub called Watchtower. What Watchtower does is monitor the Docker Hub; if there’s an update to an image, it will automatically refresh all running containers that are on the same host.
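The moving parts are small: Watchtower itself runs as a container with access to the Docker socket, so it can restart its neighbors. A minimal sketch of starting it (the image name here, v2tec/watchtower, is the one the project published at the time, so treat this as a sketch rather than his exact steps):

docker run -d --name watchtower \
  -v /var/run/docker.sock:/var/run/docker.sock \
  v2tec/watchtower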

Read on for a step-by-step solution.

SQLOS Task Origins

Ewald Cress explains how SQLOS tasks come into being:

The above system tasks wear their purpose on their sleeves, because the function pointer in the SOS_Task::Param is, well, to the point. The tasks that run user queries are more abstract, because the I/O completion port listener can’t be bothered with understanding much beyond the rudiments of reading network packets – it certainly can’t be mucking about with fluent TDS skills, SQL parsing and compilation, or permission checks.

So what it does is enqueue a task that speaks TDS well, point it to a bunch of bytes, and send it on its way. Here is an example of such a dispatch, which shows the “input” side of the WorkDispatcher.

Read the whole thing.

Cycling The SSMS Clipboard Ring

Bert Wagner shows how you can use the clipboard ring in SQL Server Management Studio:

Copying and pasting is a feature that’s available in nearly every text editor (“nearly” — anyone remember the days before iOS had a clipboard?).

However, SQL Server Management Studio goes above and beyond the regular copy and paste feature set — it has a clipboard ring.

What’s a clipboard ring you ask?

The clipboard ring lets you cycle through the last 20 things you copied onto your clipboard when you go to paste in SSMS.

Read on to learn how to use the clipboard ring (in short: paste with Ctrl+Shift+V instead of Ctrl+V, and keep pressing it to cycle). When I remember that it exists, it's quite useful.

Top Jobs Report

Kenneth Fisher only looks at the best SQL Agent jobs:

But sometimes what you want is an overall history. Maybe you want to see your longest-running jobs? Or the most frequent jobs? Or jobs that are failing? There is a great pre-built report just for that.

Now, be warned: the report and the history view both pull from the system views, and those only hold so much history.

With that warning stated, this is a good report if you just want to get some idea of what’s happening on your server.
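The report lives in SSMS under the SQL Server Agent node. If you would rather pull a similar answer straight from msdb, a rough sketch against the documented job history tables looks like this (run_duration is stored as an HHMMSS integer, hence the decoding):

SELECT TOP (10)
    j.name,
    COUNT(*) AS runs,
    AVG(h.run_duration / 10000 * 3600    -- hours
      + h.run_duration / 100 % 100 * 60  -- minutes
      + h.run_duration % 100) AS avg_duration_in_seconds
FROM msdb.dbo.sysjobhistory AS h
    INNER JOIN msdb.dbo.sysjobs AS j
        ON j.job_id = h.job_id
WHERE h.step_id = 0  -- step 0 is the job outcome row
GROUP BY j.name
ORDER BY avg_duration_in_seconds DESC;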

Prioritizing Rows In A Union

Daniel Hutmacher shows several ways to combine data from multiple sources, prioritized by source:

For the sake of simplicity, I’ll assume that our example data has a clustered index on the “primary key” that we want to use to determine which rows have already been included in the set – in our example, the primary key is (FirstName, LastName). The data I’m using is a fictional example, but here’s the gist:

  • #Employees has about 33 000 rows.
  • #Customers has about 44 000 rows.
  • #Passengers has about 500 000 rows.

The data is constructed in a way that these queries should return 530 000 rows, so we’re looking at some overlap but far from totally overlapping rows.

Example #492,806 that T-SQL is not a true fourth-generation language: how you write the query can greatly matter for performance.
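Daniel compares several patterns and their execution plans. The simplest statement of the idea is a NOT EXISTS chain along these lines (a sketch; column lists assumed from the excerpt):

-- Employees win over customers, who win over passengers;
-- each later source only contributes keys not seen earlier.
SELECT FirstName, LastName
FROM #Employees

UNION ALL

SELECT c.FirstName, c.LastName
FROM #Customers AS c
WHERE NOT EXISTS (
    SELECT 1 FROM #Employees AS e
    WHERE e.FirstName = c.FirstName AND e.LastName = c.LastName)

UNION ALL

SELECT p.FirstName, p.LastName
FROM #Passengers AS p
WHERE NOT EXISTS (
    SELECT 1 FROM #Employees AS e
    WHERE e.FirstName = p.FirstName AND e.LastName = p.LastName)
  AND NOT EXISTS (
    SELECT 1 FROM #Customers AS c
    WHERE c.FirstName = p.FirstName AND c.LastName = p.LastName);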

Kafka Streams Basics

Anuj Saxena walks through Kafka Streams and provides a quick example:

The features provided by Kafka Streams:

  • Highly scalable, elastic, distributed, and fault-tolerant application.

  • Stateful and stateless processing.

  • Event-time processing with windowing, joins, and aggregations.

  • We can use the most common transformation operations already defined in the Kafka Streams DSL, or the lower-level Processor API, which allows us to define and connect custom processors.

  • Low barrier to entry, which means it does not take much configuration and setup to run a small scale trial of stream processing; the rest depends on your use case.

  • No separate cluster requirements for processing (integrated with Kafka).

  • Employs one-record-at-a-time processing to achieve millisecond processing latency, and supports event-time-based windowing operations even when records arrive late.

  • Supports Kafka Connect to connect to different applications and databases.

Read on for more details as well as a sample script to get started.

Random Forests In R

Anish Sing Walia explains the basics of random forests and provides sample code in R:

Random forests are similar to a famous ensemble technique called bagging, but have a different tweak. In random forests, the idea is to decorrelate the several trees which are generated on the different bootstrapped samples from the training data, and then we simply reduce the variance in the trees by averaging them. Averaging the trees helps us to reduce the variance, and also improves the performance of decision trees on the test set and eventually avoids overfitting.

The idea is to build lots of trees in such a way as to make the correlation between the trees smaller.
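That averaging claim can be made precise with a standard decomposition: if each of B trees has variance \sigma^2 and the trees are identically distributed with pairwise correlation \rho, then

\operatorname{Var}\!\left(\frac{1}{B}\sum_{b=1}^{B} T_b\right) = \rho\,\sigma^2 + \frac{1-\rho}{B}\,\sigma^2.

Growing more trees drives the second term toward zero, so the variance floor is set by the correlation; sampling a random subset of features at each split is exactly what pushes that correlation down.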

Random forests frequently give a good answer to classification problems, enough so that they make a nice starting point.
