Press "Enter" to skip to content

Curated SQL Posts

Desired State Configuration Resources

Jess Pomfret takes us through resources in Desired State Configuration:

A critical part of our DSC configuration is made up of resources. These are the building blocks we need to define our desired state. There are two kinds of resources that we can use: class-based and MOF-based (the most common). We are going to focus our efforts today on looking at MOF-based resources.

Resources come packaged up as modules, and our servers, which use at least WMF 4.0, come with several built-in. We have two main options for additional resources: we can find DSC resource modules in the PowerShell Gallery, or we can write our own.

Jess wraps up the post with five useful resources for database administrators.


Limitations on Inlining Functions

Erik Darling takes us through a limitation to SQL Server 2019 scalar function inlining:

There’s a lot of excitement (alright, maybe I’m sort of in a bubble with these things) about SQL Server 2019 being able to inline most scalar UDFs.

But there’s a sort of weird catch with them. It’s documented, but still.

If you use GETDATE in the function, it can’t be inlined.

GETDATE() and its brethren are non-deterministic, so I figured that would be an issue. Check out the documentation for the other limitations.
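
As a quick illustration (my own sketch, not from Erik’s post; the function name is made up), a trivial scalar UDF that calls GETDATE() gets flagged as non-inlineable, which SQL Server 2019 exposes through sys.sql_modules:

-- Any scalar UDF that calls GETDATE() behaves this way.
CREATE OR ALTER FUNCTION dbo.DaysSince (@d datetime2)
RETURNS int
AS
BEGIN
    RETURN DATEDIFF(DAY, @d, GETDATE());
END;
GO

-- is_inlineable comes back 0 here, so scalar UDF inlining won't kick in.
SELECT OBJECT_NAME(object_id) AS function_name, is_inlineable
FROM sys.sql_modules
WHERE object_id = OBJECT_ID(N'dbo.DaysSince');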


Getting Started with Azure Databricks

Brad Llewellyn has a tutorial for Azure Databricks:

Databricks is a managed Spark framework, similar to what we saw with HDInsight in the previous post.  The major difference between the two technologies is that HDInsight is more of a managed provisioning service for Hadoop, while Databricks is more like a managed Spark platform.  In other words, HDInsight is a good choice if we need the ability to manage the cluster ourselves, but don’t want to deal with provisioning, while Databricks is a good choice when we simply want to have a Spark environment for running our code with little need for maintenance or management.

Azure Databricks is not a Microsoft product. It is owned and managed by the company Databricks and available in Azure and AWS. However, Databricks is a “first party offering” in Azure. This means that Microsoft offers the same level of support, functionality and integration as it would with any of its own products. You can read more about Azure Databricks here, here, and here.

Click through for a demonstration of the product.


Solving Logistic Regression Problems with Python

Hardik Jaroli shows how we can solve logistic regression problems using Python, using the Titanic data set as an example:

We will be working with the Titanic Data Set from Kaggle. We’ll be trying to predict a classification: survival or deceased.

Let’s begin by implementing Logistic Regression in Python for classification. We’ll use a “semi-cleaned” version of the Titanic data set; if you use the data set hosted directly on Kaggle, you may need to do some additional cleaning.

Click through for the demo.


Deploying and Executing Containerized Packages

Andy Leonard continues a series on Integration Services in Docker. Part 5 shows how you can deploy a package to a containerized SSIS instance:

Returning to Matt Masson’s PowerShell script – combined with the docker volume added earlier – I have a means to deploy an SSIS Project to the SSIS Catalog in the container.

Part 6 shows how we can run those packages:

An aside regarding attempting SSIS package execution from SSMS connected to an instance of SQL Server in a container (using the runas /netonly trick shared earlier): it appears to work, but doesn’t. The package execution is created but “hangs” in Pending Execution status.

Read both to learn more about Andy’s travails in getting this working.


Scripting Database Restores

Max Vernon helps us out with a query to generate a database restore command:

Just point the script at an existing SQL Server Backup File, and give the new database a name, along with a target folder for the data and log files, and press F5. This script is compatible with SQL Server 2005 and higher, and has been tested on a case-sensitive-collation server.

I think building these out by hand is good practice and helps you learn, but when it’s crunch time, you really want to have a script do the work for you.
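
For reference, here is the hand-rolled pattern such a script generates for you (the paths and logical file names below are made up; Max’s script derives the MOVE clauses from the backup file itself):

-- Step 1: inspect the logical file names stored in the backup.
RESTORE FILELISTONLY FROM DISK = N'C:\Backups\Sales.bak';

-- Step 2: restore under a new name, relocating the data and log files.
RESTORE DATABASE SalesCopy
FROM DISK = N'C:\Backups\Sales.bak'
WITH MOVE N'Sales' TO N'D:\Data\SalesCopy.mdf',
     MOVE N'Sales_log' TO N'L:\Logs\SalesCopy_log.ldf',
     RECOVERY, STATS = 10;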


Finding High-Cardinality Columns

Constantine Kokkinos shows how you can find the cardinality of each column on a SQL table:

Today I was diving into some extremely wide tables, and I wanted to take a quick look at things like “How many unique values does this table have in every column?”

This can be super useful if you have a spreadsheet of results or a schema without effective normalization and you want to determine which rows are the “most unique” – or have high cardinality.

The GitHub gist is embedded at the bottom of the page, but I will run you through the code in case you want an explanation of how it works.

Click through for the script.
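
If you want the flavor of the approach before clicking through, a dynamic SQL sketch of the idea looks something like this (my own version, not CK’s gist; dbo.Wide is a hypothetical table, and STRING_AGG requires SQL Server 2017 or later):

-- Build one COUNT(DISTINCT ...) query per column, then run them all.
DECLARE @sql nvarchar(max);

SELECT @sql = STRING_AGG(CAST(
           N'SELECT ' + QUOTENAME(c.name, '''') + N' AS column_name, '
         + N'COUNT(DISTINCT ' + QUOTENAME(c.name) + N') AS cardinality '
         + N'FROM dbo.Wide' AS nvarchar(max)), N' UNION ALL ')
FROM sys.columns AS c
WHERE c.object_id = OBJECT_ID(N'dbo.Wide');

EXEC sys.sp_executesql @sql;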


The Performance Hit From Ignoring Duplicate Keys

Paul White explains why there is a big performance hit when using IGNORE_DUP_KEY on clustered indexes:

The IGNORE_DUP_KEY index option can be specified for both clustered and nonclustered unique indexes. Using it on a clustered index can result in much poorer performance than for a nonclustered unique index.

The size of the performance difference depends on how many uniqueness violations are encountered during the INSERT operation. The more violations, the worse the clustered unique index performs by comparison. If there are no violations at all, the clustered index insert may even perform better.

I use IGNORE_DUP_KEY primarily in cases like queue tables where I might be queuing up changes to migrate to a warehouse and where the chance of collision is low but non-zero. It looks like pushing much beyond that pattern can be devastating for performance.
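
For anyone who hasn’t used the option, the syntax looks like this (a toy table of my own, not from Paul’s post); duplicate keys produce a warning instead of an error:

-- IGNORE_DUP_KEY on a clustered primary key: the slow case Paul examines.
CREATE TABLE dbo.QueueDemo
(
    id int NOT NULL,
    payload varchar(100) NOT NULL,
    CONSTRAINT PK_QueueDemo PRIMARY KEY CLUSTERED (id)
        WITH (IGNORE_DUP_KEY = ON)
);

-- Inserts 2 of the 3 rows and raises "Duplicate key was ignored."
INSERT dbo.QueueDemo (id, payload)
VALUES (1, 'first'), (1, 'dupe'), (2, 'second');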


What’s New With KSQL

Robin Moffatt looks into additions to KSQL with Confluent Platform 5.2:

PRINT is one of those features you may not quite grok until you start using it…and then you’ll wonder how you lived without it. It provides a simple way of displaying the contents of a Kafka topic and figures out itself which deserialiser to use. Avro? No problem! JSON? Bring it on!

In KSQL 5.2, the PRINT feature gets even better as you can specify how many records you’d like to see from the topic using the LIMIT clause.
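
For example (the topic name is made up), a session in the KSQL CLI might look like this:

-- Show the first three records from a topic, whatever its serialization format:
PRINT 'orders' FROM BEGINNING LIMIT 3;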

These are some good additions.
