Curated SQL Posts

An UPSERT Pattern to Avoid

Aaron Bertrand doesn’t like a common insert/update pattern:

I think everyone already knows my opinions about MERGE and why I stay away from it. But here’s another (anti-)pattern I see all over the place when people want to perform an upsert (update a row if it exists and insert it if it doesn’t):

IF EXISTS (SELECT 1 FROM dbo.t WHERE [key] = @key)
BEGIN
    UPDATE dbo.t SET val = @val WHERE [key] = @key;
END
ELSE
BEGIN
    INSERT dbo.t([key], val) VALUES(@key, @val);
END

This looks like a pretty logical flow that reflects how we think about this in real life:

Does a row already exist for this key?
YES: OK, update that row.
NO: OK, then add it.

Click through to learn why this is a bad idea.
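The core problem is a race condition: between the EXISTS check and the INSERT, another session can slip in the same key. As a minimal sketch of the commonly recommended alternative (update first under serializable semantics, and insert only if nothing was updated), against the same dbo.t:

BEGIN TRANSACTION;

UPDATE dbo.t WITH (UPDLOCK, SERIALIZABLE)  -- key-range lock blocks a competing insert
    SET val = @val
    WHERE [key] = @key;

IF @@ROWCOUNT = 0
BEGIN
    INSERT dbo.t([key], val) VALUES(@key, @val);
END

COMMIT TRANSACTION;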

Implementing an LSTM Model with Python

Mrinal Walia takes us through the concept of Long Short-Term Memory:

A simple Recurrent Neural Network has a very simple structure that forms a chain of repeating modules, each with just a single activation function such as a tanh layer. An LSTM similarly has a chain-like structure with repeating modules, but instead of the single neural network layer in an RNN, each LSTM module has four layers that interact in a very particular way, each performing its unique function in the network.
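For reference, the four interacting layers the excerpt describes are usually written as the standard LSTM gate equations (forget gate, input gate, candidate cell state, output gate); this is textbook notation, not anything specific to the article:

f_t = \sigma(W_f [h_{t-1}, x_t] + b_f)
i_t = \sigma(W_i [h_{t-1}, x_t] + b_i)
\tilde{c}_t = \tanh(W_c [h_{t-1}, x_t] + b_c)
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t
o_t = \sigma(W_o [h_{t-1}, x_t] + b_o)
h_t = o_t \odot \tanh(c_t)

Here \sigma is the sigmoid and \odot is elementwise multiplication: the forget and input gates control the cell state c_t, and the output gate controls what the hidden state h_t exposes.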

Read on for a good amount of theory followed by an example using Keras.

Generating Predictions with SQL Server ML Services

Jeffin Mathew walks us through SQL Server Machine Learning Services:

The purpose of this blog is to explore the process of running ML predictions on SQL Server using Python. We are going to train and test the data to predict information about bike sharing for a specific year. We are going to use the provided 2011 data and predict what 2012 will look like. The 2012 data already exists inside the dataset, so we will be able to compare the predicted amounts to the actual ones.

For certain use cases—especially when the data already exists in SQL Server, and especially especially when you can use native scoring—Machine Learning Services does a great job.
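As a hedged illustration of the native scoring route (the table, model storage, and column names below are placeholders, and the post itself works in Python), T-SQL's PREDICT function scores rows directly against a model serialized in the native (RevoScale) format:

DECLARE @model varbinary(max) =
    (SELECT model_object FROM dbo.Models WHERE model_name = N'bike_rentals');

SELECT d.*, p.predicted_rentals
FROM PREDICT(MODEL = @model, DATA = dbo.BikeSharing AS d)
WITH (predicted_rentals float) AS p
WHERE d.yr = 1;  -- hypothetical encoding: 0 = 2011 (training), 1 = 2012 (scoring)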

Columnstore Query Patterns

Ed Pollack walks us through some query patterns which do and don’t work very well with columnstore indexes:

Reading data from a highly compressed analytical structure is quite different from the query patterns used on transactional data. By leveraging metadata, data order, segment elimination, and compression, large tables can be quickly read and results returned in seconds (or less!).

Taking this further, read queries against columnstore indexes can be further optimized by simplifying queries and providing the query optimizer with the easiest path to the smallest columnstore scans needed to return results.

This article explores the most efficient ways to read a columnstore index and produces guidelines and best practices for analytics against large columnstore data structures.

Read on for good advice.
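As a rough sketch of a query shaped for those strengths (dbo.Sales and its columns are hypothetical): touch few columns, aggregate, and filter with a range predicate on the column the rows were ordered by at load time, so whole segments can be skipped:

SELECT SaleDate    = CAST(SaleDateTime AS date),
       TotalAmount = SUM(Amount)       -- only these columns' segments get read
FROM dbo.Sales
WHERE SaleDateTime >= '20210101'
  AND SaleDateTime <  '20210201'       -- range on the load-order column enables segment elimination
GROUP BY CAST(SaleDateTime AS date);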

Weighted Randomization using T-SQL

Louis Davidson won’t be stuck with uniform distributions:

The thing is, while I have 77 pictures of my favorite roller coaster, Expedition Everest at Disney’s Animal Kingdom, and 70 of The Tower of Terror at Hollywood Studios, I only have 2 of the Flame Tree Restaurant at Animal Kingdom and many other things that aren’t as exciting to post about. If I randomly choose attractions using a non-weighted random number generator, it would be just as likely to serve up the lesser items as the greater ones. Hence, I want my popular items to come up most frequently, but every once in a while, I want to be surprised by something different.

This is where I needed to build a weighted randomized value.

Read on to see how Louis implements this.
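This is not necessarily Louis's implementation, but as a minimal sketch of one common approach (dbo.Attraction and its columns are hypothetical), a running total turns the weights into a cumulative distribution that a single random draw can index into:

DECLARE @rand float = RAND(CHECKSUM(NEWID()));  -- uniform draw in [0, 1)

WITH weighted AS
(
    SELECT AttractionName,
           RunningWeight = SUM(Weight) OVER (ORDER BY AttractionName
                                             ROWS UNBOUNDED PRECEDING),
           TotalWeight   = SUM(Weight) OVER ()
    FROM dbo.Attraction
)
SELECT TOP (1) AttractionName
FROM weighted
WHERE RunningWeight >= @rand * TotalWeight  -- heavier items own a wider slice of [0, TotalWeight]
ORDER BY RunningWeight;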

Disabled Indexes Tell No Compression Tales

Eric Cobb gives us a warning around disabling indexes:

Here at work we have a very large, very intensive data load that disables and rebuilds indexes as part of the process. We recently added compression to many of the tables and indexes in the database because it was growing quite large (around 28TB at the time). After adding compression, we got the database size down to somewhere around 17TB.

So you can imagine our surprise when the DB size jumped back up to over 30TB after the last data load! In trying to figure out what happened I discovered that most of the data compression was gone.

That’s…not great. Eric shows us a demo as well and notes that it still applies to SQL Server 2019. I’d be apt to call it a bug, myself.
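The practical defense, sketched here with hypothetical object names, is to restate DATA_COMPRESSION explicitly when rebuilding a disabled index and then verify what each partition actually has:

-- Re-enable the index and restate the compression rather than trusting it to survive the disable
ALTER INDEX IX_BigTable_Key ON dbo.BigTable
    REBUILD WITH (DATA_COMPRESSION = PAGE);

-- Check the resulting compression per partition
SELECT i.name, p.partition_number, p.data_compression_desc
FROM sys.partitions AS p
    JOIN sys.indexes AS i
        ON i.object_id = p.object_id
       AND i.index_id  = p.index_id
WHERE i.object_id = OBJECT_ID(N'dbo.BigTable');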

Tips for Creating Azure Data Studio Database Projects

Kevin Chant offers some insights for us:

One of the options within the SQL Database Projects extension is that you can publish your project to another SQL Server database. Of course, this is only for one database.

So, what do you do if you want to update multiple databases with one project? Well, one option is to create a dacpac from your project and use that dacpac to update multiple databases.

You have a couple of options if you wish to do this.

Read on for some helpful tips.
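As an illustrative sketch (server, database, and file names are placeholders), publishing the same dacpac to each target is one SqlPackage call per database:

SqlPackage /Action:Publish /SourceFile:"MyProject.dacpac" /TargetServerName:"Server1" /TargetDatabaseName:"DatabaseA"
SqlPackage /Action:Publish /SourceFile:"MyProject.dacpac" /TargetServerName:"Server1" /TargetDatabaseName:"DatabaseB"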

Dealing with Failing SQL Agent Jobs

Garry Bargsley has started a four-part series:

SQL Server Agent Jobs are one of the core features of SQL Server. Agent Jobs perform many actions: maintenance tasks (server and database), data movement flows, replication, log shipping, data cleanup, health checks, and many more. Since Agent Jobs are a critical component in a data organization, it is important to know when Agent Jobs do not succeed. There are several ways to accomplish the monitoring of failed Agent Jobs, from easy to more complex. This four-part series is going to cover how you, the DBA, can be notified of failing Agent Jobs in your SQL Server environment.

Click through for part one, which is all about finding failed jobs and filtering down to relevant jobs using dbatools.
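The series leans on dbatools, but for comparison, a plain T-SQL sketch against msdb's job history surfaces recent failures:

SELECT j.name,
       h.run_date,     -- int, yyyymmdd
       h.run_time,     -- int, hhmmss
       h.message
FROM msdb.dbo.sysjobs AS j
    JOIN msdb.dbo.sysjobhistory AS h
        ON h.job_id = j.job_id
WHERE h.step_id = 0      -- step 0 is the overall job outcome, not an individual step
  AND h.run_status = 0   -- 0 = failed
ORDER BY h.run_date DESC, h.run_time DESC;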

Principal Component Analysis in Azure ML

Dinesh Asanka walks us through Principal Component Analysis as an Azure ML Studio data transformation technique:

We will be discussing one of the most common data reduction techniques, named Principal Component Analysis, in Azure Machine Learning in this article. After discussing basic cleaning techniques and feature selection techniques in previous articles, we now turn to a data reduction technique.

A data reduction mechanism can be used to reduce the representation of large, high-dimensional data. By using a data reduction technique, you can reduce the dimensionality, which makes the data easier to manage and to visualize. Further, you can achieve similar accuracies.
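Conceptually (this is standard PCA, not anything specific to the Azure ML module), given centered data X with n rows, the technique keeps the k eigenvectors of the covariance matrix with the largest eigenvalues and projects onto them:

C = \frac{1}{n-1} X^{\top} X, \qquad C w_i = \lambda_i w_i, \qquad Z = X W_k

where W_k stacks those top-k eigenvectors, so the reduced representation Z keeps as much of the original variance as any k-dimensional linear projection can.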

Read on for the demo.
