Press "Enter" to skip to content

Curated SQL Posts

Normalization and Reduced Blocking

Erik Darling points out one of the many benefits of normalizing tables in a database:

Looking at the design, there are two big problems:

1. There are “order” columns that are going to get a lot of inserts and updates
2. You’re going to be storing the same customer information over and over again

The more related, but not independent, data you store in the same table, the harder it becomes to effectively index that table.

My take on this is that the old adage of “Normalize until it hurts; denormalize until it works” hasn’t been operative for the past 15 years, when the SSD era began.

Comments closed

DTUs and vCores in Azure SQL Database

Arun Sirpal gives us one take on whether to use DTUs or vCores for Azure SQL Database:

If you are building database solutions in Azure , using Azure SQL Database then you will know that you have a purchasing option decision to make. That being should you use a vCore model or DTU approach?

Arun prefers the vCore model for solid reasons. I agree with the sentiment when we’re talking about production databases. For small-scale, personal stuff, however, the bottom end of the DTU model is much less expensive.

Comments closed

Grouping Data with Spark

Ed Elliott has two quick examples of grouping data in Spark:

I have been playing around with the new Azure Synapse Analytics, and I realised that this is an excellent opportunity for people to move to Apache Spark. Synapse Analytics ships with .NET for Apache Spark C# support many people will surely try to convert T-SQL code or SSIS code into Apache Spark code. I thought it would be awesome if there were a set of examples of how to do something in T-SQL, then translated into how to do that same thing in Spark SQL and the Spark DataFrame API in C#.

Click through for the first example, GROUP BY.

Comments closed

T-SQL Tuesday 134 Roundup

James McGillivray summarizes the results from T-SQL Tuesday #134:

When I volunteered to host a T-SQL Tuesday, I had a very different topic in mind. However, the incredible events of the last year, and in particular, the immense pressure that my wife faced at work, made me realise how important it is to have ways to take breaks, both mental and physical. And while we were away in December, and we both recharged, I thought it would make a good topic for this event. It was wonderful to see the response from the #sqlfamily to my invitation, and by my count 29 different people contributed to the blog party.

I’ve tried to group posts with similar themes in this summary, and since some posts fall into multiple categories, I may mention a single post more than once. Links on names point to Twitter handles, links on descriptions point to the respective blog posts.

Click through for a rather large roundup.

Comments closed

The Concatenation Operator

Hugo Kornelis explains what the Concatenation operator does:

The Concatenation operator reads and returns all rows from all its inputs, in order, and without modification.

This operator is most commonly used to execute queries that use UNION or UNION ALL. In the former case, other operators are required to remove the duplicates, because Concatenation doesn’t provide that functionality. You may also find the Concatenation operator in queries on partitioned views.

Read on to see the algorithm and lots of details about the operator.

Comments closed

Using DEFINE COLUMN in DAX Queries

Marco Russo and Alberto Ferrari show off some new DAX syntax:

Introduced in December 2020, the DEFINE COLUMN statement lets you define a calculated column local to a query. The column is not persisted in the model, it exists only for the lifetime of the query. Apart from that, it is a calculated column in every sense of the term.

The extension of DAX with the capability to define calculated columns local to a query is needed in order to support composite models over Analysis Services (AS). There are no limitations in the use of the feature. For this reason, you can take advantage of local columns in any DAX query. We refer to calculated columns defined in a query as query calculated columns, or query columns for short.

Click through to see it in action. I like this idea a lot, though do read their note regarding performance, contrasting it with ADDCOLUMNS.

Comments closed

Driving Action with Graphs

Elizabeth Ricks completes a series on designing visuals. Question 4 involves the audience:

When communicating with data, always consider what broader purpose it serves. Too often, we don’t pause and think about the context (understandable in the real-world maelstrom of deadlines, constraints and organizational politics!). Rather, we force the visuals that we used to analyze the data upon our audience, hoping they’ll come to the same conclusion. 

Question 5 looks at the text on a visual:

In data visualization, words can be more powerful than we realize. When designing graphs, the non-negotiable text includes explicit axes labels, descriptive chart titles, supporting annotations, and footnotes with relevant methodology and assumptions. Never assume it’s apparent what’s being shown—make it your default to include this important context so your audience doesn’t have to guess what they’re seeing. 

Be sure to read both.

Comments closed

Loading a Spark DataFrame in .NET

Ed Elliott shows how to get data and convert it into a Spark DataFrame using .NET:

When I first started working with Apache Spark, one of the things I struggled with was that I would have some variable or data in my code that I wanted to work on with Apache Spark. To get the data in a state that Apache Spark can process it involves putting the data into a DataFrame. How do you take some data and get it into a DataFrame?

This post will cover all the ways to get data into a DataFrame in .NET for Apache Spark.

Click through for several methods.

Comments closed