Press "Enter" to skip to content

Curated SQL Posts

All About The Compute Scalar Operator

Hugo Kornelis continues a quest to document query plan operators:

The Compute Scalar operator is used to compute new values based on other columns in the same row. These new values are then added as extra column(s) in the output rows.

The expressions used to compute the new values can only refer to constant values and to columns in the input rows of the Compute Scalar operator. Other than that, there are, to my knowledge, no restrictions. The expressions can vary from very simple to extremely complex. The expressions can even include references to scalar user-defined functions, to CLR user-defined functions, and to built-in CLR functions.

Read on for a good deal of information about the operator.
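As a quick illustration of the concept (the table and column names here are hypothetical), a scalar expression built from other columns in the same row is exactly the sort of thing that surfaces as a Compute Scalar operator in the plan:

    SELECT
        OrderID,
        Quantity,
        UnitPrice,
        Quantity * UnitPrice AS LineTotal   -- typically computed by a Compute Scalar operator
    FROM dbo.OrderLines;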


Monitoring Identity Columns

Michael J. Swart has a script to monitor identity columns:

My friend pointed out an interesting RCA by GitHub where a “database table’s auto-incrementing ID column exceeded [maxint]. When we attempted to insert larger integers into the column, the database rejected the value”.

This led to a discussion about setting up monitoring for this kind of problem in our software. We have a place for monitoring and health-checks for all our databases. We just need to know how to define them.

So how do I create a script that reports any tables whose current identity values are above a certain threshold? This is what I came up with. Maybe you’ll find it useful too.

Click through for the script, but also read the comments for a simplification. The next step in the process is to figure out which tables can handle it and auto-reseed when you get to a certain threshold. Most commonly, this would be queue tables, where the active set of IDs in use is a relatively small number and there are no ID stragglers hanging around, so resetting is safe.
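If you just want the general flavor before clicking through, here is a minimal sketch of the idea (not Michael's script): sys.identity_columns exposes the last identity value generated for each column, which is enough to compute how much of the range has been consumed.

    -- Rough sketch: int identity columns that have used more than 60% of the positive range
    SELECT
        OBJECT_SCHEMA_NAME(ic.object_id) AS schema_name,
        OBJECT_NAME(ic.object_id)        AS table_name,
        ic.name                          AS column_name,
        CAST(ic.last_value AS bigint)    AS last_value,
        CAST(ic.last_value AS float) / 2147483647 * 100.0 AS percent_used
    FROM sys.identity_columns AS ic
    WHERE ic.system_type_id = 56   -- int columns only in this sketch
      AND CAST(ic.last_value AS float) / 2147483647 * 100.0 > 60;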


Getting the Last Query Plan Stats in SQL Server 2019

John Morehouse walks us through retrieving the actual query plan stats of the last run of an execution plan:

Currently, if you are not on SQL Server 2019 and want to see an execution plan, you would attempt to dive into the execution plan cache to retrieve an estimated plan. Keep in mind that this is just an estimated plan; the actual plan, while the shape should be the same, will have runtime metrics. These actual runtime metrics could be different than what is shown in the estimated plan, so it is important to get the actual plan whenever possible.

With the introduction of lightweight statistics, SQL Server can retain the metrics of the actual execution plan if enabled. Note that this could introduce a slight increase in overhead; however, I haven’t yet seen it be detrimental. These metrics are vastly important when performing query tuning.

Read on to see the specific set of metrics you can pull and how to do it. This does require SQL Server 2019.
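For reference (John's post covers the details and caveats), the moving parts are a database-scoped configuration option and a dynamic management function, along these lines:

    -- SQL Server 2019: capture the last actual execution plan via lightweight profiling
    ALTER DATABASE SCOPED CONFIGURATION SET LAST_QUERY_PLAN_STATS = ON;

    -- Retrieve the last actual plan (with runtime metrics) for each cached plan
    SELECT st.text, qps.query_plan
    FROM sys.dm_exec_cached_plans AS cp
    CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
    CROSS APPLY sys.dm_exec_query_plan_stats(cp.plan_handle) AS qps;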


Azure Data Factory and Key Vault References

Gerhard Brueckl shows how we can get around a limitation in the Azure Data Factory user interface:

As you can see, the setting “AccessToken” can use a Key Vault reference, whereas settings like “Databricks Workspace URL” and “Cluster” do not support them. This is usually fine because the guys at Microsoft also thought about this and support Key Vault references for the settings that are actually security-relevant or sensitive. Also, providing the option to use Key Vault references everywhere would flood the GUI. So this is just fine.

But there can be good reasons to get values from the Key Vault for non-sensitive settings as well, especially when it comes to CI/CD and multiple environments. From my experience, when you implement a bigger ADF project, you will probably have a Key Vault for your sensitive settings, and all other values are provided during deployment via ARM parameters.

So you will end up with a mix of Key Vault references and ARM template parameters, which very likely will be derived from the Key Vault at some point anyway. To solve this, you can modify the JSON of an ADF linked service directly and inject Key Vault references into almost every property!

Click through to see how that works, as well as the ramifications.
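For context, a Key Vault reference in linked service JSON is just a small object substituted in place of the literal value, roughly like this fragment from a hypothetical Databricks linked service (the linked service and secret names below are made up; Gerhard's post shows exactly where it can be injected):

    "typeProperties": {
        "domain": {
            "type": "AzureKeyVaultSecret",
            "store": {
                "referenceName": "MyKeyVaultLinkedService",
                "type": "LinkedServiceReference"
            },
            "secretName": "Databricks-Workspace-Url"
        }
    }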


Adding Tooltips to Column Headers in Power Query

Chris Webb has some fun with M:

How does this work? Here’s what each of the steps does:

– The source step creates a simple table with two columns called firstname and lastname using #table (see here for more details on that).
– The tabletype step declares a new table type with two columns (the same two columns in the table from the previous step) and then adds a metadata record to this type. In that record the Documentation.FieldDescription field contains the text values that will appear as tooltips when you hover over each column.
– The replacetype step replaces the type of the table returned by source with the type declared in tabletype.

Click through for the code.
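As a rough reconstruction of the pattern (my own sketch, not Chris's exact code), the M looks something like this:

    let
        source = #table(
            type table [firstname = text, lastname = text],
            {{"Ada", "Lovelace"}}),
        tabletype = type table [firstname = text, lastname = text]
            meta [Documentation.FieldDescription =
                    [firstname = "The person's first name",
                     lastname  = "The person's last name"]],
        replacetype = Value.ReplaceType(source, tabletype)
    in
        replacetype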


Docker Compose and SQL Server

Andrew Pruski makes it easy to launch a fully-featured Docker container running SQL Server:

The solution here is to create a custom image with the volume created and permissions set.

But wouldn’t it be easier to just have to run one command to spin up a custom 2019 image, with volumes created and permissions set?

Enter Docker Compose.

Andrew has a GitHub repo with everything set up and includes plenty of screenshots to demonstrate.
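As a rough sketch of what the compose side of such a setup can look like (the image, paths, and password here are placeholders; the real Dockerfile and compose file are in Andrew's repo):

    version: "3.7"
    services:
      sqlserver2019:
        build: .    # custom Dockerfile that pre-creates the volume directories and sets permissions
        ports:
          - "1433:1433"
        environment:
          ACCEPT_EULA: "Y"
          MSSQL_SA_PASSWORD: "Testing1122!"
        volumes:
          - sqldata:/var/opt/sqlserver/data
          - sqllog:/var/opt/sqlserver/log
    volumes:
      sqldata:
      sqllog: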


Sparklyr 1.3 Released

Yitao Li announces sparklyr 1.3:

sparklyr 1.3 is now available on CRAN, with the following major new features:

– Higher-order Functions to easily manipulate arrays and structs
– Support for Apache Avro, a row-oriented data serialization framework
– Custom Serialization using R functions to read and write any data format
– Other Improvements such as compatibility with EMR 6.0 & Spark 3.0, and initial support for Flint time series library

Between this and the work from the Spark side, we are seeing some nice quality of life improvements for Spark and R.


Case-Insensitive Searches in Cosmos DB

Hasan Savran gives us a few techniques for case-insensitive searching in Azure Cosmos DB:

Data gets saved as JSON in Azure Cosmos DB. JSON documents support string, numeric, and Boolean data types. Because of the limited data types, you might end up picking the string data type for most of your data. You may need to use Cosmos DB’s string functions to help you in many situations. For example, you might need to find names that start with a certain set of characters, or to find data that ends with a particular word or set of characters. You may need to find all documents whose address contains a street name. What do you do in these cases? Azure Cosmos DB has a couple of system functions to help you, and I will focus on the following functions in this post. These are the functions that just got updated in June 2020.

Click through to learn a bit about those money-saving updates.
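For reference, the June 2020 update added an optional boolean argument to several of the string functions (CONTAINS, STARTSWITH, ENDSWITH, STRINGEQUALS); passing true makes the comparison case-insensitive, so a query can look something like this (the property names are made up):

    SELECT c.id, c.name, c.address
    FROM c
    WHERE STARTSWITH(c.name, "sav", true)
       OR CONTAINS(c.address, "main street", true)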


Comparing Integration Services and Azure Data Factory

Tim Mitchell compares SQL Server Integration Services to Azure Data Factory:

For the better part of 15 years, SQL Server Integration Services (SSIS) has been the go-to enterprise extract-transform-load (ETL) tool for shops running on Microsoft SQL Server. More recently, Microsoft added Azure Data Factory (ADF) to its stable of enterprise ETL tools. In this post, I’ll be comparing SSIS and Azure Data Factory to share how they are alike and how they differ. I’ll also review the strengths and shortcomings of each, including the architectures in which each of these is likely to do well.

Read on for Tim’s thoughts on the subject. Tim lays out his biases up-front but also gives you a good feel for where both products are in their lifecycles.


Getting Started with Redshift

Rahul Mehta has a primer on AWS Redshift:

AWS Redshift is a columnar data warehouse service on the AWS cloud that can scale to petabytes of storage, and the infrastructure for hosting this warehouse is fully managed by AWS. Redshift operates in a clustered model with a leader node and multiple worker nodes, like other clustered or distributed database models in general. It is based on Postgres, so it shares a lot of similarities with Postgres, including the query language, which is nearly identical to standard SQL. Redshift supports creating almost all of the major database objects, like databases, tables, views, and even stored procedures. In this article, we will explore how to create your first Redshift cluster on AWS and start operating it.

I’m not really the biggest fan of Redshift around, but Rahul does a good job walking us through the basics of the product.
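Once a cluster is up, the Postgres lineage shows: object creation is plain SQL with Redshift-specific distribution and sort options layered on, e.g. (a hypothetical table, not one from the article):

    CREATE TABLE page_views
    (
        view_id    BIGINT IDENTITY(0,1),
        user_id    INTEGER NOT NULL,
        page_url   VARCHAR(2048),
        viewed_at  TIMESTAMP
    )
    DISTSTYLE KEY
    DISTKEY (user_id)
    SORTKEY (viewed_at);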
