Press "Enter" to skip to content

Curated SQL Posts

SQL Server 2017 and Column-Level Encryption

Steve Jones notes a change between SQL Server 2016 and SQL Server 2017 around column-level encryption:

I discovered recently that there was a change made in SQL Server 2017 to the way that symmetric key passphrases are hashed. There’s a KB article that notes the fix, but basically the passphrases used to be hashed with SHA1. That’s cryptographically insecure, so the algorithm was updated to SHA2.

This is a problem, and can cause some issues. I’ll show the issue and then how to get around it.

There’s not much detail in the KB article about what happens afterward: whether you need to keep the trace flag on forever, or whether everything eventually migrates over to using SHA2 for hashes.
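
For reference, here’s a rough T-SQL sketch of the kind of round trip that breaks across versions. The key name, passphrases, and plaintext are all made up, and the trace flag number (4631) is my reading of the KB article, so verify it there:

    -- On the SQL Server 2016 instance: derive a key from a shared
    -- passphrase and encrypt a value
    CREATE SYMMETRIC KEY TestKey
        WITH ALGORITHM = AES_256,
             KEY_SOURCE = 'a shared passphrase',
             IDENTITY_VALUE = 'a shared identity phrase'
        ENCRYPTION BY PASSWORD = 'StrongPassword!1';

    OPEN SYMMETRIC KEY TestKey DECRYPTION BY PASSWORD = 'StrongPassword!1';
    SELECT ENCRYPTBYKEY(KEY_GUID('TestKey'), N'secret data') AS Ciphertext;  -- save this value
    CLOSE SYMMETRIC KEY TestKey;

    -- On SQL Server 2017, a key recreated from the same KEY_SOURCE no longer
    -- matches, because the passphrase is now hashed with SHA2 rather than
    -- SHA1, so DECRYPTBYKEY against the saved ciphertext returns NULL.
    -- Enabling the trace flag from the KB before recreating the key restores
    -- the old SHA1 derivation:
    DBCC TRACEON (4631, -1);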

Comments closed

Computed Columns and Temporal Tables

Randolph West shows us how to include computed columns in temporal tables:

As I say in my Back to the future with Temporal Tables session, there are several limitations we should be aware of with this feature. One I had not previously considered is that computed columns are not permitted in a history table. This is a problem if we are converting an existing auditing process to make use of temporal tables, and we have computed columns in our table.

Computed columns are columns that are created from an expression based on one or more existing columns in a table. They are useful if we need to index a portion of a wider column, for example.

Hat tip to Erik Darling for coming up with the solution.
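
A minimal T-SQL sketch of the workaround as I understand it (table and column names are hypothetical): pre-create the history table yourself, with the computed column materialized as a regular column of the matching type, then point system versioning at it.

    -- Current table with a computed column
    CREATE TABLE dbo.Orders
    (
        OrderID    int IDENTITY PRIMARY KEY,
        Quantity   int NOT NULL,
        UnitPrice  money NOT NULL,
        TotalPrice AS (Quantity * UnitPrice),   -- computed
        ValidFrom  datetime2 GENERATED ALWAYS AS ROW START NOT NULL,
        ValidTo    datetime2 GENERATED ALWAYS AS ROW END NOT NULL,
        PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
    );

    -- Hand-built history table: TotalPrice is a plain money column here,
    -- since computed columns are not permitted in a history table
    CREATE TABLE dbo.OrdersHistory
    (
        OrderID    int NOT NULL,
        Quantity   int NOT NULL,
        UnitPrice  money NOT NULL,
        TotalPrice money NULL,
        ValidFrom  datetime2 NOT NULL,
        ValidTo    datetime2 NOT NULL
    );

    ALTER TABLE dbo.Orders
        SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.OrdersHistory));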

Comments closed

Calculating the Pain of UDFs

Taiob Ali points out something added to SQL Server 2017 (and later backported to 2016 and 2014):

Microsoft SQL Server Management Studio (SSMS) version 17.5 added the new showplan attributes UdfCpuTime and UdfElapsedTime to QueryTimeStats. These two attributes measure the time and CPU spent on user-defined functions within a query execution, helping to discover the impact of UDF execution within the full query execution. This feature was first added in SQL Server 2017 CU3 and was backported to SQL Server 2016 SP2. Finding the execution time and CPU for UDFs was always a challenge for data professionals because the number of times a function executes will vary.

This was a blind spot for a very long time.
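
If you want to see the new attributes for yourself, here’s a hedged sketch: a deliberately slow scalar UDF (entirely made up) plus a query with the actual plan captured. UdfCpuTime and UdfElapsedTime show up in the QueryTimeStats element of the plan XML, or in the plan’s root-node properties in SSMS 17.5 or later.

    -- A deliberately slow scalar UDF
    CREATE OR ALTER FUNCTION dbo.SlowTax (@amount money)
    RETURNS money
    AS
    BEGIN
        DECLARE @i int = 0;
        WHILE @i < 10000 SET @i += 1;   -- burn a little CPU on every call
        RETURN @amount * 0.08;
    END;
    GO

    SET STATISTICS XML ON;
    SELECT TOP (1000) dbo.SlowTax(CAST(o.object_id AS money)) AS Tax
    FROM sys.all_objects AS o;
    SET STATISTICS XML OFF;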

Comments closed

Choosing a SQL Platform on Hadoop

Sagar Kewalramani walks us through the choices for SQL platforms on the Cloudera Data Platform:

CDW on CDP is a new service that enables you to create a self-service data warehouse for teams of Business Intelligence (BI) analysts. You can quickly provision a new data warehouse and share any data set with a specific team or department. Do you remember when you could provision a data warehouse on your own, without infrastructure and platform teams getting involved? This was never possible. CDW fulfills this mission.

However, CDW makes several SQL engines available, and with more choice comes more opportunities for confusion.   Let’s explore the SQL engines available in CDW on CDP and talk about which is the right SQL option for the right use case.

So many choices!  Impala? Hive LLAP?  Spark? What to use when?  Let’s explore.

Infrastructure and platform teams start to get involved approximately two days after the unexpectedly large bill arrives.

That aside, this is a really nice article covering several platform technologies, including Impala, Hive LLAP, and Spark SQL.

Comments closed

Parquet Versus Avro

Matthew Rathbone compares the Parquet and Avro file formats:

JSON improves upon CSV as each row provides some indication of schema, but without a special header-row, there’s no way to derive a schema for every record in the file, and it isn’t always clear what type a ‘null’ value should be interpreted as.

Avro and Parquet on the other hand understand the schema of the data they store. When you write a file in these formats, you need to specify your schema. When you read the file back, it tells you the schema of the data stored within. This is super useful for a framework like Spark, which can use this information to give you a fully formed data-frame with minimal effort.

I was kind of hoping to see ORC in the comparison as well, though even when the Hortonworks-Cloudera competition was at its peak, my recollection is that the differences between the two formats were pretty small: ORC was a little faster for non-nested data and Parquet a little faster for nested data.
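
To make the self-describing-schema point concrete, here’s a small Spark SQL sketch (table names and path are invented): write Parquet with an explicit schema, then point a new table at the same files with no schema at all and let the format supply it.

    -- Write Parquet with an explicit schema
    CREATE TABLE events (id INT, label STRING) USING PARQUET LOCATION '/tmp/events';
    INSERT INTO events VALUES (1, 'click'), (2, 'view');

    -- Read the files back with no schema: Parquet carries its own, so
    -- Spark recovers the column names and types from the file metadata
    CREATE TABLE events_back USING PARQUET LOCATION '/tmp/events';
    DESCRIBE TABLE events_back;   -- id INT, label STRING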

Comments closed

From SQL Server to Cassandra

Shel Burkow has started a new series:

A subset of related tables in a relational schema can satisfy any number of queries known and unknown at design time. Refactoring the schema into one Cassandra table to answer a specific query, though, will (re)introduce all the data redundancies the original design had sought to avoid.

In this series, I’ll do just that. Starting from a normalized SQL Server design and statement of the Cassandra query, I’ll develop four possible solutions in both logical and physical models. To get there, though, I’ll first lay the foundation.

This initial article focuses on the Cassandra primary key. There are significant differences from primary keys in relational systems, and I’ll cover them in some depth. Each solution (Part III) will have a different key.

Cassandra (as well as Riak, while that was still a thing people cared about) has the concept of tables and SQL-like statements to work with them, but it’s quite different from a relational database, different enough that new design patterns are necessary. Just about the worst thing you could do would be to drop your relational database schema into Cassandra unchanged and call it a day.
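
To give a flavor of the difference, here’s a hypothetical CQL table (all names invented). Unlike a relational primary key, which only enforces uniqueness, the Cassandra primary key dictates physical layout: the partition key decides which nodes hold the rows, and the clustering columns decide their order within the partition.

    CREATE TABLE orders_by_user (
        user_id    uuid,
        order_date timestamp,
        order_id   uuid,
        total      decimal,
        -- user_id is the partition key: all of a user's orders are stored
        -- together. order_date and order_id are clustering columns: they
        -- sort rows within the partition, so "latest orders for a user"
        -- is a cheap sequential read.
        PRIMARY KEY ((user_id), order_date, order_id)
    ) WITH CLUSTERING ORDER BY (order_date DESC, order_id DESC);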

Comments closed

Formatting SSIS Packages

Tim Mitchell shows us different options available when formatting SSIS packages:

Most folks I know go to one extreme or the other when it comes to the visual layout of SQL Server Integration Services packages: either they don’t care a tiny bit about the appearance, or they insist on an easy-to-read layout before a project is considered complete. I am definitely in the second group, and will almost always spend the time to make sure my packages are properly laid out.

Regardless of which of these groups you are in, it is very easy to apply some simple visual formatting to your packages. In this quick tip, I’ll show you how to use the visual formatting options in SSIS to automate the layout.

I consider it quite important: the layout gives a future viewer an idea of your vision as a developer. If I can’t read what you’re doing, that does not bode well.

Comments closed

Automating JSON to Tables

Dave Mason has a stored procedure for us:

The code for my stored procedure is below. It has a single input parameter for a string of JSON data. The name/value pairs of the first “row” of JSON data are parsed to obtain column names and types for the result set. A query string for the OPENJSON function is constructed and executed via EXEC. JSON’s support for data types is pretty sparse compared to SQL Server (notice the big CASE expression where I attempt to do some mapping). If you’re using SQL 2016, you’ll have to make an edit for the STRING_AGG function.

Click through for the code and demos.
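
If you haven’t used OPENJSON before, here’s a minimal T-SQL sketch (with made-up sample data) of the two modes a procedure like Dave’s builds on: the default key/value/type rowset, which supplies the raw material for inferring column names and types, and the explicit WITH schema that his code generates and executes.

    DECLARE @json nvarchar(max) = N'[
      {"Name": "Anne", "Age": 32},
      {"Name": "Rob",  "Age": 41}
    ]';

    -- Default mode: key/value/type triples for the first "row"
    SELECT [key], [value], [type]
    FROM OPENJSON(@json, '$[0]');

    -- Explicit mode: the kind of WITH clause the procedure constructs
    SELECT j.Name, j.Age
    FROM OPENJSON(@json)
         WITH (Name nvarchar(50) '$.Name',
               Age  int          '$.Age') AS j;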

Comments closed

Deferred Compilation and Compatibility Level 140 Query Hints

Milos Radivojevic shows that table variables do not care about your QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_140 settings:

The plan is created under CL 140, but the estimated number of rows for the table variable is not 1 but the actual one. Even if you were to specify the FORCE_LEGACY_CARDINALITY_ESTIMATION hint, the query would still be deferred compiled and the behavior of the table variable would be the same.
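
Here’s a minimal T-SQL sketch of the behavior Milos describes, assuming a database at compatibility level 150; the table and row counts are arbitrary. Even with the CL 140 hint in place, the estimate for the table variable reflects the actual row count rather than the old fixed guess of 1.

    DECLARE @t TABLE (id int PRIMARY KEY);

    INSERT INTO @t (id)
    SELECT TOP (100000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM sys.all_columns AS c1
    CROSS JOIN sys.all_columns AS c2;

    -- Deferred compilation kicks in anyway: the estimated number of rows
    -- for @t is 100,000, not 1, despite the compatibility level 140 hint
    SELECT t.id
    FROM @t AS t
    OPTION (USE HINT ('QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_140'));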

Table variable deferred compilation respects settings at the database scope and ignores all hints except one.

Click through to learn what that one hint is.

Comments closed

Multi-Column Transformations with Power Query

Imke Feldmann shows how we can apply a function to multiple columns at once using Power Query:

You can apply simple transformations to multiple columns at once in Power Query using the UI only. In this article I show how you can apply advanced transformations on multiple columns at once instead. You can also use custom functions this way. And lastly, for the lazy (er, efficient) fans of custom M-functions: you will get a new “TranformAllMyColumnsAtOnceHowILikeIt” function as well.

Read on for a few examples, including one using custom code.

Comments closed