June 2022 – Page 11 – Curated SQL

In this post, I will walk you through how to set up MinIO, so you can use it to work with SQL Server 2022’s s3 object integrations. Working with s3 and SQL Server requires a valid and trusted TLS certificate. This can be a pain for some users and environments. So I’m writing this post so you can get off the ground running with this new feature set in SQL Server 2022. The certificate we’re working with here is self-signed. You could get a real certificate for your environment, and that’s encouraged. But this walk-through intends to get you up and running fast so that you can test out SQL Server’s s3 object integrations. We’re using MinIO’s free GNU AGPL v3 edition and running it in a docker container for our s3 compatible object storage and SQL Server 2022 CTP 2.0, which is also running in a container.

Click through for the demo, in which Anthony sets everything up and then backs up a database in SQL Server 2022 to MinIO.

Comments closed

Unity Catalog in Azure Databricks

Published 2022-06-10 by Kevin Feasel

Paul Roome, et al, announce Unity Catalog:

We are excited to announce that data lineage for Unity Catalog, the unified governance solution for all data and AI assets on lakehouse, is now available in preview.
This blog will discuss the importance of data lineage, some of the common use cases, our vision for better data transparency and data understanding with data lineage, and a sneak peek into some of the data provenance and governance features we’re building.

Click through to see what it currently supports. My curious question is around whether this and Microsoft Purview will play nice in an Azure Databricks setup.

Comments closed

Column-Level Encryption in SQL Server

Published 2022-06-10 by Kevin Feasel

Tom Collins sets up column-level encryption:

How do I set up the SQL Server Column Level Encryption ?

This is quite different from Always Encrypted—it’s the encryption process which we saw with SQL Server going back (at least) to 2000. It’s also (currently?) the only available way manually to encrypt columns in an Azure Synapse Analytics dedicated SQL pool. I’d say it’s not trivial to set up and use but neither is it a huge challenge.

Comments closed

Performance Concerns around GENERATE_SERIES()

Published 2022-06-10 by Kevin Feasel

Erik Darling looks the gift horse in the mouth:

Quite a while back, I blogged about how much I’d love to have this as a function. We… sort of got it. It doesn’t do dates natively, but you can work around some of that with date functions.
In this post, I want to go over some of the disappointing performance issues I found when testing this function out.

It’s not good news but it’s important to understand if you’re planning to use this when SQL Server 2022 comes out. Also, the things Erik mentions are easier to fix (potentially) than modifications in the signature of a function, as they’re internal. I wouldn’t guarantee that things will certainly be better but there are some good cases when we saw performance improvements between early CTPs and RTM.

Comments closed

Working with Azure VM Scale Sets

Published 2022-06-10 by Kevin Feasel

Arun Sirpal explains the benefit behind scale sets in Azure:

I really like scale sets. It lets you create and manage up to 1000 load balanced VMs per availability zone using windows or Linux images. (We can have flexible or uniforms modes for orchestration which dictates if you go down the homogenous VM route or a mix, where a mix is the flexible option.
There are many other benefits too apart from scaling, such as built-in load balancing options, increased resiliency via 3 Availability Zones and from a cost perspective you can couple scale sets with Azure Hybrid benefit or even use reserved instances – cost is important in the cloud!

Read the whole thing.

Comments closed

Continuing a Dive into Simple Parameterization

Published 2022-06-10 by Kevin Feasel

Paul White shows how not-simple simple parameterization really is:

The output of parsing is a logical representation of the statement called a parse tree. This tree does not contain SQL language elements. It’s an abstraction of the logical elements of the original query specification.
It’s frequently possible to write the same logical requirement in different ways. Using a SQL analogy, x IN (4, 8) is logically the same as writing x = 4 OR x = 8. This flexibility can be useful when writing queries, but it makes implementing an optimizer more difficult.
In general terms, normalization is an attempt to standardize. It recognises common variations expressing the same logic, and rewrites them in a standard way. For example, normalization is responsible for turning x BETWEEN y AND z into x >= y AND x <= z. The normalized, or standardized, form chosen is one the query processor finds convenient to work with internally.

This has been a very interesting series and Paul does promise one more article.

Comments closed

Calculating a Cumulative Sum (Running Total) in R

Published 2022-06-09 by Kevin Feasel

Jim calculates running totals in R:

Cumulative Sum calculation in R, using the dplyr package in R, you can calculate the cumulative sum of a column using the following methods.

Click through for examples of ungrouped and grouped operations. H/R R-Bloggers

Comments closed

A Change Log for SQL Server ScriptDom

Published 2022-06-09 by Kevin Feasel

Arvind Shyamsundar keeps track of changes:

Till such time that we have a detailed, fully updated change log, this blog post is being written as an unofficial change log for ScriptDom at least. Hopefully it will help readers understand when certain T-SQL grammar was added to ScriptDom, etc. I hope to keep it updated as we have later releases of DacFx and thereby, ScriptDom. If you have questions / feedback for me, please do leave a comment in this blog post and I will try to address it in due course of time.
Note that if there are no functionality changes for a given DacFx release, it will not feature in this table.
And again, this is an unofficial change log, so it is provided as-is and should not be in any way construed as an official Microsoft statement.

Click through for this unofficial log. As you can see, ScriptDom is still under active development with SQL Server 2022.

Comments closed

Broken References in the SSIS Catalog

Published 2022-06-09 by Kevin Feasel

Andy Leonard beats the brakes off of some stuff:

It may help to define reference first. An SSIS Catalog Reference is a relationship between an SSIS Project and an SSIS Catalog Environment in the SSIS Catalog. One or more references to environments may be configured for a project. At execution time one and only one reference may be selected – like Highlander, “There can be only one.”
A broken reference occurs when one configures a reference between a project and an environment, and then the environment is deleted.

Read on to see what this looks like in the SSIS Catalog Browser and SISS Catalog Compare tools, as well as how you can prevent accidentally creating broken references.

Comments closed

Greatest and Least in T-SQL

Published 2022-06-09 by Kevin Feasel

Hasan Savran takes us through a rather useful pair of functions:

GREATEST and LEAST functions are available in Azure SQL and SQL Server 2022. GREATEST returns the maximum value from a list of expressions. LEAST returns the minimum value from a list of expressions.
The Highest number of expressions you can pass to these functions is 254. All expressions must be comparable for these functions to work. For example, SQL Server can automatically convert the following string to an integer and find the greatest number.

These kinds of row-wise comparisons cut out a lot of writing OR clauses. Though standard function rules apply: if you put these in the WHERE clause of queries against large tables and expect them to be your primary filters and you’ll probably be in for a nasty surprise.

Comments closed

M	T	W	T	F	S	S
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30

Month: June 2022

Using S3 Object Storage in MinIO with SQL Server 2022

Unity Catalog in Azure Databricks

Column-Level Encryption in SQL Server

Performance Concerns around GENERATE_SERIES()

Working with Azure VM Scale Sets

Continuing a Dive into Simple Parameterization

Calculating a Cumulative Sum (Running Total) in R

A Change Log for SQL Server ScriptDom

Broken References in the SSIS Catalog

Greatest and Least in T-SQL