Day: June 10, 2022

Unity Catalog in Azure Databricks

Paul Roome, et al., announce data lineage for Unity Catalog:

We are excited to announce that data lineage for Unity Catalog, the unified governance solution for all data and AI assets on lakehouse, is now available in preview.

This blog will discuss the importance of data lineage, some of the common use cases, our vision for better data transparency and data understanding with data lineage, and a sneak peek into some of the data provenance and governance features we’re building.

Click through to see what it currently supports. The question I'm curious about is whether this and Microsoft Purview will play nicely together in an Azure Databricks setup.

Column-Level Encryption in SQL Server

Tom Collins sets up column-level encryption:

How do I set up SQL Server Column Level Encryption?

This is quite different from Always Encrypted: it's the cell-level encryption process SQL Server has offered since SQL Server 2005. It's also (currently?) the only way to encrypt columns manually in an Azure Synapse Analytics dedicated SQL pool. It's not trivial to set up and use, but neither is it a huge challenge.

Performance Concerns around GENERATE_SERIES()

Erik Darling looks the gift horse in the mouth:

Quite a while back, I blogged about how much I’d love to have this as a function. We… sort of got it. It doesn’t do dates natively, but you can work around some of that with date functions.

In this post, I want to go over some of the disappointing performance issues I found when testing this function out.

It's not good news, but it's important to understand if you plan to use this function once SQL Server 2022 comes out. On the bright side, the issues Erik describes are internal, so they are (potentially) easier to fix than a change to the function's signature would be. I wouldn't guarantee that things will get better, but there have been cases where we saw performance improve between early CTPs and RTM.
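
To make the date workaround concrete, here is a Python analogue (not SQL Server code). The `generate_series` and `date_series` helpers are hypothetical; `generate_series` assumes the documented T-SQL behavior of an inclusive stop value and a default step of 1 or -1 depending on direction, and `date_series` adds integer offsets to a start date, much as `SELECT DATEADD(DAY, value, @start) FROM GENERATE_SERIES(0, @days)` would in T-SQL. It illustrates semantics only and says nothing about the performance behavior Erik tested.

```python
from datetime import date, timedelta


def generate_series(start, stop, step=None):
    """Hypothetical Python analogue of GENERATE_SERIES(start, stop[, step]).

    Assumed semantics: stop is inclusive, and step defaults to 1 when
    start < stop and to -1 otherwise. A step pointing away from stop
    yields an empty result.
    """
    if step is None:
        step = 1 if start < stop else -1
    if step == 0:
        raise ValueError("step must not be zero")
    values = []
    current = start
    # Walk toward stop, including it if we land on it exactly.
    while (step > 0 and current <= stop) or (step < 0 and current >= stop):
        values.append(current)
        current += step
    return values


def date_series(first_day, last_day):
    """Date workaround: generate integer day offsets, then add each to a base date."""
    days = (last_day - first_day).days
    return [first_day + timedelta(days=offset) for offset in generate_series(0, days)]


print(generate_series(1, 10, 3))
print(date_series(date(2022, 6, 10), date(2022, 6, 12)))
```

Generating plain integers and converting them with a date function keeps the series generator numeric, which mirrors the workaround Erik mentions rather than any native date support.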

Working with Azure VM Scale Sets

Arun Sirpal explains the benefits of scale sets in Azure:

I really like scale sets. They let you create and manage up to 1000 load-balanced VMs per availability zone using Windows or Linux images. (We can have flexible or uniform modes for orchestration, which dictates whether you go down the homogeneous VM route or a mix, where a mix is the flexible option.)

There are many other benefits apart from scaling, such as built-in load balancing options, increased resiliency via 3 Availability Zones and, from a cost perspective, the ability to couple scale sets with Azure Hybrid Benefit or even use reserved instances. Cost is important in the cloud!

Read the whole thing.

Continuing a Dive into Simple Parameterization

Paul White shows how not-simple simple parameterization really is:

The output of parsing is a logical representation of the statement called a parse tree. This tree does not contain SQL language elements. It’s an abstraction of the logical elements of the original query specification.

It’s frequently possible to write the same logical requirement in different ways. Using a SQL analogy, x IN (4, 8) is logically the same as writing x = 4 OR x = 8. This flexibility can be useful when writing queries, but it makes implementing an optimizer more difficult.

In general terms, normalization is an attempt to standardize. It recognises common variations expressing the same logic, and rewrites them in a standard way. For example, normalization is responsible for turning x BETWEEN y AND z into x >= y AND x <= z. The normalized, or standardized, form chosen is one the query processor finds convenient to work with internally.

This has been a very interesting series and Paul does promise one more article.
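
Paul's two examples of normalization can be illustrated with a toy rewriter. This is not how SQL Server's query processor works internally: the tuple-based node shapes and the `normalize` and `evaluate` helpers are hypothetical, chosen only to show an IN list becoming an OR chain and BETWEEN becoming a pair of range comparisons, and to check that the rewritten tree returns the same answer as the original. The sketch ignores NULLs and three-valued logic.

```python
import operator
from functools import reduce

# Comparison operators this toy understands.
OPS = {"=": operator.eq, ">=": operator.ge, "<=": operator.le}


def normalize(expr):
    """Rewrite a toy predicate tree into one standard form (hypothetical node shapes)."""
    kind = expr[0]
    if kind == "between":  # x BETWEEN lo AND hi  ->  x >= lo AND x <= hi
        _, col, lo, hi = expr
        return ("and", ("cmp", ">=", col, lo), ("cmp", "<=", col, hi))
    if kind == "in":  # x IN (a, b, ...)  ->  x = a OR x = b OR ... (list assumed non-empty)
        _, col, values = expr
        terms = [("cmp", "=", col, v) for v in values]
        return reduce(lambda left, right: ("or", left, right), terms)
    if kind in ("and", "or"):  # normalize both children
        return (kind, normalize(expr[1]), normalize(expr[2]))
    return expr  # plain comparisons are already in standard form


def evaluate(expr, row):
    """Evaluate either the original or the normalized tree against one row."""
    kind = expr[0]
    if kind == "cmp":
        _, op, col, value = expr
        return OPS[op](row[col], value)
    if kind == "and":
        return evaluate(expr[1], row) and evaluate(expr[2], row)
    if kind == "or":
        return evaluate(expr[1], row) or evaluate(expr[2], row)
    if kind == "between":
        _, col, lo, hi = expr
        return lo <= row[col] <= hi
    if kind == "in":
        _, col, values = expr
        return row[col] in values
    raise ValueError(f"unknown node: {kind}")


print(normalize(("in", "x", (4, 8))))
print(normalize(("between", "x", 3, 7)))
```

Because every variant collapses to the same small set of node types, later stages only need to reason about `cmp`, `and`, and `or`, which is the convenience Paul describes.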