Press "Enter" to skip to content

Author: Kevin Feasel

Unity Catalog in Azure Databricks

Paul Roome, et al, announce Unity Catalog:

We are excited to announce that data lineage for Unity Catalog, the unified governance solution for all data and AI assets on lakehouse, is now available in preview.

This blog will discuss the importance of data lineage, some of the common use cases, our vision for better data transparency and data understanding with data lineage, and a sneak peek into some of the data provenance and governance features we’re building.

Click through to see what it currently supports. My curious question is around whether this and Microsoft Purview will play nice in an Azure Databricks setup.

Comments closed

Column-Level Encryption in SQL Server

Tom Collins sets up column-level encryption:

How do I set up the SQL Server Column Level Encryption ? 

This is quite different from Always Encrypted—it’s the encryption process which we saw with SQL Server going back (at least) to 2000. It’s also (currently?) the only available way manually to encrypt columns in an Azure Synapse Analytics dedicated SQL pool. I’d say it’s not trivial to set up and use but neither is it a huge challenge.

Comments closed

Performance Concerns around GENERATE_SERIES()

Erik Darling looks the gift horse in the mouth:

Quite a while back, I blogged about how much I’d love to have this as a function. We… sort of got it. It doesn’t do dates natively, but you can work around some of that with date functions.

In this post, I want to go over some of the disappointing performance issues I found when testing this function out.

It’s not good news but it’s important to understand if you’re planning to use this when SQL Server 2022 comes out. Also, the things Erik mentions are easier to fix (potentially) than modifications in the signature of a function, as they’re internal. I wouldn’t guarantee that things will certainly be better but there are some good cases when we saw performance improvements between early CTPs and RTM.

Comments closed

Continuing a Dive into Simple Parameterization

Paul White shows how not-simple simple parameterization really is:

The output of parsing is a logical representation of the statement called a parse tree. This tree does not contain SQL language elements. It’s an abstraction of the logical elements of the original query specification.

It’s frequently possible to write the same logical requirement in different ways. Using a SQL analogy, x IN (4, 8) is logically the same as writing x = 4 OR x = 8. This flexibility can be useful when writing queries, but it makes implementing an optimizer more difficult.

In general terms, normalization is an attempt to standardize. It recognises common variations expressing the same logic, and rewrites them in a standard way. For example, normalization is responsible for turning x BETWEEN y AND z into x >= y AND x <= z. The normalized, or standardized, form chosen is one the query processor finds convenient to work with internally.

This has been a very interesting series and Paul does promise one more article.

Comments closed

Working with Azure VM Scale Sets

Arun Sirpal explains the benefit behind scale sets in Azure:

I really like scale sets. It lets you create and manage up to 1000 load balanced VMs per availability zone using windows or Linux images. (We can have flexible or uniforms modes for orchestration which dictates if you go down the homogenous VM route or a mix, where a mix is the flexible option.

There are many other benefits too apart from scaling, such as built-in load balancing options, increased resiliency via 3 Availability Zones and from a cost perspective you can couple scale sets with Azure Hybrid benefit or even use reserved instances – cost is important in the cloud!

Read the whole thing.

Comments closed

A Change Log for SQL Server ScriptDom

Arvind Shyamsundar keeps track of changes:

Till such time that we have a detailed, fully updated change log, this blog post is being written as an unofficial change log for ScriptDom at least. Hopefully it will help readers understand when certain T-SQL grammar was added to ScriptDom, etc. I hope to keep it updated as we have later releases of DacFx and thereby, ScriptDom. If you have questions / feedback for me, please do leave a comment in this blog post and I will try to address it in due course of time.

Note that if there are no functionality changes for a given DacFx release, it will not feature in this table.

And again, this is an unofficial change log, so it is provided as-is and should not be in any way construed as an official Microsoft statement.

Click through for this unofficial log. As you can see, ScriptDom is still under active development with SQL Server 2022.

Comments closed

Greatest and Least in T-SQL

Hasan Savran takes us through a rather useful pair of functions:

GREATEST and LEAST functions are available in Azure SQL and SQL Server 2022. GREATEST returns the maximum value from a list of expressions. LEAST returns the minimum value from a list of expressions. 

     The Highest number of expressions you can pass to these functions is 254. All expressions must be comparable for these functions to work. For example, SQL Server can automatically convert the following string to an integer and find the greatest number.

These kinds of row-wise comparisons cut out a lot of writing OR clauses. Though standard function rules apply: if you put these in the WHERE clause of queries against large tables and expect them to be your primary filters and you’ll probably be in for a nasty surprise.

Comments closed

Broken References in the SSIS Catalog

Andy Leonard beats the brakes off of some stuff:

It may help to define reference first. An SSIS Catalog Reference is a relationship between an SSIS Project and an SSIS Catalog Environment in the SSIS Catalog. One or more references to environments may be configured for a project. At execution time one and only one reference may be selected – like Highlander, “There can be only one.

broken reference occurs when one configures a reference between a project and an environment, and then the environment is deleted

Read on to see what this looks like in the SSIS Catalog Browser and SISS Catalog Compare tools, as well as how you can prevent accidentally creating broken references.

Comments closed

Implementing a Power BI Composite Model

Marc Lelijveld walks us through an implementation of the Power BI composite model:

Have you been working with Composite Models in Power BI? Did you run into challenges while you did? Then this blog is for you! In this blog I will further elaborate on what the composite models (including DirectQuery for Power BI datasets) are and everything that comes into play when you start implementing them.

During this blog I will introduce you to topics like source groups, storage modes and relationship evaluation. All different topics that come into play when you start building composite models.

If composite models sound interesting, be sure to check this out.

Comments closed