Press "Enter" to skip to content

Curated SQL Posts

Azure Purview: External Connections and the Starter Kit

Wolfgang Strasser continues a series on Azure Purview. First up is a look at external connections:

During my tests with Azure Purview I found out that in my demo accounts (I created multiple to test it), the Management menu was missing some items.

Read on to learn why and how you can rectify this. Then, check out Wolfgang’s take on the Starter Kit:

Very often, when I want to test some new services, I'm missing the infrastructure and environments I can start and play with. I am not talking about creating a new Azure Purview account (see my previous blog post – Creating an Azure Purview account) – I am talking about the data infrastructure to analyze, catalog, and gain knowledge out of it.

We could start to create such an infrastructure, BUT: the Purview team created a Starter Kit to quickly create a data estate and configure everything so that you can start with Purview within a few minutes.

Read on for one issue (of the self-inflicted variety) Wolfgang ran into during deployment. But it does look like a great way to get started with Purview and build up a relevant demo environment.


Create the Fastest Number Series Generator

Itzik Ben-Gan has a challenge for us:

I always used my solution with a base table value constructor cardinality of 2, but Marcos’ comment made me think. This tool is so useful that we as a community should join forces to try and create the fastest version that we possibly can. Testing different base table cardinalities is just one dimension to try. There could be many others. I’ll present the performance tests that I’ve done with my solution. I mainly experimented with different table value constructor cardinalities, with serial versus parallel processing, and with row mode versus batch mode processing. However, it could be that an entirely different solution is even faster than my best version. So, the challenge is on! I’m calling all jedi, padawan, wizard and apprentice alike. What’s the best performing solution that you can conjure? Do you have it within you to beat the fastest solution posted thus far? If so, share yours as a comment to this article, and feel free to improve any solution posted by others.

Give it a try.
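For reference, here's the general shape of the tool under discussion: a cascading-CTE number series generator with a base table value constructor cardinality of 2, along the lines Itzik describes. This is a sketch of the pattern, not necessarily the fastest variant the challenge is hunting for.

```sql
CREATE OR ALTER FUNCTION dbo.GetNums(@low AS BIGINT, @high AS BIGINT)
RETURNS TABLE
AS
RETURN
  WITH
    L0 AS ( SELECT 1 AS c FROM (VALUES(1),(1)) AS D(c) ),      -- 2 rows
    L1 AS ( SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B ),   -- 4 rows
    L2 AS ( SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B ),   -- 16 rows
    L3 AS ( SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B ),   -- 256 rows
    L4 AS ( SELECT 1 AS c FROM L3 AS A CROSS JOIN L3 AS B ),   -- 65,536 rows
    L5 AS ( SELECT 1 AS c FROM L4 AS A CROSS JOIN L4 AS B ),   -- 4,294,967,296 rows
    Nums AS ( SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS rownum
              FROM L5 )
  SELECT TOP (@high - @low + 1) @low + rownum - 1 AS n
  FROM Nums
  ORDER BY rownum;
GO

-- Usage: the first one million integers.
SELECT n FROM dbo.GetNums(1, 1000000);
```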


First Thoughts on Amazon Babelfish

Ryan Booz shares some first thoughts on Amazon’s Babelfish offering:

The impetus for creating the tool is clear for AWS: provide a way for customers to easily connect a SQL Server app to Aurora Postgres, saving big on licensing fees and reducing total cost of ownership. Assuming the tool is successful at some level, I’m sure it will provide a revenue boost for Amazon and some customers might (initially) feel a win. No harm, no foul on Amazon for leading the effort. Free markets, baby!

No matter how clever Babelfish is, however, I just can’t see how this is ultimately a win for SQL Server or PostgreSQL… or the developers that will ultimately need to support these “hybrid” apps.

I think Ryan makes good points and does hit upon the crux of the problem. I’d also say there’s a secondary problem, which Ryan hints at: a query may be sufficiently fast in one database variant but perform horridly in another. A classic example of this is a solution built on cursors in Oracle which is then brought over to T-SQL.
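To make that concrete, here's a contrived sketch (the table names are hypothetical): row-by-row cursor logic that might be tolerable in Oracle's PL/SQL can be dramatically slower in T-SQL than the set-based equivalent.

```sql
-- Row-by-row, cursor style: the pattern that often ports badly.
DECLARE @id INT;
DECLARE order_cur CURSOR LOCAL FAST_FORWARD FOR
    SELECT OrderID FROM dbo.Orders;
OPEN order_cur;
FETCH NEXT FROM order_cur INTO @id;
WHILE @@FETCH_STATUS = 0
BEGIN
    UPDATE dbo.Orders
    SET TotalDue = (SELECT SUM(LineTotal)
                    FROM dbo.OrderLines
                    WHERE OrderID = @id)
    WHERE OrderID = @id;
    FETCH NEXT FROM order_cur INTO @id;
END;
CLOSE order_cur;
DEALLOCATE order_cur;

-- The set-based rewrite does the same work in a single statement.
UPDATE o
SET TotalDue = l.LineSum
FROM dbo.Orders AS o
JOIN (SELECT OrderID, SUM(LineTotal) AS LineSum
      FROM dbo.OrderLines
      GROUP BY OrderID) AS l
    ON l.OrderID = o.OrderID;
```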


Switching Between Dates with Calculation Groups

Alberto Ferrari has another good use for calculation groups:

This technique works just fine; it has the disadvantage of creating many measures, one for each combination of relationship to activate and base measure. Another solution is to create a calculation group that changes the active relationship of the selected measure. Doing this, you create one calculation item for each relationship and the user chooses the relationship to activate using a slicer or a report filter.

But read on for the calculation group solution, which is a clever way of deferring which relationship you care about until the user selects it.
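As a rough sketch of the idea (using a hypothetical model in which Sales has an active Order Date relationship and an inactive Delivery Date relationship to the Date table), each calculation item simply wraps the selected measure, with USERELATIONSHIP activating the alternate relationship:

```dax
-- Calculation item "Order Date": the relationship is already active,
-- so the selected measure passes through unchanged.
SELECTEDMEASURE()

-- Calculation item "Delivery Date": activate the inactive relationship
-- for whatever measure the user has selected.
CALCULATE(
    SELECTEDMEASURE(),
    USERELATIONSHIP( Sales[Delivery Date], 'Date'[Date] )
)
```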


Execution Plans: Foreign Keys and Deletion

Hugo Kornelis continues a series on execution plans:

In the previous part, we looked at foreign key checking when data is inserted. Inserts in the referenced (parent) table can, by definition, not cause foreign key violations, so no checking is done in those cases. But for inserts in the referencing (child) table, SQL Server needs to check that the inserted value does indeed exist. We saw how both the Nested Loops and the Merge Join operators have some specific optimizations to do that check as efficiently as possible; we also saw how the Assert operator does the actual checking.

When deleting data, the reverse is true. We can delete rows from the child table without verification, but deletes from the parent table are subject to a check.

And those deletes can get crazy expensive.
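If you want to see the check for yourself, a minimal repro looks something like this (hypothetical table names):

```sql
CREATE TABLE dbo.Parent (ParentID INT NOT NULL PRIMARY KEY);
CREATE TABLE dbo.Child
(
    ChildID  INT NOT NULL PRIMARY KEY,
    ParentID INT NOT NULL
        CONSTRAINT FK_Child_Parent REFERENCES dbo.Parent (ParentID)
);

-- No check needed: a child row can always be removed safely.
DELETE FROM dbo.Child WHERE ChildID = 42;

-- Check required: the plan picks up an extra branch against dbo.Child
-- (plus an Assert) to verify no row still references ParentID = 42.
DELETE FROM dbo.Parent WHERE ParentID = 42;
```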


PowerShell Splatting vs Backticks

John McCormack on backticks:

My code was full of backticks. You could say I was daft about backticks. I loved them and thought they made my PowerShell code readable because they stopped the lines from running off the monitor. Someone asked me why I didn’t use PowerShell splatting. “Whatting?” I asked. I had never heard of splatting.

Click through to learn more.
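For a sense of the difference, here's a sketch using a stock cmdlet (the addresses and server names are placeholders):

```powershell
# The backtick style: every line ends with a fragile trailing backtick.
Send-MailMessage -To "dba@example.com" `
                 -From "alerts@example.com" `
                 -Subject "Backup job failed" `
                 -Body "Check the job history." `
                 -SmtpServer "smtp.example.com"

# The splatted equivalent: parameters live in a hashtable, and the call
# site collapses to one line. Note @mailParams rather than $mailParams.
$mailParams = @{
    To         = "dba@example.com"
    From       = "alerts@example.com"
    Subject    = "Backup job failed"
    Body       = "Check the job history."
    SmtpServer = "smtp.example.com"
}
Send-MailMessage @mailParams
```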


Provisioning Storage for Azure SQL Edge

Andrew Pruski continues a series on running Azure SQL Edge on a cluster of Raspberry Pis:

In a previous post we went through how to set up a Kubernetes cluster on Raspberry Pis and then deploy Azure SQL Edge to it.

In this post I want to go through how to configure an NFS server so that we can use it to provision persistent volumes in the Kubernetes cluster.

There are a fair number of steps, but Andrew does a good job taking us through them.
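For a sense of where it all ends up, a statically provisioned NFS-backed volume in Kubernetes looks something like this (the server address, export path, and sizes are placeholders, not Andrew's actual values):

```yaml
# A PersistentVolume pointing at an NFS export, plus a claim to bind it.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: sqledge-pv
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  nfs:
    server: 192.168.1.100      # placeholder NFS server address
    path: /mnt/nfs/sqledge     # placeholder export path
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: sqledge-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ""         # bind to the static PV above, not a class
  resources:
    requests:
      storage: 1Gi
```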


Sampling the Population of SQL Server Instances

Brent Ozar has a breakdown of SQL Server adoption by version:

Ever wonder how fast people are adopting new versions of SQL Server, or what’s “normal” out there for SQL Server adoption rates, hardware sizes, or numbers of databases? Let’s find out in the summer 2020 version of our SQL ConstantCare® population report.

Out of the 3,650 servers sending in data recently, the most popular version of SQL Server is still 2016. 

It’s an interesting survey, but it’s important to remember with any survey what the biases are before drawing conclusions about the general population of all SQL Server instances. This one covers ConstantCare users, so if I had a hunch about them, I’d say they’re going to blow out the middle versus the general population. In other words, I’d expect 2019 and 2017 usage to be a bit higher across the world (and not just among ConstantCare users), but also 2005 (not pictured), 2008, and 2008 R2 to be higher. My hunch here is that the former are shops which have their act together and figure they don’t need ConstantCare, and the latter are all of those “hidden” instances of SQL Server acting as the back end for small business websites and the like, where nobody even thinks about having a database administrator, much less paying for tooling. Nonetheless, these are interesting trends, and even if I think the general population might look different, my data-less guess is that these numbers look reasonable.


Ignoring Bad Dates when Moving to Spark 3

Robert Blackburn shows us one way to handle bad dates when moving to Spark 3:

Moving from a Spark 2 to a Spark 3 runtime has a lot of benefits, including big performance improvements through adaptive query execution, dynamic partition pruning, and other optimizations. Some updates may require you to refactor your code. One of them is that Delta tables now use the proleptic Gregorian calendar. Isn’t a calendar a calendar? Unfortunately, no. The Julian calendar has discrepancies with old dates: specifically, dates before 1582 and timestamps before 1900. Here we will dynamically update these dates for incoming source files.

If you would like to follow along in detail, I have a sample notebook that uses the community edition of Databricks. The DBC Archive file is here and the source file is here.

Fortunately, this change is unlikely to affect most of us, with perhaps the most common issue being that you used 0001-01-01 as a default date.
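If it does affect you and you'd rather not rewrite the dates yourself, Spark 3 also exposes rebase-mode settings for Parquet-backed data. This is a sketch of that configuration approach, offered as an alternative to the dynamic fix-up in Robert's notebook rather than a description of it:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# LEGACY rebases pre-1582 dates and pre-1900 timestamps between the
# Julian and proleptic Gregorian calendars on read/write; CORRECTED
# leaves stored values as-is; EXCEPTION (the default) fails the query
# when such values are encountered.
spark.conf.set("spark.sql.legacy.parquet.datetimeRebaseModeInRead", "LEGACY")
spark.conf.set("spark.sql.legacy.parquet.datetimeRebaseModeInWrite", "LEGACY")
```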
