Curated SQL Posts

Data Quality Monitoring with SQL

Ryan Kearns and Barr Moses walk us through key principles for monitoring data quality in a relational database:

Next, we want to assess the field-level, distributional health of our data. Distribution tells us all of the expected values of our data, as well as how frequently each value occurs. One of the simplest questions is, “How often is my data NULL?” In many cases, some level of incomplete data is acceptable, but if a 10% null rate turns into 90%, we’ll want to know.

This covers a couple of examples around data freshness and completeness, and I appreciate the level of detail in here. Nothing is earth-shattering, but at the same time, it’s important to have a catalog of the sorts of issues which can pop up. H/T Mark Hutchinson.
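
The null-rate check in the quote is straightforward to sketch. Below is a minimal Python version using the standard-library sqlite3 module, assuming a hypothetical orders table with a nullable customer_id column and a loaded_at timestamp column. The 0.2 alerting threshold is an arbitrary placeholder, and a simple freshness check is included alongside. This is an illustration, not the authors' queries.

```python
import sqlite3
from typing import Optional


def null_rate(conn: sqlite3.Connection, table: str, column: str) -> float:
    """Fraction of rows in `table` where `column` IS NULL (0.0 for an empty table)."""
    # Identifiers can't be bound as query parameters, so they are interpolated here.
    # Only pass trusted, hard-coded table and column names.
    total, nulls = conn.execute(
        f"SELECT COUNT(*), SUM(CASE WHEN {column} IS NULL THEN 1 ELSE 0 END) FROM {table}"
    ).fetchone()
    return (nulls or 0) / total if total else 0.0


def null_rate_jumped(baseline: float, current: float, max_increase: float = 0.2) -> bool:
    """True when the null rate has risen by more than `max_increase` over its baseline."""
    return current - baseline > max_increase


def hours_since_last_load(
    conn: sqlite3.Connection, table: str, ts_column: str, now: str
) -> Optional[float]:
    """Freshness: hours between `now` and the newest timestamp (None if the table is empty)."""
    (hours,) = conn.execute(
        f"SELECT (julianday(?) - julianday(MAX({ts_column}))) * 24 FROM {table}",
        (now,),
    ).fetchone()
    return hours


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER, loaded_at TEXT)")
    # Hypothetical load in which 9 of 10 rows are missing customer_id.
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(i, i if i == 0 else None, "2024-01-01 00:00:00") for i in range(10)],
    )
    rate = null_rate(conn, "orders", "customer_id")
    print(rate)                         # 0.9
    print(null_rate_jumped(0.1, rate))  # True
    print(hours_since_last_load(conn, "orders", "loaded_at", "2024-01-01 06:00:00"))
```

In practice you would likely record each run's rate and compare it against a rolling baseline rather than a hard-coded 0.1.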

Updates to SQL Server Big Data Clusters

Rahul Ajmera fills us in on what they’ve been doing with SQL Server Big Data Clusters:

Today, we’re announcing the release of the latest cumulative update (CU9) for SQL Server Big Data Clusters, which includes important capabilities:

– Support for configuring BDC post-deployment.
– Improved experience for encryption at rest.
– Ability to install Python packages at Spark job submission time.
– Upgraded software versions for most of our OSS components (Grafana, Kibana, FluentBit, etc.) to ensure Big Data Clusters images are up to date with the latest enhancements and fixes.
– Miscellaneous improvements and bug fixes.

This announcement highlights some of the major improvements, provides additional context to better understand the design behind these capabilities, and points you to relevant resources to learn more and get started.

Click through for more detail on a few of the items.

SPN Registration and dbatools

Jess Pomfret takes us through some SPN pains:

But instead of getting a quick answer to my question, I just got the following error:

WARNING: [15:19:49][Get-DbaDatabase] Error occurred while establishing connection to dscsvr1 | The target principal name is incorrect. Cannot generate SSPI context.

Just reading the article brought back some bad troubleshooting memories for me… But as usual, I’m impressed that dbatools has a cmdlet or two to help with that troubleshooting.

Memory-Optimized Table Variables and tempdb Contention

Erik Darling notes that memory-optimized table variables can be useful in specific circumstances:

First, yes, they do help relieve tempdb contention if you have code that executes under both high concurrency and frequency. And by high, I mean REALLY HIGH.

Like, Snoop Dogg high.

Because you can’t get rid of in-memory stuff, I’m creating a separate database to test in.

Been there. When tempdb object creation causes massive contention, this certainly alleviates the stress.

As Erik notes, there are some tradeoffs to this, meaning that you have a real decision to make rather than simply using memory-optimized user-defined table types as a starting point.

Azure Data Factory and JSON Array Hand-Offs

Rayis Imayev wants to pass a JSON array from one Azure Data Factory pipeline to another:

This next post came out of an error message during my attempt to pass a hard-coded array value between pipelines. Strangely, this use case worked well in the pipeline that was already deployed in ADF; however, I was getting an error message while trying to test and execute this very same pipeline in Debug mode.

Click through for the explanation.

Renaming All Column Names on All Tables in One Power Query Statement

Soheil Bakhshi has achieved mass production:

I previously wrote a blog post explaining how to rename all columns in a table in one go with Power Query. One of my visitors raised a question in the comments about the possibility of renaming all columns from all tables in one go. Interestingly enough, one of my customers had a similar requirement. So I thought it would be good to write a Quick Tip explaining how to meet the requirement.

Click through to see how to build an expression which iterates over all columns in all tables.
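
Soheil's solution is written in Power Query M. As a rough analogue of the same iterate-over-every-table, iterate-over-every-column idea in a relational store, here is a Python sketch that walks every user table in a SQLite database and applies one rename rule to every column. The tidy_name rule is a hypothetical example, table and column names are assumed to be trusted, and ALTER TABLE ... RENAME COLUMN requires SQLite 3.25 or later.

```python
import sqlite3
from typing import Callable


def tidy_name(name: str) -> str:
    """Hypothetical rename rule: 'customer_id' -> 'Customer Id'."""
    return name.replace("_", " ").title()


def rename_all_columns(conn: sqlite3.Connection, rule: Callable[[str], str] = tidy_name) -> None:
    """Apply `rule` to every column of every user table in the database."""
    # Materialise the table list first so we aren't iterating a cursor while altering schema.
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
        )
    ]
    for table in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
        columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
        for old in columns:
            new = rule(old)
            if new != old:
                conn.execute(f'ALTER TABLE "{table}" RENAME COLUMN "{old}" TO "{new}"')
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER)")
    rename_all_columns(conn)
    print([row[1] for row in conn.execute('PRAGMA table_info("orders")')])
    # ['Order Id', 'Customer Id']
```

Passing the rule as a function keeps the traversal generic, which mirrors the point of the Power Query approach: one expression, any number of tables.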

Non-Equi Joins in R

David Selby walks us through non-trivial join scenarios in R:

Most joins are equi-joins, matching rows according to two columns having exactly equal values. These are easy to perform in R using the base merge() function, the various join() functions in dplyr and the X[i] syntax of data.table.

But sometimes we need non-equi joins or θ-joins, where the matching condition is an interval or a set of inequalities. Other situations call for a rolling join, used to link records according to their proximity in a time sequence.

How do you perform non-equi joins and rolling joins in R?

Click through for the answer using dplyr, sqldf, and data.table. H/T R-bloggers.
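
David's examples are in R. To show the two ideas independently of any package, here is a plain-Python sketch with hypothetical inputs. The interval join is a nested-loop θ-join that pairs each value with every band whose half-open [lo, hi) range contains it; the rolling join attaches to each event the latest reading at or before its timestamp, using binary search. The rolling join is similar in spirit to "last observation carried forward", which data.table offers via roll = TRUE.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple


def interval_join(
    values: List[Tuple[str, float]], bands: List[Tuple[str, float, float]]
) -> List[Tuple[str, float, str]]:
    """Non-equi join: pair each (key, value) with every band where lo <= value < hi."""
    return [
        (key, value, band)
        for key, value in values
        for band, lo, hi in bands
        if lo <= value < hi
    ]


def rolling_join(
    events: List[Tuple[str, int]], readings: List[Tuple[int, float]]
) -> List[Tuple[str, int, Optional[float]]]:
    """Rolling join: attach to each (key, t) the latest reading whose time is <= t.

    `readings` must be sorted by time; events before the first reading get None.
    """
    times = [t for t, _ in readings]
    out = []
    for key, t in events:
        i = bisect_right(times, t)  # count of readings at or before t
        match = readings[i - 1][1] if i > 0 else None
        out.append((key, t, match))
    return out


if __name__ == "__main__":
    print(interval_join([("a", 5.0), ("b", 15.0)], [("low", 0, 10), ("high", 10, 20)]))
    # [('a', 5.0, 'low'), ('b', 15.0, 'high')]
    print(rolling_join([("x", 3), ("y", 0)], [(1, 10.0), (2, 20.0), (5, 50.0)]))
    # [('x', 3, 20.0), ('y', 0, None)]
```

The nested loop is O(n·m) and only suits small inputs; the packages David covers use sorted or indexed approaches that scale much better.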

Synchronizing Metadata between Spark Tables and Serverless Pool

Charl Roux takes us through one back-end integration mechanism between tables in Azure Synapse Analytics Spark pools and serverless SQL pool:

Synapse provides an exciting feature which allows you to sync Spark database objects to Serverless pools and to query these objects without the Spark pool being active or running. Synapse workspaces are accessed exclusively through an Azure AD Account and objects are created within this context in the Spark pool. In some scenarios I would like to share the data which I’ve created in my Spark database with other users for reporting or analysis purposes. This is possible with Serverless and in this article I will show you how to complete the required steps from creation of the object to successful execution.

Click through for the demonstration.

Monitoring SSAS with Quest Spotlight

Slava Murygin has two questions and two answers:

This post is just answering two simple questions:

1. Can Quest Software’s Spotlight successfully monitor SQL Server Analysis Services?

2. If it can, which SSAS parameters, and which database and cube details, does it monitor and provide information about?

First, it’s good to see Slava back in the saddle again. Second, click through for those answers. Slava also promises to check out some other SSAS monitoring tools, so stay tuned.

Visualizing a Power BI Refresh

Phil Seamark has a dashboard which will help you understand Power BI dataset refresh times:

Have you ever wondered why a Power BI dataset refresh was taking so long? And more specifically, how much time did the refresh spend on various sub-tasks that aren’t that visible to you via the web-portal?

This article shares a technique you can use to capture events fired during a Power BI refresh and use a Power BI report to visualise the results. Have to love the idea of using Power BI to optimise and improve Power BI.

Click through to get the Power BI report and get step-by-step instructions on how to use it.
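
Phil's technique captures trace events and visualises them in Power BI. To illustrate the underlying arithmetic only, this Python sketch totals elapsed time per sub-task by pairing begin and end events. The event tuple layout, the "Begin"/"End" labels, and the sub-task names are hypothetical placeholders, not the actual trace schema his report uses.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, Tuple

# Hypothetical event shape: (ISO timestamp, "Begin" or "End", object name, sub-task name)
Event = Tuple[str, str, str, str]


def durations_by_subtask(events: Iterable[Event]) -> Dict[str, float]:
    """Sum seconds spent per sub-task by pairing each Begin with its matching End."""
    open_events: Dict[Tuple[str, str], datetime] = {}
    totals: Dict[str, float] = defaultdict(float)
    # Sorting by timestamp first; at equal timestamps "Begin" sorts before "End".
    for ts, phase, obj, task in sorted(events):
        when = datetime.fromisoformat(ts)
        key = (obj, task)
        if phase == "Begin":
            open_events[key] = when
        elif phase == "End" and key in open_events:
            totals[task] += (when - open_events.pop(key)).total_seconds()
    return dict(totals)


if __name__ == "__main__":
    sample = [  # hypothetical events for two tables refreshing in parallel
        ("2024-01-01T10:00:00", "Begin", "Sales", "ReadData"),
        ("2024-01-01T10:00:30", "End", "Sales", "ReadData"),
        ("2024-01-01T10:00:30", "Begin", "Sales", "Compress"),
        ("2024-01-01T10:00:40", "End", "Sales", "Compress"),
        ("2024-01-01T10:00:05", "Begin", "Dates", "ReadData"),
        ("2024-01-01T10:00:15", "End", "Dates", "ReadData"),
    ]
    print(durations_by_subtask(sample))  # {'ReadData': 40.0, 'Compress': 10.0}
```

Note that summing per sub-task counts overlapping parallel work twice, so the totals measure work done rather than wall-clock time; a timeline view like Phil's report makes that overlap visible.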
