Press "Enter" to skip to content

Curated SQL Posts

Using sp_prepare with Plan Guides

Aaron Bertrand tries something different:

There are features many of us shy away from, like cursors, triggers, and dynamic SQL. There is no question they each have their use cases, but when we see a trigger with a cursor inside dynamic SQL, it can make us cringe (triple whammy).

Plan guides and sp_prepare are in a similar boat: if you saw me using one of them, you’d raise an eyebrow; if you saw me using them together, you’d probably check my temperature. But, as with cursors, triggers, and dynamic SQL, they have their use cases. And I recently came across a scenario where using them together was beneficial.

Read on to see the method to this madness.
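
For anyone who has never touched either feature, here is a minimal sketch of the two pieces Aaron combines, with made-up table, parameter, and hint choices (his post has the real scenario). sp_prepare/sp_execute submit a parameterized statement the way many drivers do under the covers, and sp_create_plan_guide can attach hints to that exact statement text:

-- Prepare and execute a parameterized statement (hypothetical query).
DECLARE @handle int;
EXEC sys.sp_prepare @handle OUTPUT,
    N'@UserId int',
    N'SELECT COUNT(*) FROM dbo.Votes WHERE UserId = @UserId;';
EXEC sys.sp_execute @handle, 22656;
EXEC sys.sp_unprepare @handle;

-- A plan guide keyed to the same statement text and parameter list,
-- attaching a hint without touching the application's code.
EXEC sys.sp_create_plan_guide
    @name            = N'pg_votes_by_user',
    @stmt            = N'SELECT COUNT(*) FROM dbo.Votes WHERE UserId = @UserId;',
    @type            = N'SQL',
    @module_or_batch = NULL,
    @params          = N'@UserId int',
    @hints           = N'OPTION (RECOMPILE)';

Note that the statement text and the @params declaration have to match the prepared statement exactly for the guide to be applied.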


Little Things in Azure Data Factory

Rayis Imayev has some kind words about small niceties in Azure Data Factory:

Recently the Microsoft team conducted a brief year-end survey about the “one thing” in Azure Data Factory (ADF) that “made your day in 2020” – https://twitter.com/weehyong/status/1343709921104183296. The responses ranged from global parameters support to the increase in the limit of ADF instances per subscription.

I personally like the little things that are not easily noticed on the surface; with a deeper immersion into data pipeline development, your level of gratefulness increases even more.

Click through for a few examples.


Security Update for SQL Server

Randolph West takes a look at a patch:

Microsoft announced updates today for all supported versions of SQL Server, addressing a privilege escalation vulnerability that leverages Extended Events. For security reasons, no further details have been provided, but you can expect more information in the near future now that this update is public.

Be sure to grab the latest update for your version of SQL Server.


Hyperparameter Tuning as Technical Debt

John Mount has an interesting take on hyperparameter tuning:

The hyper dance is the venial trick of pushing user-facing technical debt and flaws as user-controllable features. These controls are usually named “hyper parameters,” and they are parameters or arguments that control the behavior of an algorithm. Users think “hyper parameters” must be even better than “regular parameters,” just like “hyper drive” is better than “sub-light drive.” However, the etymology of the name isn’t from science fiction; it is just the need in statistical contexts to have a name for controls other than “parameter,” as “parameter” is often used to name the fit coefficients of a model (i.e., to name an output, not an input!).

In addition to this, I’d be concerned that heavy hyperparameter tuning could lead to a garden of forking paths problem where we end up accidentally doing the equivalent of p-hacking: modifying hyperparameters until we come up with the “right” answer.


Improving a Graph

Elizabeth Ricks has started a series on improving a particular visual:

I empathize with the plight of this anonymous creator. In previous roles, I frequently created visuals that looked like this, and was left frustrated when requests came back for “more data.” I slowly came to realize that I was assigning my audience the tedious task of figuring out for themselves what the takeaways were. My visuals should have been highlighting the interesting things to those seeing them for the first time. The five questions we’ll be discussing in this series will help us to do just that.

The first question in the series is, “What elements can I eliminate?” I think that’s a really good idea—with data visualization, less is more.


Archival on Delete in SQL Server

Erik Darling shows off a pattern:

Well, friends, I have good news for you. This is an easy one to implement.

Let’s say that in Stack Overflow land, when a user deletes their account we also delete all their votes. That’s not how it works, but it’s how I’m going to show you how to condense what can normally be a difficult process to isolate into a single operation.

The one gripe I have with this post is that my annoyingly loud keyboard is buckling spring, not Cherry MX Blue, thank-you-very-much.
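
For context, the usual way to collapse delete-plus-archive into a single statement is the OUTPUT clause; whether that is exactly the pattern Erik lands on is in the post, but a minimal sketch with hypothetical Stack Overflow-style table names looks like this:

-- Delete a user's votes and copy them to an archive table in one atomic
-- statement via OUTPUT ... INTO (table and column names are made up).
DECLARE @UserId int = 22656;

DELETE v
OUTPUT
    Deleted.Id,
    Deleted.PostId,
    Deleted.UserId,
    Deleted.VoteTypeId,
    Deleted.CreationDate,
    SYSUTCDATETIME()                     -- when the row was archived
INTO dbo.Votes_Archive
    (Id, PostId, UserId, VoteTypeId, CreationDate, ArchivedDate)
FROM dbo.Votes AS v
WHERE v.UserId = @UserId;

One caveat worth knowing: the OUTPUT ... INTO target table cannot have enabled triggers or participate in a foreign key relationship, which usually suits a plain archive table just fine.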


Azure Data Factory and Source Control

Ahmad Yaseen shows how you can save Azure Data Factory pipelines in source control:

To overcome these limitations, Azure Data Factory provides the ability to integrate with a Git repository, such as an Azure DevOps or GitHub repository. This helps in tracking and versioning pipeline changes and in incrementally saving changes during development, without needing to validate an incomplete pipeline and without losing those changes in case of any crash or failure. You can then test the pipeline, revert any change that turns out to be a bug, and publish the pipeline to the Data Factory once everything has been developed and validated successfully.

Click through for the setup instructions.


Query Store and Cross-Database Queries

Matthew McGiffen does some research:

When I was writing the script shared in my last post, Identify the (Top 20) most expensive queries across your SQL Server using Query Store, a question crossed my mind:

Query Store is a configuration that is enabled per database, and the plans and stats for queries executed in that database are stored in the database itself. So what does Query Store do when a query spans more than one database?

Read on for the answer.
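
If you want to experiment before reading the answer, a simple (hypothetical) setup is to run a cross-database query from one database's context and then look for it in each database's Query Store catalog views:

-- Run a query that spans two databases from the context of DatabaseA
-- (database, table, and column names are hypothetical).
USE DatabaseA;
SELECT COUNT(*)
FROM dbo.LocalTable AS l
JOIN DatabaseB.dbo.RemoteTable AS r
    ON r.Id = l.Id;

-- Then run this in DatabaseA and again in DatabaseB to see which
-- database's Query Store captured the statement and its plan.
SELECT q.query_id, qt.query_sql_text
FROM sys.query_store_query AS q
JOIN sys.query_store_query_text AS qt
    ON qt.query_text_id = q.query_text_id
WHERE qt.query_sql_text LIKE N'%RemoteTable%';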


DataFrame Cleaning in Spark

Craig Covey has an update to the Spark Starter Guide:

Real-world datasets are hardly ever clean and pristine. They commonly include blanks, nulls, duplicates, errors, malformed text, mismatched data types, and a host of other problems that degrade data quality. No matter how much data one might have, a small amount of high-quality data is more beneficial than a large amount of garbage data. All decisions derived from data will be better with higher-quality data.

In this section we will introduce some of the methods and techniques that Spark offers for dealing with “dirty data”, meaning data that needs to be improved so the decisions made from it will be more accurate. Dirty data and how to deal with it is a very broad topic with a lot of things to consider. This chapter intends to introduce the problem, show Spark techniques, and educate the user on the effects of “fixing” dirty data.

It’s interesting to see what’s available in Spark and how you can take advantage of it.
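
The chapter itself works through Spark's DataFrame methods; purely as a taste of the same ideas expressed in Spark SQL (with hypothetical view and column names), the usual moves are deduplication, null handling, trimming malformed text, and correcting data types:

-- Spark SQL sketch: assumes the raw data is registered as the temp view raw_orders.
CREATE OR REPLACE TEMP VIEW clean_orders AS
SELECT DISTINCT                                             -- drop exact duplicate rows
    CAST(order_id AS INT)                         AS order_id,  -- fix mismatched data types
    COALESCE(NULLIF(TRIM(customer), ''), 'unknown') AS customer, -- treat blank text as missing and fill it
    CAST(amount AS DOUBLE)                        AS amount
FROM raw_orders
WHERE order_id IS NOT NULL;                                 -- discard rows missing the key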
