
Curated SQL Posts

Stored Procedures and Headers

Andy Brownsword lays out an argument:

Code is an ever moving target. Version control and documentation only go so far, if they even exist. Sometimes all you have is the code in front of you.

This is why I always start stored procedures with a header.

There was a time when I strongly resisted this idea, but if you are diligent about keeping the header up to date, it can be very useful for record-keeping, especially if your company has a tendency to switch source control systems without keeping the history between moves.
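
To make that concrete, here is a minimal sketch of what a header block might look like (the procedure, table, and author names are all hypothetical; Andy’s own template will differ):

CREATE OR ALTER PROCEDURE dbo.Customer_GetOpenOrders
    @CustomerID int
AS
/*************************************************************
Description    : Returns open orders for a single customer.
Author         : A. Developer
Created        : 2025-08-01
Change history :
  2025-08-05  A. Developer  Excluded cancelled orders.
*************************************************************/
BEGIN
    SET NOCOUNT ON;

    SELECT OrderID, OrderDate, OrderStatus
    FROM dbo.Orders
    WHERE CustomerID = @CustomerID
      AND OrderStatus = N'Open';
END;

The point is less the exact fields than that the header travels with the code, so whoever opens the procedure next gets the context even when the source control history is gone.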


Using Python in R in Excel

Adam Gladstone wraps up a series on an R add-in for Excel:

In the last post in this series I am going to look at calling Python from R. Even though Excel now provides a means of calling Python scripts directly, using the =PY() formula in a worksheet, there are still occasions when it is beneficial to call Python via R. For example, it turns out that importing yfinance produces a ‘module not found’ error using Excel’s function. According to the documentation, yfinance is not one of the open source libraries that the Excel Python secure distribution supports. To get around this issue, we can use the R package Reticulate. This lets us load and run Python scripts from R. As we have seen in the previous parts of this series, the ExcelRAddIn allows us to run R scripts from an Excel worksheet. And putting these two together is quite simple.

I’m glad Adam mentioned this, because my first question was going to be: why use this when Excel has Python capabilities built in? And that’s a reasonable answer.


DBAs and Data Access

Brent Ozar wraps up a survey:

Last week, I asked if your database administrators could read all of the data in all databases. The results (which may be different from this post, because I’m writing the post ahead of time and the poll is still open):

In a lot of cases, this doesn’t really matter much. In places where it does matter (for example, reading protected health information or critical financial data), there should be controls in place. I’ve always been on the side of this issue that says yes, you do need to be able to trust your administrators at the end of the day, because somebody’s going to need a way to get to that data in case of a company emergency. But as a company grows and there are additional opportunities for division of labor and specialization, you do open up the possibility of stronger controls, proper auditing, limiting certain data access to privileged accounts, and consequences for violating the rules.
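
As one concrete example of such a control, SQL Server can audit every read against a sensitive table. This is a minimal sketch, with the database, table, and file path all hypothetical:

USE master;

-- Server-level audit target: where the audit records are written.
CREATE SERVER AUDIT SensitiveDataAudit
TO FILE (FILEPATH = N'D:\SQLAudit\');

ALTER SERVER AUDIT SensitiveDataAudit WITH (STATE = ON);

USE YourDatabase;

-- Record every SELECT against the sensitive table by any principal,
-- including members of the sysadmin role.
CREATE DATABASE AUDIT SPECIFICATION PatientDataReads
FOR SERVER AUDIT SensitiveDataAudit
ADD (SELECT ON OBJECT::dbo.PatientRecords BY public)
WITH (STATE = ON);

Auditing doesn’t stop a rogue administrator, but it does mean access leaves a trail, which is usually what compliance frameworks actually require.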


Three Years of PSBlitz

Vlad Drumea looks back:

In case you’re not familiar with it, PSBlitz is a PowerShell script that outputs SQL Server health and performance diagnostics data to either Excel or HTML, and saves execution plans and deadlock graphs as .sqlplan and .xdl files.

It’s based on modified, non-stored procedure versions of Brent Ozar’s SQL Server First Responder Kit and Erik Darling’s sp_QuickieStore, along with some other custom scripts I’ve put together.

You can get more details about its capabilities and usage from my PSBlitz GitHub repo.

Read on to learn about the origin story of PSBlitz, some of the major changes that have taken place over the past three years, and a bit more on the tool.


Randomization of Personally Identifiable Information

Rich Benner tries out a couple of techniques:

The main issue we see in dev environments is that people take a nice little version of their database, a few hundred rows of data per table, and develop on that. This is great for checking that your logic is correct, but not good when it comes to actually deploying the code to production. Suddenly, your nice, pretty code has to deal with millions of rows of data and grinds to a halt because you didn’t write it for big data sets. No, you wrote it for your development system, and it was fast on your machine. We see this a lot.

Rich shows a couple of techniques for data randomization. My biggest challenge to this is that if you need a proper distribution of data, you lose it. Using the telephone number example, if you have lookups or data analysis by area code, randomly generating across every area code would be bad. Also, if your application is smart enough to distinguish valid from invalid area codes and exchanges (the middle three digits in the US’s three-three-four phone number format), generating arbitrary area codes or exchanges might keep app developers from using the application properly, perhaps forcing them to fix phone numbers whenever they open a data entry screen.

In short, there are easier and harder ways to do this, and several factors may push you into the harder way.
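
As an illustration of the distribution point, one middle-ground approach for US phone numbers is to keep the real area code and randomize only the rest, using the 555-0100 through 555-0199 range that the North American Numbering Plan reserves for fictional use. A minimal sketch, assuming a hypothetical dbo.Customers table storing numbers as XXX-XXX-XXXX:

-- Keep the real area code (first three characters) so analysis by
-- area code still works; replace the exchange and subscriber digits.
-- CHECKSUM(NEWID()) yields a fresh random value per row, whereas
-- RAND() would return the same value for every row in the statement.
UPDATE dbo.Customers
SET PhoneNumber = LEFT(PhoneNumber, 3)
    + '-555-01'
    + RIGHT('0' + CAST(ABS(CHECKSUM(NEWID())) % 100 AS varchar(2)), 2);

This keeps the area code distribution intact and produces numbers an application will accept as valid, though it flattens any analysis that depends on exchanges.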


Column Expansion and Compression

Ed Pollack goes big or goes home:

One of the most hassle-prone changes that can be made in a SQL Server database is to widen an existing column. Depending on the column’s use and data type, this process may involve hundreds of schema changes, app downtime, and a mountain of risk-prone code to review and deploy. 

This article dives into a fun (and interesting!) strategy for widening fixed-width columns in SQL Server, to reduce downtime, risk, and runtime at the time when a column’s data type needs to be changed. 

If you have ever suffered through a lengthy INT to BIGINT release, then this guide is for you! 

Read on for a step-by-step guide to expanding a column with minimal downtime, as well as a quick description of row- and page-level compression and how you can use them to ensure you’re using as few bytes as necessary when storing a smaller number in a larger data type.
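
To give a flavor of the compression piece, here is a minimal sketch with a hypothetical dbo.Orders table; Ed’s article covers the full process and the caveats:

-- Row compression stores integers in only the bytes they need, so a
-- small value in a BIGINT column takes no more page space than it
-- did as an INT.
ALTER TABLE dbo.Orders REBUILD WITH (DATA_COMPRESSION = ROW);

-- With row compression in place, widening the column can avoid the
-- size-of-data rewrite an uncompressed INT-to-BIGINT change would
-- trigger, though indexes, constraints, and replication can all
-- complicate matters.
ALTER TABLE dbo.Orders ALTER COLUMN OrderID bigint NOT NULL;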


Microsoft Fabric Features for July 2025

Patrick LeBlanc has an update for us:

Welcome to the July 2025 Fabric Feature Summary! This month’s update covers major events like the return of the Microsoft Fabric Community Conference in Vienna, and the 10th anniversary of Power BI. Key platform enhancements include new domain tags and updates to the default category in the OneLake catalog. You’ll also find highlights on data science developments, such as Fabric data agent integration with Microsoft Copilot Studio. Explore the innovations shaping the future of Fabric in this month’s edition.

Read on for a long list of new and updated features.


Using R for Forecasting in Excel

Adam Gladstone continues a series on using R in Excel:

We have already seen how to obtain descriptive statistics in Part I and how to use lm() in Part II. In this part (Part III) of the series we will look at using R in Excel to perform forecasting and time series analysis.

In the previous two parts we have seen different ways to handle the output from R function calls, unpacking and massaging the data as required. In this part we are going to focus on setting up and interacting with a number of models in the ‘forecast’ package (fpp2).

Read on for the demo. This is getting into territory that is by no means trivial to do natively in Excel.


Building a Snowflake Dashboard that Uses Filters

Kevin Wilkie does a bit of filtering:

Snowflake Dashboards can do a lot more than just show pretty numbers. Today, let’s focus on something that every data pro eventually has to deal with—filters that make navigating your dashboards less painful, especially when it comes to everyone’s favorite task: AUDITING.

Ah yes, auditing—because nothing says “data dream job” like tracing permissions. Whether it’s quarterly compliance checks or a sudden request from an overly curious auditor, somebody, at some point, will ask, “Who has access to what in Snowflake?” So let’s make that answer easy to deliver.

Click through for the process, using the development of a permissions auditing dashboard as the example.
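
To give a sense of the shape of such a tile, a Snowsight dashboard query can reference a custom filter by its keyword. In this minimal sketch, :role_filter is a hypothetical custom filter holding a role name:

-- Privileges granted to whichever role is selected in the filter.
-- ACCOUNT_USAGE views can lag real time by up to a couple of hours.
SELECT grantee_name,
       privilege,
       granted_on,
       name AS object_name,
       granted_by
FROM snowflake.account_usage.grants_to_roles
WHERE deleted_on IS NULL
  AND grantee_name = :role_filter
ORDER BY granted_on, object_name;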
