Press "Enter" to skip to content

Curated SQL Posts

Migrating SQL Server Database Files between Storage Subsystems

Andy Yun does a bit of shuffling:

In my role at Pure Storage, I often engage with customers who wish to migrate their SQL Server databases off of their prior storage onto our hardware. And after some digging around for prior-published material, I was surprised to find that there really wasn’t much that was comprehensive. After all, one doesn’t change SANs too often. But when it does happen, it is nice to have some reference material from others who have. So I decided to try and give a good overview of how I’d approach the challenge.

This is meant to be a “food for thought” kind of post. I’m going to keep things somewhat high level, but will provide links to other blogs and material that can help you continue down whatever path you choose. And for simplicity, I’m going to limit this scope to a single SQL Server.

Read on for a few questions you should answer, followed by some notes and preferences. Andy’s filegroups tip is also a really good one.

Comments closed

Comparing Power BI Stacked Column Charts

Meagan Longoria compares and contrasts:

One of the new features in the August Power BI Desktop release is the updated legends that are styled to more accurately reflect the per-series formatting on the visual. This made me curious how close I could get to the clean look of a Deneb (vega-lite) stacked bar chart.

I used open source data from the Vega github repo and applied a few filters in Power BI. I used exactly the same source data in each chart, filtering down to five sites and 9 varieties. I chose a visual with a legend, data labels, and total labels to display the differences. We can argue about whether you should show a chart with this many labels another day.

Click through to see how the in-built visual compares to a custom visual of the same form.

Comments closed

Compressing Indexes and Shrinking Azure SQL MI Databases

Kendra Little has a good reason for an often-bad act:

Shrinking databases in SQL Server isn’t fun – it’s slow, it causes blocking if you forget to use the WAIT_AT_LOW_PRIORITY option, and sometimes it persistently fails and refuses to budge until you restart the instance. You only want to shrink a SQL Server database when you’ve got a good reason and a lot of patience.

If you’re using Azure SQL Managed Instance and you haven’t already used data compression on your indexes and shrunk your databases, you probably have two good reasons to do both of those things: performance and cost reduction.

Compressing indexes is very often (almost always?) a good thing. Shrinking databases is very often (again, almost always?) a bad thing. This is like a buddy cop movie for your database.

Kendra gives some good advice but also lays out a warning if you’re on General Purpose V1, so read the whole thing.

Comments closed

Random Forest Missing Data Imputation using missRanger

Michael Mayer handles missing data:

{missRanger} is a multivariate imputation algorithm based on random forests, and a fast version of the original missForest algorithm of Stekhoven and Buehlmann (2012). Surprise, surprise: it uses {ranger} to fit random forests. Especially combined with predictive mean matching (PMM), the imputations are often quite realistic.

This looks like an interesting package. At first, I thought it was a way of generating predictions outside the boundaries of training data and had concerns—a classic point (limitation?) of random forest as an algorithm is that it will not even try to predict values outside the range of what it sees in training data, so if the largest label is 10 and the smallest is 0, you won’t see a prediction of 11 or 50, no matter how you scale the inputs.

Instead of doing that, missRanger looks like it’s filling in missing data using a clever approach. That’s quite useful for dealing with incomplete data, a really common problem whose good solutions tend to be complex enough that people typically ignore them in favor of simple but less useful solutions like dropping rows altogether.

Comments closed

mssparkutils now notebookutils and Validating DAGs in Fabric

Sandeep Pawar gives us two quick hits:

First, if you haven’t noticed mssparkutils has been officially renamed to notebookutils. Check out the official documentation for details. Be sure to use/update your notebooks to notebookutils.

Read on for a pair of notes around this name change, as well as some capabilities to validate DAGs when using runMultiple to orchestrate multiple notebook executions.

Comments closed

Setting Row Label and Key Columns in Power BI

Chris Webb makes use of metadata:

A few weeks ago I wrote a post on how to improve the results you get from Power BI Copilot by editing the Linguistic Schema. As I mentioned, though, there are in fact lots of different ways that you as a Power BI semantic model developer can improve the results you get from Copilot and in this post I’ll show you another one: setting the Row Labels and Key Columns properties on a table.

Read on to see how these can affect results.

Comments closed

Databricks Notebook Package Installation and Variables

Chen Hirsh diagnoses a problem:

A friend called to ask for my help with a weird issue. In a Databricks notebook using Python, he declares and assigns a variable in the first cell. Something like that:

my_var = 1

He then runs the rest of the notebook, and somewhere along the way, tries to use this variable, and gets this message:

NameError: name 'my_var' is not defined

Going back to cell 1, and checking the value of my_var, he gets the same error.

Read on for the root cause of the issue, as well as a pair of helpful tips from Chen.

Comments closed

pg_dump and the Backup Tool Debate

Gulcin Yildirim Jelinek explains the debate around whether pg_dump is a backup tool or not:

Recently, while writing about the vulnerability affecting pg_dump, the topic of decommissioning pg_dump came up on Twitter. Unlike the nostalgic feelings many had for Pluto, there was less reluctance to see pg_dump reclassified. In fact, some people were eager to retire it as a backup utility, and I even got a bit of pushback for still referring to pg_dump as one

I was talking to my colleague Simona the other day, and she mentioned that everybody in Postgres circles says, “pg_dump is not a backup tool,” but perhaps it’s not always explained well why it is not.

Read on for that explanation.

Comments closed

System.Data.SqlClient Deprecated

David Engel has an announcement:

We announced Microsoft.Data.SqlClient in the first half of 2019 and shipped the first stable package later that year. That release coincided with .NET Core 3.0. We’ve been developing Microsoft.Data.SqlClient in the dotnet/SqlClient repo since that time, over the last five years. It’s now time to start the deprecation process for the System.Data.SqlClient package. We plan to take a staged process to deprecation, with the intent to align support changes and associated code transition activities between these libraries to natural migration points between .NET major versions.

I haven’t used System.Data.SqlClient in a while, so I’m not sure how much is in that library that isn’t in Microsoft.Data.SqlClient.

Comments closed