Press "Enter" to skip to content

Author: Kevin Feasel

Database Compatibility Levels and SQL Server Versions

Kendra Little demands change:

According to Microsoft’s documentation, “Database compatibility level … allow[s] the SQL Server Database Engine to be upgraded while keeping the same functional status for connecting applications by maintaining the same pre-upgrade database compatibility level.”

But these days, the “functional status” of a database at a given compatibility level differs depending on whether you’re using SQL Server, Azure SQL Managed Instance, or Azure SQL Database – and in the hosted versions it may change anytime without notice. Surprise, your database behaves differently now!

The whole concept is breaking down, and this is bad news for users of both managed services and the boxed product.

Read on to learn more about the issue.
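
For reference, the compatibility level itself is just a per-database setting you can inspect and change with T-SQL. A minimal sketch (the database name is a placeholder; 160 corresponds to SQL Server 2022):

    -- Check the current compatibility level
    SELECT name, compatibility_level
    FROM sys.databases
    WHERE name = N'YourDatabase';

    -- Pin the database to a specific level
    ALTER DATABASE YourDatabase
    SET COMPATIBILITY_LEVEL = 160;

Kendra's point is that the behavior you get at a given number is no longer stable across products, even though the setting itself looks the same everywhere.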


Advent of Code Day 5

Kevin Wilkie continues the Advent of Code. Part 1 starts with data prep and ends with some of the analysis:

There was a lot going on with this one, so let’s go ahead and get started. First thing we want to do, as always, is to go ahead and load all of our data into SQL Server. With this one, I cheated a little and loaded it into multiple tables to start with – since there are 7 different groups of data that we have to play with.

Could I have loaded it into one table and pulled from there into other tables? Absolutely! Do I really want to? Heck, no!

Part 2 continues cross-referencing over a larger set of data:

This time, the meaning of the seed numbers is slightly changing on us. Instead of it only being 20 seeds we have to cross-reference (at least that’s how many seeds my input lists out), we now have 10 seeds and a rather large range of numbers to work with after each of those 10.

Thankfully, we can still use our Tally table to create yet another table with all of the seeds listed in it – yes, all of them!

Read on for Kevin’s solution.
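
For context on the tally-table trick Kevin mentions: given a start value and a range length, a numbers table expands each pair into one row per seed in a single set-based statement. A minimal sketch, assuming a Tally table of sequential integers starting at 1 (table and column names here are illustrative, not Kevin's):

    -- SeedRanges holds one row per (start, length) pair from the puzzle input
    SELECT sr.SeedStart + t.N - 1 AS SeedNumber
    INTO dbo.AllSeeds
    FROM dbo.SeedRanges AS sr
    JOIN dbo.Tally AS t
      ON t.N <= sr.SeedLength;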


Concatenating Many Files in Azure Blob Storage

Drew Furgiuele concatenates a lot of files:

Lately, I’ve found myself with a few requests from friends and users that have a particular problem: they’ve got themselves a data lake in Azure, and they can read and write files just fine to it. The problem, though, is that sometimes they need to take a series of files and mash them all together, or as the cool kids call it: concatenate them. And when it comes to third party tools and methods that can do the trick, you’re spoiled for choice: Azure Data Factory, Spark via Databricks, or even PowerShell.

Case in point: I was working with someone who had tens of thousands of CSV files that they needed to merge together into one big file, but they were already out in their Azure storage account. That doesn’t sound so bad, does it? 

Drew explains why it is, but also why it isn’t. So click through and check that out.


Tokenizing SQL Queries

Phil Factor has a new script:

Imagine this: you have several directories full of SQL script files, and you need to know where a certain table is used. You’d rather like the context too, so you can check the whole SQL Expression and work out why it is running so slowly. Maybe, from that same daunting set of several directories, you need to search for a comment, either end of line, or block comment, perhaps structured. It could be that you just need to execute each query or statement in turn to check performance.

It’s not unusual to want to search just within strings. Although simple searches can be effective, you will at some point need a tool that is able to recognise and return a collection of strings representing the SQL code, divided up into the respective components of the SQL language.

For this article, we’ll use a PowerShell cmdlet, called Tokenize-SQLString, which is in my GitHub repository.

Read on for examples of how it works and the types of things you can do with it.


Window Function Execution Plans with RANGE

Hugo Kornelis continues a series on explaining the execution plans for window functions:

This is part twenty-six of the plansplaining series. And already the fourth episode about window functions. The first of those posts covered basic window functions; the second post focused on fast-track optimization for running aggregates, and the third post explained how the optimizer works around the lack of execution plan support for UNBOUNDED FOLLOWING.

But all of those were about OVER specifications that use the ROWS keyword. Let’s now look at the alternative, the RANGE keyword.

Click through to see how the various options work with RANGE. By the way, I still want range intervals, like how Postgres implements them, where you can define an interval of X days/hours/minutes/whatever rather than a specific number of rows. Maybe one of these versions…
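
If you want a quick way to see the difference Hugo covers, the key is how ties in the ORDER BY column are treated: ROWS counts physical rows, while RANGE includes all peers with the same ordering value. A minimal sketch (table and column names are invented for illustration):

    SELECT OrderDate,
           Amount,
           SUM(Amount) OVER (ORDER BY OrderDate
                             ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS RunningTotalRows,
           SUM(Amount) OVER (ORDER BY OrderDate
                             RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS RunningTotalRange
    FROM dbo.Orders;

With two orders on the same date, RunningTotalRows differs between them, while RunningTotalRange returns the same value for both, because RANGE pulls in every row that ties on OrderDate.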


Looping through Lakehouses in Microsoft Fabric Spark Jobs

Dennes Torres builds a loop:

I have published videos and articles before about Lakehouse maintenance. In this article I want to address a missing point for a lot of Fabric administrators: How to do maintenance on multiple lakehouses that are located in different workspaces.

One of the videos I have published explains the maintenance of multiple lakehouses, but only addresses maintenance in a single workspace. Is it a good idea to keep multiple lakehouses in the same workspace? Probably not.

Click through for the process.


Explaining Odds Ratios

Steven Sanderson explains the concept of an odds ratio:

Imagine a loan officer flipping a coin to decide whether to approve your loan. Odds ratios tell you how much more likely one factor (like your income) makes the “heads” (approval) side appear compared to another (like your student status).

In logistic regression, odds ratios compare the odds of an event (loan default, in our case) for two groups defined by a specific variable. They’re like multipliers: greater than 1 means something increases the chances of default, while less than 1 means it decreases them.

As for why, we use odds ratios because it’s hard to track and interpret changes in probabilities directly, at least when you’re thinking in terms of small numbers.
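
To make the arithmetic concrete with made-up numbers: if the modeled odds of default are 0.25 for non-students and 0.50 for students, the odds ratio for student status is 0.50 / 0.25 = 2, so being a student doubles the odds of default (which is not the same as doubling the probability). In a fitted logistic regression, that ratio is simply exp of the variable’s coefficient.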


Generating Fabric Delta Tables from Power BI Semantic Models

Nikola Ilic is excited:

A few days ago, while preparing materials for the customer training on Microsoft Fabric, I stumbled upon a very interesting article at Microsoft Learn. The article describes how to integrate Power BI semantic models (aka datasets) into OneLake.

At first glance, this doesn’t sound like something epic, but when I started thinking more and more about it, I realized that this really might be a huge thing in many different scenarios. First of all, at the moment of writing, this feature is still in preview – this means it can change to some extent in the coming months, before eventually becoming GA. Nevertheless, I decided to take a shot and explore what can be done with OneLake integration for semantic models.

Read on to learn more about what this is doing and what you can do with it.


GPS Data in PostGIS

Ryan Lambert clues us in:

One of the key elements to using PostGIS is having spatial data to work with! Lucky for us, one big difference today compared to the not-so-distant past is that essentially everyone is carrying a GPS unit with them nearly everywhere. This makes it easy to create your own GPS data that you can then load into PostGIS! This post explores some basics of loading GPS data to PostGIS and cleaning it for use. It turns out, GPS data from nearly any GPS-enabled device comes with some… character. Getting from the raw input to usable spatial data takes a bit of effort.

This post starts with using ogr2ogr to load the .gpx data to PostGIS. Once the data is in PostGIS then we actually want to do something with it. Before the data is completely usable, we should spend some time cleaning the data first. Technically you can start querying the data right away; however, I have found there is always data cleanup and processing involved first to make the data truly useful.

Click through for an example of how it all fits together.
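
As a small taste of that post-load work: once ogr2ogr has loaded the .gpx file, the individual track points can be rolled up into one line per track for mapping or distance calculations. A minimal sketch, assuming the track_points layer and the default column names ogr2ogr tends to create (wkb_geometry, track_fid, time); yours may differ:

    -- One linestring per track, points ordered by their GPS timestamp
    SELECT track_fid,
           ST_MakeLine(wkb_geometry ORDER BY "time") AS geom
    FROM track_points
    GROUP BY track_fid;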


Troubleshooting the Automation of a Process

Reitse Eskens tells a tale of woe:

When I got the job to restore a bacpac file, I didn’t think much of it. Because it’s quite easy. Get the bacpac and restore it with Azure Data Studio or SQL Server Management Studio. Get coffee, done. Then I got these requirements:

  • The bacpac needs to be downloaded daily from an sFTP server, without certificate
  • The bacpac file is in a zip file
  • The zip file is password protected
  • Everything must be done serverless (Azure Automation, Logic App, Function App, Data Factory and/or Synapse Analytics)
  • It’s a daily process that needs to be done without human supervision

Welcome to cloud development: Part A is easy, Part B is easy, Part C is mildly challenging, and combining A with B and C is a total nightmare because it turns out that A and B aren’t compatible, so by the end, you’re dealing with A'' and b and D (because C, C', C'', etc. wouldn’t work and c and c' would work but had severe limitations preventing you from using them in this scenario).
