Press "Enter" to skip to content

Curated SQL Posts

Microsoft Fabric March 2025 Updates

Patrick LeBlanc puts together a big list:

Welcome to the March Feature Summary!

From the innovative Variable library (Preview) to the powerful Service Principal support in the CI/CD features, there’s a lot to explore. Dive in and discover how the new Partner Workloads in Fabric bring cutting-edge capabilities to your workspace. Plus, enhanced OneLake security ensures your data is protected. And don’t miss out on the expanded regional availability for Eventstream’s managed private endpoints, making it easier for organizations worldwide to build secure, scalable streaming solutions.

With FabCon kicking off today, the announcements are rolling in! Get ready to explore these features and more in the March 2025 updates for Fabric!

FabCon triggered a large number of big announcements, and considering that the outline takes up about a page and a half, there’s a lot to dig into here.

Leave a Comment

Generating Realistic Data with Data Diluvium

Adron Hall wants to generate some realistic time-series data. First up is humidity data:

When working with TimeScale DB for time-series data, having realistic environmental data is crucial. I’ve found that humidity is a particularly important parameter that affects everything from agriculture to HVAC systems. 

Then comes temperature data:

Following up on my previous post about adding humidity data generation to Data Diluvium, I’m now adding temperature data generation. This completes the pair of environmental data generators I needed for my TimeScale DB setup. Temperature data is crucial for time-series analysis and works perfectly alongside the humidity data we just implemented.

Check out both of these for detailed coverage of how to generate realistic-looking simulated data.

Leave a Comment

Finding Mismatched Rows in Power Query

Reza Rad looks for that other sock:

Finding rows that are in one table, but not the other is one of the most common scenarios happening in any data related applications. You may have customer records coming from two sources, and want to find data rows that exist in one, but not the other. In Power Query, you can use Merge to combine data tables together. Merge can be also used for finding mismatch records. You will learn through this blog post, how in Power Query you can find out which records are missing with Merge, and then report it in Power BI. To learn more about Power BI, read Power BI book from Rookie to Rock Star.

Click through for the video and article.

Leave a Comment

Semantic Search in PostgreSQL

Hans-Jürgen Schönig performs a search:

PostgreSQL offers advanced capabilities through extensions like pgvector, which enable semantic search at a level and quality never achieved before. Unlike traditional text search, which mostly relies on trivial string comparison, semantic search in PostgreSQL goes beyond keywords by understanding context and meaning, enhancing relevance.

The quick idea here is that converting words (or parts of words) into vectors can maintain most of the semantic meaning behind the words. Then, when we perform certain types of vector comparisons, we can take advantage of this semantic meaning and find results whose language is different from our query but the concept is a match for what we want. Click through for the full article.

Leave a Comment

Building a Simple Microservice with Azure Functions

Temidayo Omoniyi takes us through an example of creating a microservice:

Today’s architecture is serverless intensive, with multiple microservices performing a particular task. Industries are beginning to move away from traditional monolithic applications, which have a single large codebase infrastructure handling everything, to an easier microservice approach.

Click through for a primer on serverless architecture, microservices, and how to create a simple Python app that acts as a microservice.

Leave a Comment

Backup Storage Billing for Fabric SQL Databases

Amar Digamber Patil makes an announcement:

While compute and data storage are already included in the Fabric capacity-based billing model, after April 1, 2025, backup storage will also be billed. However, customers will only be charged backup storage that exceeds the allocated database size.

Click through to see what’s changing and how to get ahead of this. I’m not sure there are any ways to reduce that backup price, short of managing the data in your database and not having enormous amounts of transaction log activity.

Leave a Comment

Source Control without a Budget

Kevin Hill is looking out for tightwads like us:

Every DBA has been there. Trying to keep track of database schema changes while developers have their fancy Git repositories and version control workflows. Meanwhile, database changes are often handled through a mix of manual scripts, backups, and maybe a SharePoint folder full of “final_final_v2.sql” files.

Did you know that SQL Server already has a built-in tool that can help you track your database schema changes, without spending a dime on third-party tools?

I hadn’t thought about Kevin’s solution that way, but it does make a lot of sense as a way of quickly getting files into a source control repo.

Leave a Comment

Finding Microsoft Fabric Administrative Documentation

Nicky van Vroenhoven digs through the docs:

Earlier I wrote about the default domain settingschanges to the default tenant setting value for SQL database and I also covered the (rights of the) Fabric Adminsitrator role. Today I want to talk about a more meta-topic: existing documentation on Microsoft Learn, and of course specifically for Admins.

It turns out that there’s a central link for this documentation within Microsoft Learn. Nicky also includes call to action to fix simple documentation issues (such as typos) that you find.

Leave a Comment

Domains in ANSI SQL

Joe Celko describes a domain:

For example, if there though is that there is a domain called voltage which has a base unit called “volt” that’s otherwise meaningless. Yes, you can get a voltmeter you can watch the needle, you can be told what the IEEE specification for defining how much work a volt should do or shock you. I’ve discussed scales and types of measurements in a previous article, It’s worth mentioning that you should not confuse domain with the representation and symbols of the units being used. Some domains are limited, such as degrees that measure planar angles. An angle can be from 0 to 360°, or it can be between zero and 2π radians.

Joe has an explanation but doesn’t have any concrete examples in psql. Here’s one from the PostgreSQL documentation:

CREATE DOMAIN us_postal_code AS TEXT
CHECK(
   VALUE ~ '^\d{5}$'
OR VALUE ~ '^\d{5}-\d{4}$'
);

The idea of a domain here is that you define a valid slice of some data type. We can do something similar with check constraints on an attribute, but the difference is that we’d need to create the check constraint for each relevant attribute, whereas the domain would include this check automatically, making it quite useful if we have multiple instances of, say, us_postal_code in our database. Then, we wouldn’t need to worry about creating a check constraint on each instance and ensuring that the code remains the same across the board.

This also leads to a very common sentiment in functional programming: make invalid states unrepresentable. In other words, make it impossible for a person or piece of code to generate a result in an invalid state. By defining a domain with the scope of our valid state, we make it impossible for someone to create a US postal code value that does not pass our check, and so we can’t have dirty data of this sort in our database.

Leave a Comment

Data Quality Management with Great Expectations and Databricks

Sairamakrishna BuchiReddy Karri and Srinivasarao Rayankula show off Great Expectations:

Data quality checks are critical for any production pipeline. While there are many ways to implement them, the Great Expectations library is a popular one. 

Great Expectations is a powerful tool for maintaining data quality by defining, managing, and validating expectations for your data. In this article, we will discuss how you can use it to ensure data quality in your data pipelines.

Click through to see how it all works.

Leave a Comment