Press "Enter" to skip to content


Finding Mismatched Rows in Power Query

Reza Rad looks for that other sock:

Finding rows that are in one table but not the other is one of the most common scenarios in any data-related application. You may have customer records coming from two sources and want to find data rows that exist in one but not the other. In Power Query, you can use Merge to combine data tables together. Merge can also be used for finding mismatched records. Through this blog post, you will learn how in Power Query you can find out which records are missing with Merge, and then report it in Power BI. To learn more about Power BI, read the Power BI from Rookie to Rock Star book.

Click through for the video and article.
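
Reza's technique uses Merge (typically with a Left Anti join kind) to keep only the rows with no match. For anyone who thinks in SQL first, the same anti-join idea looks roughly like this; the table and column names below are hypothetical and only illustrate the concept, not Reza's Power Query steps:

-- Customer rows that exist in source A but have no match in source B.
SELECT a.CustomerID, a.CustomerName
FROM dbo.CustomersSourceA AS a
WHERE NOT EXISTS
(
    SELECT 1
    FROM dbo.CustomersSourceB AS b
    WHERE b.CustomerID = a.CustomerID
);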


Semantic Search in PostgreSQL

Hans-Jürgen Schönig performs a search:

PostgreSQL offers advanced capabilities through extensions like pgvector, which enable semantic search at a level and quality never achieved before. Unlike traditional text search, which mostly relies on trivial string comparison, semantic search in PostgreSQL goes beyond keywords by understanding context and meaning, enhancing relevance.

The quick idea here is that converting words (or parts of words) into vectors can maintain most of the semantic meaning behind the words. Then, when we perform certain types of vector comparisons, we can take advantage of this semantic meaning and find results whose language is different from our query but the concept is a match for what we want. Click through for the full article.
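
As a minimal sketch of what that looks like in practice with pgvector (the table, the three-dimensional vectors, and the query embedding are all made up for illustration; real embeddings come from a language model and have hundreds or thousands of dimensions):

-- pgvector provides the vector type and its distance operators.
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE documents
(
    id bigserial PRIMARY KEY,
    body text NOT NULL,
    embedding vector(3) NOT NULL
);

-- Return the five documents closest to the query embedding by cosine distance (<=>).
-- The query vector must come from the same model that generated the stored embeddings.
SELECT id, body
FROM documents
ORDER BY embedding <=> '[0.12, -0.53, 0.91]'
LIMIT 5;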


Building a Simple Microservice with Azure Functions

Temidayo Omoniyi takes us through an example of creating a microservice:

Today’s architecture is serverless intensive, with multiple microservices performing a particular task. Industries are beginning to move away from traditional monolithic applications, which have a single large codebase infrastructure handling everything, to an easier microservice approach.

Click through for a primer on serverless architecture, microservices, and how to create a simple Python app that acts as a microservice.


Backup Storage Billing for Fabric SQL Databases

Amar Digamber Patil makes an announcement:

While compute and data storage are already included in the Fabric capacity-based billing model, after April 1, 2025, backup storage will also be billed. However, customers will only be charged for backup storage that exceeds the allocated database size.

Click through to see what’s changing and how to get ahead of this. I’m not sure there are any ways to reduce that backup price, short of managing the data in your database and not having enormous amounts of transaction log activity.


Source Control without a Budget

Kevin Hill is looking out for tightwads like us:

Every DBA has been there. Trying to keep track of database schema changes while developers have their fancy Git repositories and version control workflows. Meanwhile, database changes are often handled through a mix of manual scripts, backups, and maybe a SharePoint folder full of “final_final_v2.sql” files.

Did you know that SQL Server already has a built-in tool that can help you track your database schema changes, without spending a dime on third-party tools?

I hadn’t thought about Kevin’s solution that way, but it does make a lot of sense as a way of quickly getting files into a source control repo.


Finding Microsoft Fabric Administrative Documentation

Nicky van Vroenhoven digs through the docs:

Earlier I wrote about the default domain settings and changes to the default tenant setting value for SQL database, and I also covered the (rights of the) Fabric Administrator role. Today I want to talk about a more meta-topic: existing documentation on Microsoft Learn, and of course specifically for Admins.

It turns out that there’s a central link for this documentation within Microsoft Learn. Nicky also includes a call to action to fix simple documentation issues (such as typos) that you find.


Domains in ANSI SQL

Joe Celko describes a domain:

For example, the thought is that there is a domain called voltage, which has a base unit called “volt” that is otherwise meaningless. Yes, you can get a voltmeter and watch the needle, and you can be told what the IEEE specification is for defining how much work a volt should do or how it should shock you. I’ve discussed scales and types of measurements in a previous article. It’s worth mentioning that you should not confuse a domain with the representation and symbols of the units being used. Some domains are limited, such as degrees that measure planar angles. An angle can be from 0 to 360°, or it can be between zero and 2π radians.

Joe has an explanation but doesn’t have any concrete examples in psql. Here’s one from the PostgreSQL documentation:

CREATE DOMAIN us_postal_code AS TEXT
CHECK(
   VALUE ~ '^\d{5}$'
OR VALUE ~ '^\d{5}-\d{4}$'
);

The idea of a domain here is that you define a valid slice of some data type. We can do something similar with check constraints on an attribute, but the difference is that we’d need to create the check constraint for each relevant attribute, whereas the domain would include this check automatically, making it quite useful if we have multiple instances of, say, us_postal_code in our database. Then, we wouldn’t need to worry about creating a check constraint on each instance and ensuring that the code remains the same across the board.
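
As a quick illustration, once the domain exists, any table can use it as if it were a type and inherit the validation; the table definitions here are hypothetical:

-- Both postal_code columns are validated by the domain's CHECK automatically.
CREATE TABLE customer_address
(
    customer_id bigint NOT NULL,
    postal_code us_postal_code NOT NULL
);

CREATE TABLE warehouse
(
    warehouse_id bigint NOT NULL,
    postal_code us_postal_code NOT NULL
);

-- This insert would fail because 'ABC123' violates the domain's CHECK:
-- INSERT INTO warehouse VALUES (1, 'ABC123');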

This also leads to a very common sentiment in functional programming: make invalid states unrepresentable. In other words, make it impossible for a person or piece of code to generate a result in an invalid state. By defining a domain with the scope of our valid state, we make it impossible for someone to create a US postal code value that does not pass our check, and so we can’t have dirty data of this sort in our database.


Data Quality Management with Great Expectations and Databricks

Sairamakrishna BuchiReddy Karri and Srinivasarao Rayankula show off Great Expectations:

Data quality checks are critical for any production pipeline. While there are many ways to implement them, the Great Expectations library is a popular one. 

Great Expectations is a powerful tool for maintaining data quality by defining, managing, and validating expectations for your data. In this article, we will discuss how you can use it to ensure data quality in your data pipelines.

Click through to see how it all works.


Querying a Microsoft Fabric SQL Endpoint from a Notebook

Dennes Torres wants to hit a SQL endpoint:

Let’s analyse why we would like to query an SQL Endpoint. Once we understand why, we can dig into how to make a query to a SQL Endpoint.

We use notebooks to connect directly to a lakehouse. Except for the T-SQL notebook, notebooks have a default lakehouse and work directly with it from Spark. However, accessing other data objects may be more complex.

Specifically, this is a Spark notebook in Microsoft Fabric running Scala rather than a pure Python notebook, and is hitting the data warehouse SQL endpoint.


Reporting Query Errors via Extended Events

Tom Collins wants to know what’s going wrong:

We have an ETL job failing every night. The ETL job is using SQL Server stored procedures, so we’re looking to trace the errors reported in SQL Server to give some clues on how to fix the problem.

Do you have an approach to monitor and save SQL Server error messages?

This is an extremely useful way of catching problems in code, as well as people goofing up ad hoc queries.
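
Tom walks through his own setup in the post. As a rough sketch of the general shape (the session name, severity filter, and file target below are my choices, not necessarily Tom's), an Extended Events session on the error_reported event looks something like this:

-- Capture error messages (severity > 10) along with the offending SQL text.
CREATE EVENT SESSION CaptureQueryErrors ON SERVER
ADD EVENT sqlserver.error_reported
(
    ACTION (sqlserver.sql_text, sqlserver.database_name, sqlserver.client_app_name)
    WHERE severity > 10
)
ADD TARGET package0.event_file
(
    SET filename = N'CaptureQueryErrors.xel', max_file_size = (50)
);
GO

ALTER EVENT SESSION CaptureQueryErrors ON SERVER STATE = START;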
