Press "Enter" to skip to content

Day: March 27, 2025

Domains in ANSI SQL

Joe Celko describes a domain:

For example, if there though is that there is a domain called voltage which has a base unit called “volt” that’s otherwise meaningless. Yes, you can get a voltmeter you can watch the needle, you can be told what the IEEE specification for defining how much work a volt should do or shock you. I’ve discussed scales and types of measurements in a previous article, It’s worth mentioning that you should not confuse domain with the representation and symbols of the units being used. Some domains are limited, such as degrees that measure planar angles. An angle can be from 0 to 360°, or it can be between zero and 2π radians.

Joe has an explanation but doesn’t have any concrete examples in psql. Here’s one from the PostgreSQL documentation:

CREATE DOMAIN us_postal_code AS TEXT
CHECK(
   VALUE ~ '^\d{5}$'
OR VALUE ~ '^\d{5}-\d{4}$'
);

The idea of a domain here is that you define a valid slice of some data type. We can do something similar with check constraints on an attribute, but the difference is that we’d need to create the check constraint for each relevant attribute, whereas the domain would include this check automatically, making it quite useful if we have multiple instances of, say, us_postal_code in our database. Then, we wouldn’t need to worry about creating a check constraint on each instance and ensuring that the code remains the same across the board.

This also leads to a very common sentiment in functional programming: make invalid states unrepresentable. In other words, make it impossible for a person or piece of code to generate a result in an invalid state. By defining a domain with the scope of our valid state, we make it impossible for someone to create a US postal code value that does not pass our check, and so we can’t have dirty data of this sort in our database.

Leave a Comment

Data Quality Management with Great Expectations and Databricks

Sairamakrishna BuchiReddy Karri and Srinivasarao Rayankula show off Great Expectations:

Data quality checks are critical for any production pipeline. While there are many ways to implement them, the Great Expectations library is a popular one. 

Great Expectations is a powerful tool for maintaining data quality by defining, managing, and validating expectations for your data. In this article, we will discuss how you can use it to ensure data quality in your data pipelines.

Click through to see how it all works.

Leave a Comment

Querying a Microsoft Fabric SQL Endpoint from a Notebook

Dennes Torres wants to hit a SQL endpoint:

Let’s analyse why we would like to query an SQL Endpoint. Once we understand why, we can dig into how to make a query to a SQL Endpoint.

We use notebooks to connect directly to lakehouse. Except by the T-SQL notebook, the notebooks have a default lakehouse and work directly with it from spark. However, accessing other data object may be more complex

Specifically, this is a Spark notebook in Microsoft Fabric running Scala rather than a pure Python notebook, and is hitting the data warehouse SQL endpoint.

Leave a Comment

Reporting Query Errors via Extended Events

Tom Collins wants to know what’s going wrong:

We have an ETL job failing every night . The ETL job is using SQL Server Stored Procedures ,  so we’re looking to be trace the errors reported in SQL Server, to give some clues on how to fix the problem?

Do you have an approach to monitor and save SQL Server  error messages ?

This is an extremely useful way of catching problems in code, as well as people goofing up ad hoc queries.

Leave a Comment

SQL Server 2019 Mainstream Support Ended

Steve Jones points out an important milestone:

Well, not really the end. I doubt anyone running SQL Server 2019 is going to stop (or upgrade) just because mainstream support ended. Actually, I wonder how many of you know that SQL Server 2019 passed out of mainstream support on Feb 28, 2025. I do think the 6 or 7 of you running Big Data Clusters likely knew this was the end of any support.

I saw a report in the Register on this, which includes a survey of which versions are still running. This is from an IT asset firm and matches Brent Ozar’s Population report. 44% of you are running SQL Server 2019, which is the largest percentage. Since there’s an additional 32% of you running versions older than 2019, I’m sure that upgrading isn’t a priority.

Given how there were some own-goals with SQL Server 2022, especially the earlier CUs of it, it’s not very surprising at all that the plurality of customers are still on 2019. Companies tend to follow an “every other release” strategy, so I’d expect some gradual migration to SQL Server 2025 after it comes out and a few CUs are in place.

Leave a Comment