Press "Enter" to skip to content

Author: Kevin Feasel

Domains in ANSI SQL

Joe Celko describes a domain:

For example, if there though is that there is a domain called voltage which has a base unit called “volt” that’s otherwise meaningless. Yes, you can get a voltmeter you can watch the needle, you can be told what the IEEE specification for defining how much work a volt should do or shock you. I’ve discussed scales and types of measurements in a previous article, It’s worth mentioning that you should not confuse domain with the representation and symbols of the units being used. Some domains are limited, such as degrees that measure planar angles. An angle can be from 0 to 360°, or it can be between zero and 2π radians.

Joe has an explanation but doesn’t have any concrete examples in psql. Here’s one from the PostgreSQL documentation:

CREATE DOMAIN us_postal_code AS TEXT
CHECK(
   VALUE ~ '^\d{5}$'
OR VALUE ~ '^\d{5}-\d{4}$'
);

The idea of a domain here is that you define a valid slice of some data type. We can do something similar with check constraints on an attribute, but the difference is that we’d need to create the check constraint for each relevant attribute, whereas the domain would include this check automatically, making it quite useful if we have multiple instances of, say, us_postal_code in our database. Then, we wouldn’t need to worry about creating a check constraint on each instance and ensuring that the code remains the same across the board.

This also leads to a very common sentiment in functional programming: make invalid states unrepresentable. In other words, make it impossible for a person or piece of code to generate a result in an invalid state. By defining a domain with the scope of our valid state, we make it impossible for someone to create a US postal code value that does not pass our check, and so we can’t have dirty data of this sort in our database.

Comments closed

Data Quality Management with Great Expectations and Databricks

Sairamakrishna BuchiReddy Karri and Srinivasarao Rayankula show off Great Expectations:

Data quality checks are critical for any production pipeline. While there are many ways to implement them, the Great Expectations library is a popular one. 

Great Expectations is a powerful tool for maintaining data quality by defining, managing, and validating expectations for your data. In this article, we will discuss how you can use it to ensure data quality in your data pipelines.

Click through to see how it all works.

Comments closed

Querying a Microsoft Fabric SQL Endpoint from a Notebook

Dennes Torres wants to hit a SQL endpoint:

Let’s analyse why we would like to query an SQL Endpoint. Once we understand why, we can dig into how to make a query to a SQL Endpoint.

We use notebooks to connect directly to lakehouse. Except by the T-SQL notebook, the notebooks have a default lakehouse and work directly with it from spark. However, accessing other data object may be more complex

Specifically, this is a Spark notebook in Microsoft Fabric running Scala rather than a pure Python notebook, and is hitting the data warehouse SQL endpoint.

Comments closed

SQL Server 2019 Mainstream Support Ended

Steve Jones points out an important milestone:

Well, not really the end. I doubt anyone running SQL Server 2019 is going to stop (or upgrade) just because mainstream support ended. Actually, I wonder how many of you know that SQL Server 2019 passed out of mainstream support on Feb 28, 2025. I do think the 6 or 7 of you running Big Data Clusters likely knew this was the end of any support.

I saw a report in the Register on this, which includes a survey of which versions are still running. This is from an IT asset firm and matches Brent Ozar’s Population report. 44% of you are running SQL Server 2019, which is the largest percentage. Since there’s an additional 32% of you running versions older than 2019, I’m sure that upgrading isn’t a priority.

Given how there were some own-goals with SQL Server 2022, especially the earlier CUs of it, it’s not very surprising at all that the plurality of customers are still on 2019. Companies tend to follow an “every other release” strategy, so I’d expect some gradual migration to SQL Server 2025 after it comes out and a few CUs are in place.

Comments closed

Reporting Query Errors via Extended Events

Tom Collins wants to know what’s going wrong:

We have an ETL job failing every night . The ETL job is using SQL Server Stored Procedures ,  so we’re looking to be trace the errors reported in SQL Server, to give some clues on how to fix the problem?

Do you have an approach to monitor and save SQL Server  error messages ?

This is an extremely useful way of catching problems in code, as well as people goofing up ad hoc queries.

Comments closed

CIS Benchmarks for PostgreSQL

Elizabeth Christensen reviews some security guidance:

The Center for Internet Security (CIS) releases security benchmarks to cover a wide variety of infrastructure used in modern applications, including databases, operating systems, cloud services, containerized services, and even networking. Since 2016 Crunchy Data has collaborated with CIS to provide this security resource for those deploying Postgres. The output of this collaboration is a checklist for folks to follow and improve the security posture of Postgres deployments.

The PostgreSQL CIS Benchmark™ for PostgreSQL 17 was just recently released.

Click through for more information on the CIS, as well as a link to the benchmark PDF and some tips around it.

Comments closed

GiST Indexes in PostgreSQL

Lee Asher shows off a neat form of indexing:

In Parts I (Solving the Overlap Query Problem in PostgreSQL) and II (Overlapping Ranges in Subsets in PostgreSQL) of this series, we used the GiST index type – and its lesser known cousin: SP-GiST – to turbocharge the performance of overlap queries. But GiST indexes are extremely versatile, with uses far beyond the examples I have used so far.

In this final article, we’ll explore some of the many other ways they can be used.

Click through for use cases, such as multi-dimensional BETWEEN operations (e.g., bid price between X and Y, and ask price between X1 and Y1), date intervals, and more.

Comments closed

Blocking Transformations in SSIS Data Flows

Andy Brownsword doesn’t want any blockers:

We have a platter of transformations to use when constructing data flows in SSIS packages. Not all transformations are equal though, and some can catch us out with performance impact as data volumes scale.

In this post we’ll look at blocking transformations which can trip us up if we’re not careful with them (or avoid them).

Click through for some examples of blocking transformations, as well as some tips on how to manage them.

Comments closed

Fabric Espresso Episodes on Data Warehousing and Storage

Estera Kot shares some links:

For the past 1.5 years, the Microsoft Fabric Product Group Product Managers have been publishing a YouTube series featuring deep dives into Microsoft Fabric’s features. These episodes cover both technical functionalities and real-world scenarios, providing insights into the product roadmap and the people driving innovation. With over 80+ episodes, the series serves as a valuable resource for anyone looking to understand and optimize their use of Microsoft Fabric.

Click through for a link to the full series, as well as 22 separate episodes covering warehousing and data storage topics.

Comments closed