Author: Kevin Feasel

An Overview of DataDiluvium

Adron Hall has a new tool and a new blog series. The first post is a product overview:

DataDiluvium is a web-based tool available at datadiluvium.com that helps developers, database administrators, and data engineers generate realistic test data from SQL schema definitions. Whether you’re setting up a development environment, creating test scenarios, or preparing data for demonstrations, DataDiluvium streamlines the process of data generation.

The second covers some of the development precepts Adron used:

DataDiluvium is a web-based tool I’ve built designed to help developers, database administrators, and data engineers generate realistic test data based on SQL schema definitions. The tool takes SQL table definitions as input and produces sample data in various formats, making it easier to populate development and testing environments with meaningful data.

The tool is free, so if you’re looking for a sample data generator, check it out.
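These excerpts don't describe how DataDiluvium works internally. The basic idea of turning a schema into sample rows can still be sketched. The Python below is a hypothetical illustration, not the tool's code. It assumes a single simple `CREATE TABLE` statement and extracts each column's base type. It then fills each column from a small, made-up generator table (`GENERATORS`) and leaves unknown types as `None`.

```python
import random
import re

# Hypothetical type-to-generator mapping; NOT DataDiluvium's actual logic.
GENERATORS = {
    "INT": lambda rng: rng.randint(1, 10_000),
    "INTEGER": lambda rng: rng.randint(1, 10_000),
    "VARCHAR": lambda rng: "".join(rng.choice("abcdefghij") for _ in range(8)),
    "DATE": lambda rng: f"2024-{rng.randint(1, 12):02d}-{rng.randint(1, 28):02d}",
}

# Table-level constraint clauses that are not column definitions.
CONSTRAINT_WORDS = {"PRIMARY", "CONSTRAINT", "FOREIGN", "UNIQUE", "CHECK"}


def split_top_level(body: str) -> list[str]:
    """Split on commas that are not nested in parentheses, e.g. DECIMAL(10,2)."""
    parts, depth, current = [], 0, []
    for ch in body:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return [p.strip() for p in parts if p.strip()]


def parse_columns(ddl: str) -> list[tuple[str, str]]:
    """Return (column name, base type) pairs from a simple CREATE TABLE statement."""
    body = ddl[ddl.index("(") + 1 : ddl.rindex(")")]
    columns = []
    for definition in split_top_level(body):
        words = definition.split()
        if words[0].upper() in CONSTRAINT_WORDS:
            continue  # skip things like PRIMARY KEY (id)
        base_type = re.match(r"\w+", words[1]).group(0).upper()  # VARCHAR(50) -> VARCHAR
        columns.append((words[0], base_type))
    return columns


def generate_rows(ddl: str, count: int, seed: int = 0) -> list[dict]:
    """Generate `count` rows; columns with unknown types get None (NULL)."""
    rng = random.Random(seed)
    columns = parse_columns(ddl)
    return [
        {name: GENERATORS.get(sql_type, lambda _rng: None)(rng) for name, sql_type in columns}
        for _ in range(count)
    ]
```

A fixed `seed` makes the output repeatable, which helps when test data must be regenerated identically. A real generator would also have to honor keys, constraints, and many more types.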

T-SQL Tuesday 184 Round-Up

Deborah Melkin casts a wide net:

There were a lot of themes that I noticed throughout everyone’s posts. First were the number of people who mentioned that mentoring doesn’t have to be formal or even a 1:1 relationship. Mentoring isn’t just for adults and careers, but for the next generation too. Mentoring has helped their careers or become part of a core tenet in their company and how they run their business. It’s a place to grow our community, and not just for those who look like us. We all talked about how we have grown from mentoring, not just as mentees but as mentors.

Click through for a dozen-and-a-half responses to the T-SQL Tuesday call.

COPY and \COPY in PostgreSQL

Dave Stokes runs two commands:

PostgreSQL is equivalent to a Swiss Army Knife in the database world. There are things in PostgreSQL that are very simple to use, while in another database, they take many more steps to accomplish. But sometimes, the knife has too many blades, which can cause confusion. This is one of those cases.

Read on to understand what the difference is between these two commands.
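Here is a rough illustration of the distinction, assuming a CSV import or export. Server-side `COPY` reads or writes a file on the database server's own filesystem, which requires superuser rights or the `pg_read_server_files`/`pg_write_server_files` roles. psql's `\copy` meta-command runs `COPY ... FROM STDIN`/`TO STDOUT` under the hood and moves the data through the client connection, so its path is local to the machine running psql. The hypothetical `copy_command` helper below only builds the two command strings to show how they differ.

```python
def copy_command(table: str, path: str, *, client_side: bool, export: bool = True) -> str:
    """Build a CSV COPY statement (server-side) or a psql \\copy meta-command (client-side).

    Hypothetical helper for illustration only.
    """
    if "'" in path:
        raise ValueError("this sketch does not handle single quotes in paths")
    direction = "TO" if export else "FROM"
    options = "WITH (FORMAT csv, HEADER)"
    if client_side:
        # psql meta-command: psql takes the rest of the line as \copy's arguments,
        # so it must stay on one line; the file lives on the client machine.
        return f"\\copy {table} {direction} '{path}' {options}"
    # Plain SQL run by the server; the file lives on the database server.
    return f"COPY {table} {direction} '{path}' {options};"


server_export = copy_command("orders", "/var/lib/postgresql/orders.csv", client_side=False)
client_import = copy_command("orders", "orders.csv", client_side=True, export=False)
```

The first string is typed as an ordinary SQL statement. The second only works inside psql, because `\copy` is a psql meta-command rather than server SQL.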

Improving the Microsoft Fabric Copy Job

Krishnakumar Rukmangathan makes a copy:

Copy Job has been a go-to tool for simplified data ingestion in Microsoft Fabric, offering a seamless data movement experience from any source to any destination. Whether you need batch or incremental copying, it provides the flexibility to meet diverse data needs while maintaining a simple and intuitive workflow.

We continuously refine Copy Job based on customer feedback, enhancing both functionality and user experience. In this update, we’re introducing three key UX improvements designed to streamline your workflow and boost efficiency.

Read on for those three improvements.

An Explanation of PostgreSQL’s Citus Extension

Craig Kerstiens covers a misunderstood extension:

Citus is in a small class of the most advanced Postgres extensions that exist. While there are many Postgres extensions out there, few have as many hooks into Postgres or change the storage and query behavior in such a dramatic way. Most that come to Citus have very wrong assumptions. Citus turns Postgres into a sharded, distributed, horizontally scalable database (that’s a mouthful), but it does so for very specific purposes.

Read on to learn when Citus can work well, when it isn’t a good fit, and a few architecture and design recommendations around using the extension.
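To make the sharding idea more concrete, here is a small sketch of hash distribution in the style Citus uses for hash-distributed tables. The distribution column is hashed into the signed 32-bit range, and each shard owns a contiguous slice of that range. The shard count of 32 mirrors Citus's default `citus.shard_count`. The hash function here is a stand-in (`zlib.crc32`), not the one PostgreSQL and Citus actually use, so real shard placements will differ.

```python
import zlib

SHARD_COUNT = 32  # mirrors Citus's default citus.shard_count (assumption for this sketch)
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


def hash_value(key) -> int:
    """Stand-in hash mapped to signed int32; NOT the hash function Citus really uses."""
    h = zlib.crc32(str(key).encode("utf-8"))
    return h - 2**32 if h > INT32_MAX else h


def shard_ranges(shard_count: int = SHARD_COUNT) -> list[tuple[int, int]]:
    """Split the int32 hash space into contiguous ranges, one per shard."""
    span = 2**32 // shard_count
    ranges = []
    for i in range(shard_count):
        low = INT32_MIN + i * span
        high = INT32_MAX if i == shard_count - 1 else low + span - 1
        ranges.append((low, high))
    return ranges


def shard_for(key, shard_count: int = SHARD_COUNT) -> int:
    """Index of the shard whose hash range contains the key's hash."""
    span = 2**32 // shard_count
    return min((hash_value(key) - INT32_MIN) // span, shard_count - 1)


def place(rows: list[dict], distribution_column: str) -> dict[int, list[dict]]:
    """Group rows by shard, as a hash-distributed table would."""
    shards: dict[int, list[dict]] = {}
    for row in rows:
        shards.setdefault(shard_for(row[distribution_column]), []).append(row)
    return shards


# Two hypothetical tables distributed on the same column with the same shard count.
customers = [{"tenant_id": t, "name": f"customer {t}"} for t in range(1, 6)]
orders = [{"tenant_id": t, "order_id": 100 + t} for t in (1, 3, 3, 5)]
customer_shards = place(customers, "tenant_id")
order_shards = place(orders, "tenant_id")
```

Both tables are placed with the same function and shard count, so every row for a given `tenant_id` lands in the same shard. That co-location is what lets joins on the distribution column stay local to one node.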

Linting SQL with SQLFluff

Josephine Bush busts out a linter:

I thought I didn’t care about linting, and lately, I haven’t written a lot of SQL, but for the SQL I do write, I have SQLFluff to help me format it. A friend of mine is big into SQLFluff and finally talked me into installing and using it. For more information about SQLFluff itself, visit here.

Josephine shows off some of the configuration for PostgreSQL’s psql as well as SQL Server’s T-SQL.
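SQLFluff reads its settings from a `.sqlfluff` file in INI format. The sketch below generates a minimal one with Python's `configparser`. The specific settings (dialect, line length, four-space indents, uppercase keywords) are illustrative assumptions rather than Josephine's configuration. `postgres` and `tsql` are the SQLFluff dialect names for PostgreSQL and T-SQL.

```python
import configparser
import io


def sqlfluff_config(dialect: str, max_line_length: int = 100) -> str:
    """Render the text of a minimal .sqlfluff file for the given SQLFluff dialect."""
    config = configparser.ConfigParser()
    # Core settings: which SQL dialect to parse and the maximum line length.
    config["sqlfluff"] = {"dialect": dialect, "max_line_length": str(max_line_length)}
    # Indentation settings live in their own section.
    config["sqlfluff:indentation"] = {"tab_space_size": "4"}
    # Per-rule setting: require upper-case keywords (SELECT, FROM, ...).
    config["sqlfluff:rules:capitalisation.keywords"] = {"capitalisation_policy": "upper"}
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


postgres_config = sqlfluff_config("postgres")
tsql_config = sqlfluff_config("tsql", max_line_length=120)
```

Save the output as `.sqlfluff` in the directory you run SQLFluff from, and `sqlfluff lint` and `sqlfluff fix` will pick it up. You can also override the dialect for a single run with `--dialect`.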

Understanding Availability Zones in Azure

Mika Sutinen explains some of the nuance around Azure availability zones:

Azure Availability Zones help provide resiliency to your database services within an Azure Region. I simply love it how simple Microsoft has made building geographically dispersed database services. If you’ve ever designed and deployed multi-site, highly available database services in on-premises, you know what I am talking about.

However, with the Availability Zones in Azure, there are a couple of things to know. I’ve learned my lessons the hard way, so in this post I am providing some tools and guidance on how to avoid some pitfalls when building multi-zone database services.

Click through for that guidance.

Regular Expression Matches in PostgreSQL

Tobias McNulty now has two problems:

regexp_matches() and regexp_match() are two similar string functions that support regular expression matching directly in the PostgreSQL database. regexp_matches() was added in PostgreSQL 8.3, and regexp_match() was added in PostgreSQL 10 (keep reading to see how ChatGPT struggled to answer this question).

Read on for that as well as plenty more information on how the two work, and even a bonus snippet on another regular expression function.
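The core difference is the shape of the result. `regexp_match()` returns a single text array for the first match, or NULL if nothing matches. `regexp_matches()` is set-returning: with the `g` flag it yields one array per match, and without it at most one row. In both functions the array holds the capture groups, or the whole match when the pattern has no parenthesized groups. The Python below mimics only those return shapes. Python's `re` syntax is not identical to PostgreSQL's regular expression flavor, so treat it as an analogy.

```python
import re


def _row(match: re.Match) -> list:
    """Capture groups if the pattern has any, otherwise the whole match (as PostgreSQL does)."""
    return list(match.groups()) if match.re.groups else [match.group(0)]


def regexp_match(text: str, pattern: str):
    """Mimic regexp_match(): one array for the first match, or None (SQL NULL)."""
    match = re.search(pattern, text)
    return None if match is None else _row(match)


def regexp_matches(text: str, pattern: str, flags: str = "") -> list[list]:
    """Mimic regexp_matches(): one row per match with 'g', at most one row without it."""
    rows = []
    for match in re.finditer(pattern, text):
        rows.append(_row(match))
        if "g" not in flags:
            break
    return rows
```

For example, `regexp_match("foobarbequebaz", "(bar)(beque)")` gives `["bar", "beque"]`. `regexp_matches("foobarbequebazilbarfbonk", "(b[^b]+)(b[^b]+)", "g")` gives two rows, `["bar", "beque"]` and `["bazil", "barf"]`.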

Foreign Key Relationships in Microsoft Fabric Data Warehouses

Jared Westover looks at key constraints:

In late 2024, I noticed a comment on the Microsoft Learn site stating that foreign keys could improve query performance on tables in a Fabric warehouse. That claim immediately caught my attention. I wanted to answer a simple question: Do relationships help, hurt, or have no effect when added to tables in a Fabric warehouse?

Let’s get more specific—do foreign keys improve query performance when reading data (not loading)? In other words, do they make queries run faster?

Sadly, the answer is not as promising as with SQL Server. But this also makes sense considering the distributed nature of Fabric data warehouses.

Asynchronous SQL Statement Execution in Snowflake

Koen Verbeeck doesn’t want to wait for an answer:

It’s been a while since I blogged about Snowflake, but a recent LinkedIn post caught my attention: the ability to add asynchronous execution of SQL statements in a stored procedure. In other words: parallel execution of SQL statements. This got me excited, because in my opinion this is something that has been missing in T-SQL since forever. Every time you want to do something in parallel, you need to use external tools to accomplish this in SQL Server (or Azure SQL DB, or Fabric Warehouse, or Fabric SQL DB, or … you get the point). You needed to use SQL Server Agent Jobs, or SSIS packages, or Azure Data Factory and so on.

Snowflake introduces the ASYNC and AWAIT keywords, which can be used to trigger asynchronous execution. 

Read on for a very simple example and some thoughts from Koen. Aside from possibly making data modifications faster (assuming there are no constraint checks), I’m not quite sure what the major benefit to this is. I’d generally use asynchronous calls to support UI operations, letting a calling application respond to user input while some background thread processes data. But I’m not positive what you get from pushing async/await logic into the database itself.
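If you know async/await from application code, Python's `asyncio` offers a loose analogy to what Koen describes: launch several independent statements without waiting on each one, then wait for them all. The `run_statement` coroutine is hypothetical and just sleeps in place of real SQL. This is not Snowflake syntax, and it says nothing about how Snowflake actually schedules the work.

```python
import asyncio
import time


async def run_statement(name: str, seconds: float) -> str:
    """Hypothetical stand-in for a SQL statement: it only sleeps, no database involved."""
    await asyncio.sleep(seconds)
    return name


async def run_sequentially(statements: list[tuple[str, float]]) -> list[str]:
    # Each statement must finish before the next one starts.
    return [await run_statement(name, secs) for name, secs in statements]


async def run_concurrently(statements: list[tuple[str, float]]) -> list[str]:
    # Start every statement first (loosely like ASYNC), then wait for all (like AWAIT).
    tasks = [asyncio.create_task(run_statement(name, secs)) for name, secs in statements]
    return list(await asyncio.gather(*tasks))


def timed(runner, statements):
    """Run one of the coroutines above and return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = asyncio.run(runner(statements))
    return results, time.perf_counter() - start


statements = [("load_customers", 0.2), ("load_orders", 0.2), ("load_products", 0.2)]
sequential_results, sequential_seconds = timed(run_sequentially, statements)
concurrent_results, concurrent_seconds = timed(run_concurrently, statements)
```

With three 0.2-second stand-ins, the sequential run takes roughly 0.6 seconds and the concurrent one roughly 0.2. That kind of gain is only realistic when the statements are truly independent and don't contend for the same locks or resources.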
