Press "Enter" to skip to content

Author: Kevin Feasel

Comparing Varieties of Statistics in SQL Server

Kendra Little gets the smorgasbord:

Statistics in SQL Server are simple in theory: they help the optimizer estimate how many rows a query might return.

In practice? Things get weird fast. Especially when you start filtering on multiple columns, or wondering why the optimizer thinks millions of rows are coming back when you know it’s more like a few hundred thousand.

In this post, I’ll walk through examples using single-column, multi-column, and filtered statistics—and show where estimates go off the rails, when they get back on track, and why that doesn’t always mean you need to update everything with FULLSCAN.

Read on for a review of the three types of statistics. Admittedly, I’ve never had much luck with filtered statistics improving the performance of queries. If I were to speculate, I’d say that they’re good for a very specific type of problem that maybe I just don’t run into that often.
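
As a quick reference, a minimal sketch of creating each of the three varieties might look like the following, against a hypothetical Sales.Orders table (the object names are illustrative, not from Kendra’s post):

```sql
-- Single-column statistics: one histogram over one column
CREATE STATISTICS st_Orders_OrderDate
    ON Sales.Orders (OrderDate);

-- Multi-column statistics: the density vector covers the column
-- combination, but the histogram is built on the leading column only
CREATE STATISTICS st_Orders_Customer_Date
    ON Sales.Orders (CustomerID, OrderDate);

-- Filtered statistics: a histogram over just the rows matching the predicate
CREATE STATISTICS st_Orders_OpenOrders
    ON Sales.Orders (OrderDate)
    WHERE OrderStatus = N'Open';

-- Inspect the density vector and histogram for any of these
DBCC SHOW_STATISTICS ('Sales.Orders', st_Orders_Customer_Date);
```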


Purview DLP Updates

Yael Biss has an announcement:

Microsoft Purview’s Data Loss Prevention (DLP) policies for Fabric now support Fabric KQL databases and mirrored databases!

Purview DLP policies help organizations improve their data security posture and comply with governmental and industry regulations. Security teams use DLP policies to automatically detect the upload of sensitive information to Microsoft 365 applications like SharePoint and Exchange, and to Fabric’s semantic models and lakehouses.

And another one:

In today’s fast-paced data-driven world, enterprises are building more sophisticated data platforms to gain insights and drive innovation. Microsoft Fabric Lakehouses combine the scale of a data lake with the management finesse of a data warehouse – delivering unified analytics in an ever-evolving business landscape. But with great data comes great responsibility. Protecting sensitive information and ensuring regulatory compliance is paramount. That’s where Data Loss Prevention (DLP) policies with restricted access come into play.

Click through to see what this preview currently offers.


Swap-and-Drop for Partition Management

Rich Benner deals with a troublesome partition:

What are stubborn partitions in SQL Server and how do you delete them? This was an issue I recently had to deal with on a client site that I thought our readers might find interesting.

The tables in use here are partitioned. The partitioning key is a date field and we have one partition per month. A monthly maintenance job creates our new partitions and should also delete the oldest ones. This job has been failing to delete an old partition because the data file behind it is not empty. It’s stubborn!

If we try to remove this file, we get the error “The file cannot be removed because it is not empty.”

Read on for some diagnosis of the problem, as well as the solution Rich developed.
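
Rich’s specific fix is worth the click, but the general swap-and-drop shape, with hypothetical object and file names, goes something like this:

```sql
-- Switch the oldest partition out to an aligned staging table
-- (same schema, same filegroup), then discard the staged rows
ALTER TABLE dbo.Events SWITCH PARTITION 1 TO dbo.Events_Staging;
TRUNCATE TABLE dbo.Events_Staging;

-- Remove the now-unused boundary point from the partition function
ALTER PARTITION FUNCTION pf_EventsMonthly() MERGE RANGE ('2024-01-01');

-- Push any lingering allocations off the data file, then remove it
DBCC SHRINKFILE (Events_2024_01, EMPTYFILE);
ALTER DATABASE [EventsDB] REMOVE FILE Events_2024_01;
```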


A Required Privilege Is Not Held by the Client

Rebecca Lewis runs into a permissions error:

I received an email from a customer yesterday regarding their Replication, which began failing with this error after Windows updates were applied:

Message Replication-Replication Transaction-Log Reader Subsystem: agent servername-xxx2 failed. Executed as user: domainname\svcaccount. A required privilege is not held by the client. The step failed.

Slightly dummied, but the important content is the privilege message. What does that mean? ‘A required privilege is not held by the client’… he didn’t change anything, I didn’t change anything – why is Replication suddenly failing with permissions problems?

Click through for the answer.


Using fabric-cicd with GitHub Actions

Kevin Chant doesn’t limit us to Azure DevOps:

In this post I want to show how you can operationalize fabric-cicd to work with Microsoft Fabric and GitHub Actions, since I got asked if this post was available whilst I was helping at the Ask the Experts panel during the Microsoft Fabric Community Conference.

Just so that everybody is aware, fabric-cicd is a Python library that allows you to perform CI/CD of various Microsoft Fabric items into Microsoft Fabric workspaces. At the moment, the number of supported item types is limited. However, that list is increasing.

Click through for a high-level diagram and the process, including the code Kevin used in the GitHub Actions workflow.


Data Conversion via Generative AI

Grant Fritchey rearranges some data:

The DM-32 is a Digital Mobile Radio (DMR) as well as an analog radio. You can follow the link to understand all that DMR represents when talking radios. I want to focus on the fact that you have to program the behaviors into a DMR radio. While the end result is identical for every DMR radio, how you get there, the programming software, is radically different for every single radio (unless you get a radio that supports open source OpenGD77, yeah, playing radio involves open source as well). Which means, if I have more than one DMR radio (I’m currently at 7, and no, I don’t have a problem, shut up) I have more than one Customer Programming Software (CPS) that is completely different from other CPS formats. Now, I like to set up my radios similarly. After all, the local repeaters, my hotspot, and the Talkgroups I want to use are all common. Since every CPS is different, you can’t just export from one and import to the next. However, I had the idea of using AI for data conversion. Let’s see how that works.

Click through for the scenario as well as Grant’s results, which were pretty successful for a data mapping operation. That said, the choice of model and the simplicity of the input and output examples matter quite a bit in generating working Python code.


Data Splitting and Cross-Validation in R

Nick Han has a pair of articles. First up is on data splitting and pre-processing:

Data preprocessing is a crucial step in any machine learning workflow. It ensures that your data is clean, consistent, and ready for modeling. In this blog post, we’ll walk through the process of splitting and preprocessing data in R, using the rsample package for data splitting and saving the results for future use.

H/T R-Bloggers for that one.

The second involves using cross-validation via the caret package in R:

Cross-validation is a resampling technique used to assess the performance and generalizability of machine learning models. It helps address issues like overfitting and ensures that the model’s performance is consistent across different subsets of the data. By splitting the data into multiple folds and repeating the process, cross-validation provides a robust estimate of model performance.

H/T R-Bloggers for that as well.


An Introduction to Temporal Tables

Stephen Planck covers a feature in SQL Server:

Have you ever wanted to see exactly how a row looked in your SQL Server database a week or a month ago—without writing complicated auditing code or triggers? With temporal tables (also known as system-versioned tables), you can do precisely that. Temporal tables automatically track historical data changes by maintaining multiple versions of each row in a separate history table. Let’s explore how they work, why you might use them, how to set them up, and what best practices to follow.

Click through for a good overview of the feature. I have mixed feelings on temporal tables because I think the feature is halfway-finished and has been since 2016. But there are two things that I think it really needs in order to shine.

The first is user-defined versioning. SQL Server only offers system time for version tracking, meaning it keeps track of when you inserted the row into the table. That’s fine for certain historical operations, but terrible if you want to use temporal tables for type-2 slowly changing dimensions, where you often care more about when a record became effective than about when you inserted the row into the dimension.

The second is efficient historical slicing, in the same way that you can do AS OF operations. AS OF lets you ask what the data looked like at a specific point in time. For warehousing scenarios, we also want to look at the history of changes for a particular keyed record, so you might see all of the changes to a customer or an employee. You can do this with a UNION ALL operation, but that query logic can get complex.
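
To make that concrete, here is a minimal sketch of the feature as it exists today, with made-up table and column names:

```sql
-- A system-versioned temporal table: SQL Server maintains the period
-- columns and the history table automatically
CREATE TABLE dbo.Customer
(
    CustomerID int NOT NULL PRIMARY KEY,
    CustomerName nvarchar(100) NOT NULL,
    ValidFrom datetime2 GENERATED ALWAYS AS ROW START NOT NULL,
    ValidTo datetime2 GENERATED ALWAYS AS ROW END NOT NULL,
    PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.CustomerHistory));

-- Point-in-time slicing: the table as it looked at a single instant
SELECT CustomerID, CustomerName, ValidFrom, ValidTo
FROM dbo.Customer FOR SYSTEM_TIME AS OF '2025-01-01T00:00:00'
WHERE CustomerID = 42;

-- The full change history for one keyed record; FOR SYSTEM_TIME ALL
-- unions the current and history tables for you, though shaping that
-- output for warehousing queries still takes work
SELECT CustomerID, CustomerName, ValidFrom, ValidTo
FROM dbo.Customer FOR SYSTEM_TIME ALL
WHERE CustomerID = 42
ORDER BY ValidFrom;
```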


Estimating SQL Server Backup Sizes

Rebecca Lewis gives us a rule of thumb:

How big will my backup file be?  Very good question.  And of course, it depends… especially if you’re using compression.

I was working on something today that required a backup beforehand for safety measures, and the server had limited storage.  So, I needed to estimate the backup file size to confirm whether I had enough space for the .bak file.

Click through for the script. It does, of course, include some simplifications and your actual numbers can turn out a bit different, but for a quick and dirty estimate of disk space needed, this isn’t bad at all.
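
This isn’t Rebecca’s script, but the core idea, that a full backup copies allocated pages, fits in a few lines:

```sql
-- Rough ceiling for an uncompressed full backup: allocated pages * 8 KB
-- (run this in the context of the database you plan to back up)
SELECT SUM(allocated_extent_page_count) * 8 / 1024.0 AS estimated_backup_mb
FROM sys.dm_db_file_space_usage;

-- With backup compression, recent history in msdb suggests a ratio
SELECT TOP (5) database_name,
       backup_size / NULLIF(compressed_backup_size, 0) AS compression_ratio
FROM msdb.dbo.backupset
WHERE type = 'D'  -- full backups only
ORDER BY backup_finish_date DESC;
```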


Creating a Microsoft Fabric Capacity

Boniface Muchendu builds out some capacity:

To begin, we need to head over to the Azure portal. You might wonder why we are starting here. Well, Microsoft Fabric is now an Azure resource, which means all initial setups must be done in the Azure environment.

Click through for step-by-step instructions. Microsoft has also been really good about letting people create (and re-create and re-create) trial capacities, so if you’re just futzing about with the product to get an idea of what it can do, see if you can use that rather than shelling out the cash.
