
Day: July 22, 2025

Generating Synthetic Data in Python

Ivan Palomares Carrascosa makes some data:

This article introduces the Faker library for generating synthetic datasets. Through a gentle hands-on tutorial, we will explore how to generate single records or data instances, full datasets in one go, and export them into different formats. The code walkthrough adopts a twofold perspective:

  1. Learning: We will gain a basic understanding of several data types that can be generated and how to get them ready for further processing, aided by popular data-intensive libraries like Pandas.
  2. Testing: With some generated data at hand, we will provide some hints on how to test for data issues in the context of a simplified ETL (Extract, Transform, Load) pipeline that ingests synthetically generated transactional data.
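The generate/export/test workflow described above can be sketched without third-party packages. This is not Faker's API. It is a minimal stand-in that uses only Python's standard library (`random`, `csv`, `json`), and the transaction schema, field names, and checks are hypothetical choices for illustration. It generates single records, a full dataset in one go, exports to CSV and JSON, and runs a few ETL-style data-quality checks.

```python
from __future__ import annotations

import csv
import io
import json
import random
from datetime import date, timedelta

# Hypothetical schema for a synthetic transaction record (not from the article).
FIELDS = ["transaction_id", "customer_id", "amount", "transaction_date"]


def make_record(rng: random.Random, transaction_id: int) -> dict:
    """Generate a single synthetic transaction record."""
    start = date(2025, 1, 1)
    return {
        "transaction_id": transaction_id,
        "customer_id": rng.randint(1, 500),
        "amount": round(rng.uniform(1.0, 250.0), 2),
        "transaction_date": (start + timedelta(days=rng.randrange(365))).isoformat(),
    }


def make_dataset(n: int, seed: int = 42) -> list[dict]:
    """Generate n records in one go; a fixed seed makes runs reproducible."""
    rng = random.Random(seed)
    return [make_record(rng, i) for i in range(1, n + 1)]


def to_csv(records: list[dict]) -> str:
    """Export records as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


def to_json(records: list[dict]) -> str:
    """Export records as a JSON array."""
    return json.dumps(records, indent=2)


def validate(records: list[dict]) -> list[str]:
    """ETL-style checks: unique keys, positive amounts, parseable dates."""
    problems = []
    ids = [r["transaction_id"] for r in records]
    if len(ids) != len(set(ids)):
        problems.append("duplicate transaction_id values")
    for r in records:
        if r["amount"] <= 0:
            problems.append(f"non-positive amount in transaction {r['transaction_id']}")
        try:
            date.fromisoformat(r["transaction_date"])
        except ValueError:
            problems.append(f"bad date in transaction {r['transaction_id']}")
    return problems


if __name__ == "__main__":
    data = make_dataset(1000)
    print(to_csv(data[:3]))
    print(validate(data))  # → []
    # Inject a deliberate defect to confirm the check actually fires.
    data[0]["amount"] = -5.0
    print(validate(data))  # → ['non-positive amount in transaction 1']
```

Seeding a dedicated `random.Random` instance keeps the dataset reproducible across runs, which makes data-quality tests deterministic. Injecting a known defect, as the last lines do, is a cheap way to confirm that a check is not silently passing everything.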

Click through for the article. I’m not intimately familiar with Faker, so I’m not sure how easy it is to change dataset distributions. That’s one of the challenges I tend to have with automated data generators: generating a simulated dataset is fine if you just need X number of rows, but if the distribution of synthetic data in development is nowhere near what the real data’s distribution is in production, you may get a false sense of security in things like report response times.
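To make the distribution concern concrete, here is a small sketch, again in plain Python rather than Faker, comparing uniformly assigned customer IDs with a Zipf-like skew in which a handful of customers account for most rows. The Zipf exponent and the row and customer counts are arbitrary assumptions. The point is that two datasets with the same row count can look very different to a query that groups or filters on a hot key.

```python
from __future__ import annotations

import random
from collections import Counter


def uniform_customers(n_rows: int, n_customers: int, seed: int = 0) -> list[int]:
    """Every customer is equally likely to appear on any row."""
    rng = random.Random(seed)
    return [rng.randint(1, n_customers) for _ in range(n_rows)]


def skewed_customers(n_rows: int, n_customers: int, s: float = 1.2, seed: int = 0) -> list[int]:
    """Zipf-like skew: customer k gets weight 1 / k**s, so low IDs dominate."""
    rng = random.Random(seed)
    ids = list(range(1, n_customers + 1))
    weights = [1 / k ** s for k in ids]
    return rng.choices(ids, weights=weights, k=n_rows)


def top_share(values: list[int]) -> float:
    """Fraction of rows that belong to the single most frequent value."""
    _, count = Counter(values).most_common(1)[0]
    return count / len(values)


if __name__ == "__main__":
    n_rows, n_customers = 100_000, 1_000
    uniform = uniform_customers(n_rows, n_customers)
    skewed = skewed_customers(n_rows, n_customers)
    # Same row count, very different concentration on the busiest customer.
    print(f"uniform: top customer has {top_share(uniform):.2%} of rows")
    print(f"skewed:  top customer has {top_share(skewed):.2%} of rows")
```

A generator that only offers uniform choices would produce something like the first dataset, so anything sensitive to hot keys (index choices, memory grants, report response times) may behave quite differently once real, skewed data arrives.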


The PRODUCT() Function in SQL Server 2025

Ed Pollack points out a new function:

With each version of SQL Server, there are always a few new features introduced that we applaud as we finally have access to a useful function that is already available elsewhere.

Introduced in SQL Server 2025 CTP 1.3, the PRODUCT() function acts similarly to SUM(), but multiplies values rather than adding them. It is an aggregate function in SQL Server and therefore operates on a data set, rather than on scalar values.

Ed notes that there are aggregate and window function versions of PRODUCT() and shows examples of how it works.
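What the aggregate and windowed forms compute can be sketched in Python. This illustrates the semantics only and is not SQL Server's implementation: the rows and growth factors are made up, the SQL in the docstrings is indicative rather than verified syntax, and NULL handling is deliberately left out. The last function shows the common workaround used before PRODUCT() existed, exponentiating a sum of logarithms, which only works for strictly positive values.

```python
import math
from itertools import accumulate, groupby
from operator import itemgetter, mul

# Hypothetical rows: (portfolio, period, growth factor). Values are made up.
rows = [
    ("A", 1, 1.10), ("A", 2, 0.95), ("A", 3, 1.20),
    ("B", 1, 1.02), ("B", 2, 1.03),
]


def product_by_group(rows):
    """Roughly: SELECT portfolio, PRODUCT(factor) ... GROUP BY portfolio."""
    result = {}
    for key, group in groupby(sorted(rows, key=itemgetter(0)), key=itemgetter(0)):
        result[key] = math.prod(factor for _, _, factor in group)
    return result


def running_product(rows, portfolio):
    """A running product within one partition, ordered by period."""
    factors = [f for p, _, f in sorted(rows) if p == portfolio]
    return list(accumulate(factors, mul))


def product_via_logs(values):
    """The EXP(SUM(LOG(x))) workaround; only valid when every x > 0."""
    return math.exp(sum(math.log(v) for v in values))


if __name__ == "__main__":
    print(product_by_group(rows))
    print(running_product(rows, "A"))
    print(product_via_logs([1.10, 0.95, 1.20]))
```

Compare results with `math.isclose` rather than `==`: the log-based version adds floating-point rounding beyond what direct multiplication introduces, and it raises an error for zero or negative inputs, which a native product does not need to special-case.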


Ordering Collections in Powershell

Shane O’Neill demands order:

However, there is one drawback I have with hash tables: they don’t keep items in the order in which they were inserted. I’m OK with this since I come from a DB background, and I’m used to order not being guaranteed unless I specify an ORDER BY. Not everyone is as lenient as we are, though, and the vast majority of the louder masses expect this ordering.

Now, there are ways around this unordered aspect of hash tables.

Click through for the easy answer, the less easy answer, and some additional thoughts on script development from Shane.


Goodbye, Default Semantic Models

Pradeep Srikakolapu makes an announcement:

Microsoft Fabric is officially sunsetting Default Semantic Models. This change is part of our ongoing efforts to simplify and improve the manageability, deployment, and governance of Fabric items such as warehouse, lakehouse, SQL database, and mirrored databases.

This is definitely a good thing. The idea of a default semantic model wasn’t bad, especially early on in Microsoft Fabric’s development life. But those default models almost never had enough information to do what customers actually want, so they would sit there as a distraction.


The Microsoft Fabric Service Status Page

Brent Ozar notes a new status page:

I’ve been pretty vocal here on the blog and on social media about the reliability problems with Microsoft Fabric. Today, I’ve got good news: Microsoft released a new Fabric status page and a known issues page, something that really does take guts given the current reliability situation.

It’ll be important to see how frequently they update this status page and if the page displays sufficient information on issues in a timely manner. But this is a good starting point.


Tips for Highly Available PostgreSQL Systems

Semab Tariq provides some high-level guidance:

In today’s digital landscape, downtime isn’t just inconvenient; it’s costly. No matter what business you are running (an e-commerce site, a SaaS platform, or critical internal systems), your PostgreSQL database must be resilient, recoverable, and continuously available. So, in short:

High Availability (HA) is not a feature you enable; it’s a system you design.

In this blog, we will walk through the important things to consider when setting up a reliable, production-ready HA PostgreSQL system for your applications.

Click through for a variety of things to think about. Most of this will apply to other database systems as well, though specific tools will differ.
