Press "Enter" to skip to content

Day: May 7, 2025

The Dual Perils of Overfitting and Data Leakage

John Mount shares notes on a theme:

One of the bigger risks of iterative statistical or machine learning fitting procedures is over-fit or the dreaded data leak.

Over-fit is when: a model performs better on training data than on future data. Some degree of over-fit is expected. A data leak is when: the model learns things about the evaluation set that it would not know about the future data the model will be applied on. This can drive models that look great on training and (supposedly) held-out data, but don’t work in practice.

Click through for the rest of the story, and be sure to check out the comments for a notebook digging further into one of the topics.

Leave a Comment

The APPLY Operator in T-SQL

I have a new video:

In this video, I explain the history behind the APPLY operator in T-SQL. I also demonstrate one use case for it: executing table-valued functions.

As I mention at the end of the video, this is by far the least interesting use case for the APPLY operator. Subsequent videos in the series get a lot more exciting. But we have to start at the beginning, unless we’re going full Pulp Fiction.

Leave a Comment

Organizing a Microsoft Fabric Data Platform with Domains

Jon Vöge does a bit of organization:

A topic which seems more relevant than ever, is the question of how to organize the contents of your Microsoft Fabric Platform.

Through the contents of a few blogs, I will give you an overview of things to consider, as well as suggestions that you can choose from when designing your platform.

This first week, we’ll take a look at Domains in Microsoft Fabric.

Read on to understand why domains can be valuable and a solid way to structure them.

Leave a Comment

Choosing a Warehousing Data Architecture

James Serra compares and contrasts OLAP architectures:

As discussed in my blog and book “Deciphering Data Architectures: Choosing Between a Modern Data Warehouse, Data Fabric, Data Lakehouse, and Data Mesh” (Amazon), organizations are often challenged with choosing the right data architecture to meet their business goals—especially as AI and data-driven decision-making take center stage. To help clarify, here’s a quick review of the four core architectures, followed by guidance on when to use each. Each architecture includes five stages of data movement – ingest, store, transform, model, and visualize (described here).

Click through for James’s take on how each of them works and when you might choose one over the other.

Leave a Comment

Writing to Microsoft FabricDelta Tables in Python via DuckDB

Gilbert Quevauvilliers does a bit of writing:

When I was exploring how to easily write to Delta Tables with a Python notebook, it took me a considerable amount of time to find out how to do this.

This is my learnings below, and from my point of view it makes it easy to write to a Lakehouse table, like what is done with a PySpark notebook.

Click through for one very important note, as well as the process.

Leave a Comment

Tracking Wait Classes in Oracle

Kellyn Gorman looks for performance bugbears:

When we talk about optimization in Oracle, many DBAs immediately jump into metrics like CPU utilization, I/O throughput, or specific SQL queries. But there’s a critical layer of understanding that often gets overlooked and that’s Oracle Wait Classes. These categories are essential to making sense of what’s really going on inside your database, and they’re often the starting point for diagnosing and optimizing performance.

Read on to learn what wait classes are and why they’re so important for Oracle DBAs.

Leave a Comment

Rounded Corners in Power BI

Elena Drakulevska breaks out the sandpaper:

I’ve been seeing all kinds of shapes in Power BI reports lately—and seems like there’s a lot of guesswork when it comes to rounded corners. Some people just throw in a number because it “feels modern”, others skip it altogether. But rounding shouldn’t be random.

Wondering what to do with your edges? Putting in a radius just because? Or maybe your designs still feel… too pointy?

Click through for some thoughts on rounding corners in Power BI, as well as how to do it.

Leave a Comment