Press "Enter" to skip to content

Curated SQL Posts

Model Diagnostics for Statistics vs Machine Learning

Christian Lorentzen talks diagnostics:

In this post, we show how different use cases require different model diagnostics. In short, we compare (statistical) inference and prediction.

As an example, we use a simple linear model for the Munich rent index dataset, which was kindly provided by the authors of Regression – Models, Methods and Applications 2nd ed. (2021). This dataset contains monthy rents in EUR (rent) for about 3000 apartments in Munich, Germany, from 1999.

Read on to learn more about this dataset and how the mindset differs if you’re thinking about inference versus prediction.

Comments closed

Purging Data from Large Tables

Matt Gantz deletes the elephant:

Purging data from a table is a common database maintenance task to prevent it from growing too large or to stay in compliance with data retention. When dealing with small amounts of data, this can be accomplished by a simple delete with no issues; however, with larger tables, this task can be problematic. Deleting records requires a lock that can block other processes from writing or even reading the data (depending on your isolation level). In this article I will share a technique I have used to work with some very large tables.

I’ve followed exactly this pattern many a time, and it works quite well if you have an appropriate supporting index.

Comments closed

Medallion Architecture in Fabric Real-Time Intelligence

Tyler Chessman is like an onion:

Building a multi-layer, medallion architecture using Fabric Real-Time Intelligence (RTI) requires a different approach compared to traditional data warehousing techniques. But even transactional source systems can be effectively processed in RTI. To demonstrate, we’ll look at how sales orders (created in a relational database) can be continuously ingested and transformed through a RTI bronze, silver, and gold layer.

Read on to see how.

Comments closed

Tracking Query Lineage in Microsoft Fabric Lakehouses

Dennes Torres wants to know who is your daddy and what does he do:

If you check the text of the queries, at the end of the text you will find content like this:

OPTION (label = N”{“DatasetId”:”1269551b-bf26-47de-b0f0-974fa60f7b08″,”Sources”:[{“ReportId”:”01ab9208-399a-47ec-b444-d03633fc3e1d”,”VisualId”:”30ac676503a0bd357312″,”Operation”:”AutoPageRefresh”}]}”)

This has an interesting meaning:

  • We can use this information to track the query lineage
  • Applications can send lineage (or more) to SQL using OPTION (LABEL) statement

Click through to learn how you can use this information.

Comments closed

Justifying Costs to Management

Kevin Hill broaches a challenging topic:

Your systems, your data, your customer experience – they all rely on that “invisible” database engine humming along behind the scenes. And if you’re responsible for keeping it running, you need the budget to do it right.

Here’s how to make your case without getting buried in tech jargon or glazed-over stares.

Dave Wentzel has a very solid response to this in the comments. My point of emphasis is working in business terms. Think in terms of return on investment, especially if you can calculate it. That’s a real challenge for technical people because we think in terms of capabilities and don’t have much information on the practical effects of whatever it is we do all day, but figure out what your company uses for cost analysis and try to work in those terms.

Comments closed

Vacuum Queries in PostgreSQL

Dave Stokes shares some queries:

 I am (slowly) adding handy PostgreSQL queries to my GitHub, and Vacuum is the newest category.  The end goal is to have a compilation of queries for those of us who need to keep an instance healthy.

I’ve always liked these compendia of helpful queries. The downside is that these are PDF images, making them harder to use. But even so, Dave is accepting pull requests.

Comments closed

Power BI in a Microsoft Fabric World

Koen Verbeeck answers a question:

We’re a relatively small shop that has been using Power BI for our analytical needs for years now. We’re very pleased with the product, but the recent introduction of Microsoft Fabric has made us a bit anxious. When comparing Microsoft Fabric vs Power BI, it all seems very complex and we’re not even sure we need it. What will happen with Power BI? Will it be replaced with Fabric?

Click through for Koen’s advice and thoughts on the matter.

Comments closed

April 2025 Updates for Microsoft Fabric

Patrick LeBlanc is back with another compendium:

Welcome to the Fabric April 2025 Feature Summary! This update brings exciting advancements across various workloads, including Low-code AI tools to accelerate productivity in notebooks (Preview), session Scoped distributed #temp table in Fabric Data Warehouse (Generally Available) and the Migration assistant for Fabric Data Warehouse (Preview) to simplify your migration experience.

This one isn’t quite as long as last month’s release, but there are still a couple dozen entries.

Comments closed

Oracle Memory Management Tips

Kellyn Gorman shares a few tips with us:

After the initial overview of memory monitoring in Oracle, particularly around the SGA and PGA, I wanted to take a deeper dive into the most common Oracle memory configurations. These include how memory settings are defined, how they interact, and the combinations you’re likely to encounter across different workloads.  Oracle memory is a no-brainer for many of us who’ve been in the space for an extensive time, but I’ve come to realize, it can be a bit overwhelming and especially with recent changes from Oracle 12c on.

Click through for Kellyn’s guidance.

Comments closed