Day: May 2, 2025

Breaking Down the Limitations of R²

M. Fatih Tüzen explains an important regression concept:

When building a statistical model, one of the first numbers analysts and data scientists often cite is the R², or coefficient of determination. It’s widely reported in research, academic theses, and industry reports, and yet it is frequently misunderstood or misused.

Does a high R² mean your model is good? Is it enough to evaluate model performance? What about its adjusted or predictive counterparts?

Read on to learn the answers to each question. H/T R-Bloggers.
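For reference, the standard definitions are R² = 1 - SS_res / SS_tot, adjusted R² = 1 - (1 - R²)(n - 1) / (n - p - 1) for p predictors, and predicted R² = 1 - PRESS / SS_tot. The Python sketch below is not from the linked post. It uses NumPy and synthetic data (both assumptions on my part) and fits ordinary least squares with and without ten pure-noise predictors. In-sample R² can only rise as columns are added, whereas the adjusted and leave-one-out (PRESS-based) versions are designed to penalize that kind of overfitting.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit with an intercept; returns fitted values and hat-matrix diagonal."""
    X1 = np.column_stack([np.ones(len(y)), X])                # prepend intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    hat_diag = np.einsum("ij,ji->i", X1, np.linalg.pinv(X1))  # diag of X (X'X)^-1 X'
    return X1 @ beta, hat_diag

def r_squared(y, y_hat):
    ss_res = np.sum((y - y_hat) ** 2)       # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)    # total sum of squares about the mean
    return 1.0 - ss_res / ss_tot

def adjusted_r_squared(r2, n, p):
    # p counts predictors only, not the intercept
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

def predicted_r_squared(y, y_hat, hat_diag):
    # PRESS: for OLS, the leave-one-out residual is e_i / (1 - h_ii)
    press = np.sum(((y - y_hat) / (1.0 - hat_diag)) ** 2)
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(0)
n = 50
x = rng.normal(size=n)
y = 2.0 + 3.0 * x + rng.normal(size=n)     # synthetic data: one real predictor plus noise
noise = rng.normal(size=(n, 10))           # ten predictors unrelated to y

results = {}
for label, X in [("x only", x[:, None]), ("x + 10 noise columns", np.column_stack([x, noise]))]:
    y_hat, h = fit_ols(X, y)
    r2 = r_squared(y, y_hat)
    results[label] = (r2, adjusted_r_squared(r2, n, X.shape[1]), predicted_r_squared(y, y_hat, h))
    print(label, "R2 / adjusted / predicted:", [round(float(v), 3) for v in results[label]])
```

The exact numbers depend on the random seed, so I don't quote them here. Two relationships are guaranteed for least squares with an intercept: R² does not decrease when predictors are added, and neither the adjusted nor the predicted version can exceed R².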

Model Diagnostics for Statistics vs Machine Learning

Christian Lorentzen talks diagnostics:

In this post, we show how different use cases require different model diagnostics. In short, we compare (statistical) inference and prediction.

As an example, we use a simple linear model for the Munich rent index dataset, which was kindly provided by the authors of Regression – Models, Methods and Applications 2nd ed. (2021). This dataset contains monthly rents in EUR (rent) for about 3000 apartments in Munich, Germany, from 1999.

Read on to learn more about this dataset and how the mindset differs if you’re thinking about inference versus prediction.
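The following NumPy sketch makes the inference/prediction split concrete. It does not use the Munich rent data or reproduce the post's diagnostics. The rent-versus-area relationship, the coefficients, and the noise level are all invented for illustration. The inference view fits on all rows and asks how precisely the slope is estimated, using a standard error and an approximate 95% confidence interval. The prediction view fits on a training split and asks how far off the model is on rows it never saw.

```python
import numpy as np

def ols(X, y):
    """Ordinary least squares coefficients."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

rng = np.random.default_rng(42)
n = 3000                                                   # synthetic rows, not the real dataset
area = rng.uniform(20, 160, size=n)                        # hypothetical living area in m^2
rent = 150 + 7.5 * area + rng.normal(scale=150, size=n)    # invented monthly rent in EUR
X = np.column_stack([np.ones(n), area])

# Inference view: fit on all rows, then quantify uncertainty in the slope.
beta = ols(X, rent)
resid = rent - X @ beta
sigma2 = resid @ resid / (n - X.shape[1])                  # residual variance estimate
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))     # coefficient standard errors
slope, slope_se = beta[1], se[1]
ci_low, ci_high = slope - 1.96 * slope_se, slope + 1.96 * slope_se  # normal-approx 95% CI

# Prediction view: fit on a random 80% split, evaluate on the held-out 20%.
is_train = rng.random(n) < 0.8
beta_train = ols(X[is_train], rent[is_train])
pred = X[~is_train] @ beta_train
rmse = np.sqrt(np.mean((rent[~is_train] - pred) ** 2))
bias = np.mean(pred - rent[~is_train])                     # crude check: unbiased on average?

print(f"slope {slope:.2f} EUR/m^2 (SE {slope_se:.3f}, 95% CI {ci_low:.2f} to {ci_high:.2f})")
print(f"held-out RMSE {rmse:.1f} EUR, mean bias {bias:+.1f} EUR")
```

Both views use the same model. The difference lies in which question the diagnostics answer: how much to trust a coefficient, or how much to trust a new prediction.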

Purging Data from Large Tables

Matt Gantz deletes the elephant:

Purging data from a table is a common database maintenance task to prevent it from growing too large or to stay in compliance with data retention. When dealing with small amounts of data, this can be accomplished by a simple delete with no issues; however, with larger tables, this task can be problematic. Deleting records requires a lock that can block other processes from writing or even reading the data (depending on your isolation level). In this article I will share a technique I have used to work with some very large tables.

I’ve followed exactly this pattern many a time, and it works quite well if you have an appropriate supporting index.
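The excerpt doesn't spell out the technique, so what follows is only a sketch of one common approach, not necessarily the article's. The idea is to delete in small batches and commit after each one, so that each transaction holds its locks only briefly. It uses Python's built-in sqlite3 module for portability. The sales_history table, the created_at column, and the batch size are hypothetical. In SQL Server, the same idea is usually written as a loop around DELETE TOP (N) ... WHERE .... The index on created_at plays the role of the supporting index, letting each batch find its rows without scanning the whole table.

```python
import sqlite3

def purge_in_batches(conn, cutoff, batch_size=1000):
    """Delete rows older than cutoff one batch at a time (hypothetical table/column names)."""
    total = 0
    while True:
        cur = conn.execute(
            """DELETE FROM sales_history
               WHERE rowid IN (SELECT rowid FROM sales_history
                               WHERE created_at < ?
                               LIMIT ?)""",
            (cutoff, batch_size),
        )
        conn.commit()                    # end this short transaction before the next batch
        total += cur.rowcount
        if cur.rowcount < batch_size:    # partial or empty batch means nothing is left
            return total

# Usage with an in-memory database: 500 rows per month for 2024, purge everything before July.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_history (id INTEGER PRIMARY KEY, created_at TEXT NOT NULL)")
conn.execute("CREATE INDEX ix_sales_history_created_at ON sales_history (created_at)")
conn.executemany("INSERT INTO sales_history (created_at) VALUES (?)",
                 [(f"2024-{m:02d}-01",) for m in range(1, 13) for _ in range(500)])
conn.commit()

deleted = purge_in_batches(conn, "2024-07-01", batch_size=1000)
remaining = conn.execute("SELECT COUNT(*) FROM sales_history").fetchone()[0]
print(deleted, remaining)
```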

Medallion Architecture in Fabric Real-Time Intelligence

Tyler Chessman is like an onion:

Building a multi-layer, medallion architecture using Fabric Real-Time Intelligence (RTI) requires a different approach compared to traditional data warehousing techniques. But even transactional source systems can be effectively processed in RTI. To demonstrate, we’ll look at how sales orders (created in a relational database) can be continuously ingested and transformed through an RTI bronze, silver, and gold layer.

Read on to see how.
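Here is a rough plain-Python illustration of what each layer is responsible for. It is not Fabric RTI code, and the OrderEvent shape, the version column, and the sample rows are all hypothetical. Bronze keeps change records exactly as they arrive, duplicates included. Silver keeps the latest version of each order, with proper types. Gold holds a business-level aggregate.

```python
from collections import defaultdict
from dataclasses import dataclass
from decimal import Decimal

@dataclass(frozen=True)
class OrderEvent:           # hypothetical change record as landed in bronze
    order_id: int
    version: int            # increases each time the source row changes
    order_date: str         # ISO date string
    amount: str             # bronze keeps the raw text from the source

bronze = [
    OrderEvent(1, 1, "2025-05-01", "100.00"),
    OrderEvent(2, 1, "2025-05-01", "40.00"),
    OrderEvent(1, 2, "2025-05-01", "120.00"),   # order 1 updated in the source system
    OrderEvent(3, 1, "2025-05-02", "75.50"),
    OrderEvent(2, 1, "2025-05-01", "40.00"),    # duplicate delivery of the same change
]

def to_silver(events):
    """Latest version of each order, with the amount parsed to Decimal."""
    latest = {}
    for e in events:
        if e.order_id not in latest or e.version > latest[e.order_id].version:
            latest[e.order_id] = e
    return {oid: (e.order_date, Decimal(e.amount)) for oid, e in latest.items()}

def to_gold(silver):
    """Business-level aggregate: total sales per order date."""
    totals = defaultdict(Decimal)
    for order_date, amount in silver.values():
        totals[order_date] += amount
    return dict(sorted(totals.items()))

silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)
```

The stale version of order 1 and the duplicate of order 2 are both still in bronze, but neither reaches the gold totals.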

Tracking Query Lineage in Microsoft Fabric Lakehouses

Dennes Torres wants to know who is your daddy and what does he do:

If you check the text of the queries, at the end of the text you will find content like this:

OPTION (label = N'{"DatasetId":"1269551b-bf26-47de-b0f0-974fa60f7b08","Sources":[{"ReportId":"01ab9208-399a-47ec-b444-d03633fc3e1d","VisualId":"30ac676503a0bd357312","Operation":"AutoPageRefresh"}]}')

This has an interesting meaning:

  • We can use this information to track the query lineage
  • Applications can send lineage (or more) to SQL using an OPTION (LABEL) clause

Click through to learn how you can use this information.
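Once you have the query text, however you retrieve it, extracting the lineage from the label is mostly string handling. This Python sketch assumes the label is a JSON document shaped like the example above and that it sits in a trailing OPTION (LABEL = N'...') clause. The regular expression and the parse_lineage name are illustrative and not part of any Fabric API, and the SELECT in the sample query is made up.

```python
import json
import re

# Trailing OPTION (LABEL = N'...') clause; case-insensitive because T-SQL keywords are.
LABEL_RE = re.compile(
    r"OPTION\s*\(\s*LABEL\s*=\s*N?'(?P<label>(?:[^']|'')*)'\s*\)\s*;?\s*$",
    re.IGNORECASE,
)

def parse_lineage(query_text):
    """Return one dict per source in the label, or None if there is no JSON label."""
    match = LABEL_RE.search(query_text)
    if not match:
        return None
    raw = match.group("label").replace("''", "'")   # undo T-SQL escaping of single quotes
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return None                                  # a label, but not JSON lineage
    return [{"DatasetId": payload.get("DatasetId"), **source}
            for source in payload.get("Sources", [])]

query = """SELECT SUM([amount]) FROM [dbo].[sales]
OPTION (label = N'{"DatasetId":"1269551b-bf26-47de-b0f0-974fa60f7b08","Sources":[{"ReportId":"01ab9208-399a-47ec-b444-d03633fc3e1d","VisualId":"30ac676503a0bd357312","Operation":"AutoPageRefresh"}]}')"""

for source in parse_lineage(query):
    print(source["DatasetId"], source["ReportId"], source["VisualId"], source["Operation"])
```

Grouping the parsed rows by ReportId and VisualId is one way to see which report visuals generate the most query load.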

Justifying Costs to Management

Kevin Hill broaches a challenging topic:

Your systems, your data, your customer experience – they all rely on that “invisible” database engine humming along behind the scenes. And if you’re responsible for keeping it running, you need the budget to do it right.

Here’s how to make your case without getting buried in tech jargon or glazed-over stares.

Dave Wentzel has a very solid response to this in the comments. My point of emphasis is working in business terms: think in terms of return on investment, especially if you can calculate it. That’s a real challenge for technical people, because we think in terms of capabilities and don’t have much information on the practical effects of whatever it is we do all day. Still, figure out what your company uses for cost analysis and try to work in those terms.
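As a minimal example of the return-on-investment arithmetic: ROI = (benefit - cost) / cost, and the payback period is the cost divided by the monthly benefit. Every figure in this Python sketch is a hypothetical placeholder, and so is the choice of avoided downtime as the benefit. Substitute whatever your company's cost analysis actually uses.

```python
def roi(benefit, cost):
    """Simple one-period return on investment, as a fraction."""
    return (benefit - cost) / cost

def payback_months(cost, annual_benefit):
    """Months until cumulative benefit covers a one-time cost."""
    return cost / (annual_benefit / 12)

# Hypothetical inputs: replace with your own incident history and pricing.
outage_hours_avoided = 10        # per year
cost_per_outage_hour = 5_000     # lost revenue plus staff time, per hour down
benefit = outage_hours_avoided * cost_per_outage_hour
cost = 20_000                    # one-time tooling, licensing, and setup time

print(f"ROI: {roi(benefit, cost):.0%}")                         # ROI: 150%
print(f"Payback: {payback_months(cost, benefit):.1f} months")   # Payback: 4.8 months
```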

Leave a Comment