Author: Kevin Feasel

Model Documentation via Fabric Data Agent

Chris Webb gets some answers:

AI is meant to help us automate boring tasks, and what could be more boring than creating documentation for your Power BI semantic models? It’s such a tedious task that most people don’t bother; there’s also an ecosystem of third-party tools that do this job for you, and you can also build your own solution for this using DAX DMVs or the new-ish INFO functions (see here for a good example). That got me wondering: can you use Fabric Data Agents to generate documentation for you? And what’s more, why even generate documentation when you can just ask a Data Agent the questions that you’d need to generate documentation to answer?

For a simple scenario, Chris was able to get pretty solid results. As complexity grows, your mileage may vary.

Resulting Data Types from a UNION Operation

Andy Brownsword puts on the lab coat and performs some experiments:

The UNION and UNION ALL operators allow us to combine results, but there’s no guarantee that each set of results uses the same data types. So what data types are returned?

For the longest time I thought the data types from the first set of results were used for the final results. That’s not the case.

Read on to see what the rules look like.

Breaking down the Limitations of R^2

M. Fatih Tüzen explains an important regression concept:

When building a statistical model, one of the first numbers analysts and data scientists often cite is the R², or coefficient of determination. It’s widely reported in research, academic theses, and industry reports, and yet it is frequently misunderstood or misused.

Does a high R² mean your model is good? Is it enough to evaluate model performance? What about its adjusted or predictive counterparts?

Read on to learn the answers to each question. H/T R-Bloggers.
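
As a small illustration of the first question (my own sketch, not taken from the linked post), the snippet below fits a straight line to data that is actually quadratic. The helper names `fit_line`, `r_squared`, and `adjusted_r_squared` are hypothetical, and the data is synthetic. The point is only that R² can look strong while the residuals show an obvious pattern.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(ys, preds):
    """R^2 = 1 - SS_res / SS_tot."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 for n observations and p predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Synthetic data: the true relationship is y = x^2, not a straight line.
xs = list(range(11))
ys = [x ** 2 for x in xs]

slope, intercept = fit_line(xs, ys)
preds = [intercept + slope * x for x in xs]
residuals = [y - p for y, p in zip(ys, preds)]

r2 = r_squared(ys, preds)
adj_r2 = adjusted_r_squared(r2, n=len(xs), p=1)

print(f"R^2 = {r2:.3f}, adjusted R^2 = {adj_r2:.3f}")
print("residuals:", [round(r, 1) for r in residuals])
```

Here R² comes out at roughly 0.93, yet the residuals are positive at both ends and negative in the middle, which is the signature of a misspecified model. A single summary number hides that; a residual plot would not.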

Model Diagnostics for Statistics vs Machine Learning

Christian Lorentzen talks diagnostics:

In this post, we show how different use cases require different model diagnostics. In short, we compare (statistical) inference and prediction.

As an example, we use a simple linear model for the Munich rent index dataset, which was kindly provided by the authors of Regression – Models, Methods and Applications 2nd ed. (2021). This dataset contains monthly rents in EUR (rent) for about 3000 apartments in Munich, Germany, from 1999.

Read on to learn more about this dataset and how the mindset differs if you’re thinking about inference versus prediction.

Medallion Architecture in Fabric Real-Time Intelligence

Tyler Chessman is like an onion:

Building a multi-layer, medallion architecture using Fabric Real-Time Intelligence (RTI) requires a different approach compared to traditional data warehousing techniques. But even transactional source systems can be effectively processed in RTI. To demonstrate, we’ll look at how sales orders (created in a relational database) can be continuously ingested and transformed through an RTI bronze, silver, and gold layer.

Read on to see how.

Purging Data from Large Tables

Matt Gantz deletes the elephant:

Purging data from a table is a common database maintenance task to prevent it from growing too large or to stay in compliance with data retention. When dealing with small amounts of data, this can be accomplished by a simple delete with no issues; however, with larger tables, this task can be problematic. Deleting records requires a lock that can block other processes from writing or even reading the data (depending on your isolation level). In this article I will share a technique I have used to work with some very large tables.

I’ve followed exactly this pattern many a time, and it works quite well if you have an appropriate supporting index.
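
The excerpt does not spell out the technique, so the following is only a generic sketch of one common approach to this problem: deleting in small batches, each in its own short transaction, driven by an index on the purge column. It uses Python's built-in `sqlite3` with an in-memory database purely so that it runs anywhere; the `events` table, its columns, the index name, and the `purge_in_batches` helper are all hypothetical, and locking behavior in SQL Server differs from SQLite.

```python
import sqlite3


def purge_in_batches(conn, cutoff, batch_size=1000):
    """Delete rows with created_at < cutoff in small batches.

    Each batch is committed separately so that no single transaction
    holds locks for long. Returns the total number of rows deleted.
    """
    total_deleted = 0
    while True:
        cursor = conn.execute(
            "DELETE FROM events WHERE id IN ("
            "  SELECT id FROM events"
            "  WHERE created_at < ?"
            "  ORDER BY created_at"
            "  LIMIT ?)",
            (cutoff, batch_size),
        )
        conn.commit()
        if cursor.rowcount == 0:
            break  # nothing left to purge
        total_deleted += cursor.rowcount
    return total_deleted


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, created_at INTEGER NOT NULL)"
)
# Supporting index: lets each batch find its rows without scanning the table.
conn.execute("CREATE INDEX ix_events_created_at ON events (created_at)")
conn.executemany(
    "INSERT INTO events (created_at) VALUES (?)",
    [(i,) for i in range(10_000)],
)
conn.commit()

deleted = purge_in_batches(conn, cutoff=7_500, batch_size=1_000)
remaining = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(f"deleted={deleted}, remaining={remaining}")
```

The batch size is the main tuning knob: smaller batches mean shorter locks but more round trips. Without the index on `created_at`, every batch would have to scan for qualifying rows, which is where the supporting index earns its keep.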

Tracking Query Lineage in Microsoft Fabric Lakehouses

Dennes Torres wants to know who is your daddy and what does he do:

If you check the text of the queries, at the end of the text you will find content like this:

OPTION (label = N'{"DatasetId":"1269551b-bf26-47de-b0f0-974fa60f7b08","Sources":[{"ReportId":"01ab9208-399a-47ec-b444-d03633fc3e1d","VisualId":"30ac676503a0bd357312","Operation":"AutoPageRefresh"}]}')

This has an interesting meaning:

  • We can use this information to track the query lineage
  • Applications can send lineage (or more) to SQL using the OPTION (LABEL) statement

Click through to learn how you can use this information.
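
As a rough sketch of what tracking lineage from that label could look like (not the approach in the linked post), the snippet below pulls the JSON payload out of the trailing `OPTION (label = ...)` text and parses it. It assumes the label is a single-quoted string literal containing JSON with ordinary straight double quotes; the `extract_lineage` helper and the surrounding `SELECT` are hypothetical, while the identifiers in the payload are the ones quoted above.

```python
import json
import re

# Matches a trailing OPTION (label = N'{...}') clause and captures the JSON.
LABEL_PATTERN = re.compile(
    r"OPTION\s*\(\s*label\s*=\s*N?'(?P<payload>\{.*\})'\s*\)\s*$",
    re.IGNORECASE | re.DOTALL,
)


def extract_lineage(query_text):
    """Return the label payload as a dict, or None if absent or not valid JSON."""
    match = LABEL_PATTERN.search(query_text.strip())
    if match is None:
        return None
    try:
        return json.loads(match.group("payload"))
    except json.JSONDecodeError:
        return None


# Hypothetical query text; only the OPTION clause comes from the excerpt.
query_text = (
    "SELECT TOP (10) * FROM dbo.SomeTable "
    "OPTION (label = N'{\"DatasetId\":\"1269551b-bf26-47de-b0f0-974fa60f7b08\","
    "\"Sources\":[{\"ReportId\":\"01ab9208-399a-47ec-b444-d03633fc3e1d\","
    "\"VisualId\":\"30ac676503a0bd357312\",\"Operation\":\"AutoPageRefresh\"}]}')"
)

lineage = extract_lineage(query_text)
if lineage is not None:
    print("dataset:", lineage["DatasetId"])
    for source in lineage["Sources"]:
        print("report:", source["ReportId"], "operation:", source["Operation"])
```

Queries without a label, or with a label that is not JSON, simply return `None`, so the helper can be run over a whole query history without special-casing.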

Justifying Costs to Management

Kevin Hill broaches a challenging topic:

Your systems, your data, your customer experience – they all rely on that “invisible” database engine humming along behind the scenes. And if you’re responsible for keeping it running, you need the budget to do it right.

Here’s how to make your case without getting buried in tech jargon or glazed-over stares.

Dave Wentzel has a very solid response to this in the comments. My point of emphasis is working in business terms. Think in terms of return on investment, especially if you can calculate it. That’s a real challenge for technical people because we think in terms of capabilities and don’t have much information on the practical effects of whatever it is we do all day, but figure out what your company uses for cost analysis and try to work in those terms.
