Press "Enter" to skip to content

Curated SQL Posts

Orchestration Controllers in Azure Data Factory

Martin Schoombee gets to the top of the pyramid:

Controllers are pipelines that initiate the execution of a single process or task in a specific order and with constraints. Whereas everything else in this framework is pretty automated, this part is entirely manual.

Why? Well, when I started thinking about the design of this framework I knew I needed something at the “highest level” that would execute an entire daily ETL process, or a modified ETL process that only loads specific data during the day. I wanted to maximize the flexibility of the framework, and that either meant adding another level to the metadata structure or creating this layer of pipelines that sit at the top. I opted for the second, because I did not feel it was worth the complexity of adding another layer into the metadata structure. That being said, it doesn’t mean it cannot or shouldn’t be done…it was a personal choice I made to keep things as simple as I could.

Read on to learn more about what the controller should look like and how the other pieces fit in.

Comments closed

Trying Power BI Copilot to Describe DAX Measures

Marc Lelijveld tries out a feature:

It has been roughly two years since the last update on the Power BI Model Documenter, an external tool that I developed to auto-document Power BI Semantic Models. Back then, we still called it datasets though. Looking at the stats of my website, the model documenter and related posts/pages are still most read on my website – every month again.

As the technology has kept on improving, there are new options that will help you to kick-start generating documentation about your semantic model. In this post, I will elaborate on how you can use AI in Power BI to enrich your Model Documenter output, by helping you describing the measures you’ve added to your semantic model.

Read on for a demonstration, as well as what the auto-generated results get right and wrong.

Comments closed

OPTIMIZE FOR vs Forced Plans in SQL Server

Erik Darling makes a comparison:

I often see clients using forced plans or plan guides (yes, even still, to this day) to deal with various SQL Server performance problems with plans changing.

There’s usually an execution plan or two floating around that seems to be a good general idea for a given query, and a couple weird high-end and low-end outliers for very specific populations of values.

Read the whole thing, of course.

In defense of plan guides, the company I used to work for had a few—maybe three or four in total—because of really weird data skew problems on database 106 out of 700 (or so)—because there’s always one customer that makes wildly different use of the system than everyone else. And so a query that worked perfectly fine for 699 databases (or so) flops like a fish out of water for this one database with this one customer’s data in it. So the plan guide was a nicer expediency than optimizing for mediocre on all 700 (or so) databases.

1 Comment

Open Questions on Fabric Administration

Paul Andrew asks some great questions:

Microsoft Fabric is a big product with lots of different data handling capabilities. From a data engineering perspective creating and innovating with Fabric as a unified tool is a great experience, ultimately delivering data insights for the business and adding value, nice! However, as with all new developments, the creativity is the fun part. The governance and movement of code into production is less fun and can become the hard/ugly part if the change management, platform and governance aren’t mature enough.

Paul doesn’t have answers for us, though I do think many of these will eventually have answers most people find reasonable.

Comments closed

T-SQL Tuesday 174 Round-Up

I do a thing:

I thought about doing this in the normal Curated SQL style, where I grab a graf from each post. But instead, you get the second-best Curated SQL format: the hodge-podge bulleted list, but still in present tense and with a touch of commentary and the occasional rant.

Really, it should be “I did a thing” but I can’t go having that past tense nonsense here, now can I?

Thank you again to everyone who contributed. It was a lot of fun reading through all of these.

Comments closed

Using Project Configuration for SSIS Package Deployment

Andy Brownsword shows us the brand new (well, okay, 12 years old) deployment model for Integration Services:

Last week we looked at configuring SSIS packages using package configuration. This week we’ll look at another approach for configuration: Project Configuration

Project Configuration is the standard way to configure projects. This took over from the legacy approach which we looked at last week. This method allows us to share common parameters and connections across an entire project to help managing a number of packages more efficient.

Read on to see how it works.

Comments closed

Removing Elements from a Vector in R

Steven Sanderson wants to leave one of these things out:

Working with vectors is one of the fundamental aspects of R programming. Sometimes, you need to remove specific elements from a vector to clean your data or prepare it for analysis. This post will guide you through several methods to achieve this, using base R, dplyr, and data.table. We’ll look at examples for both numeric and character vectors and explain the code in a straightforward manner. By the end, you’ll have a clear understanding of how to manipulate your vectors efficiently. Let’s dive in!

Read on for three pairs of examples, one for numeric vectors and one for character vectors.

Comments closed

Accuracy is Not Enough for Classification

I have a new video:

In this video, I explain why accuracy is not the be-all, end-all measure for classification. After that, I introduce the confusion matrix, a mechanism for tracking predicted versus actual values. Then, I talk about a variety of measures and how we can derive them from the confusion matrix.

The trickiest part of the confusion matrix measures is just remembering which measures comport to which combinations in the matrix. The second-trickiest part of the confusion matrix is that R and Python invert them, so reading across the top row in R is equivalent to reading down the first column in Python.

Comments closed

Diagnosing DirectQuery Connection Limit Performance Problems

Chris Webb does a bit of sleuthing:

A few months ago I blogged about the new limits available for the Maximum Connections Per Data Source property in Premium and why the number of connections that a DirectQuery semantic model can open to a data source is so important for report performance. At that time, however, there was no way for you to know whether the performance of your reports was being affected by a lack of available connections. The good news is that, with the announcement this week of the new Execution Metrics event in Log Analytics and Profiler, you can now see when this is happening.

Read on for an illustration of the problem and how you can resolve it.

Comments closed

Data Connections in Microsoft Fabric

Soheil Bakhshi takes us through the ins and outs of data connections:

Managing data connections in Microsoft Fabric can be challenging if you’re unsure where to start. This blog post and its detailed YouTube video will help you find, manage, and share the existing data connections, making your workflow more efficient and streamlined. A meaningful use case for this feature is to reuse the existing connections leading to more controlled connections to the data sources. More on this later in this blog.

Click through for the article and link to the video.

Comments closed