Press "Enter" to skip to content

Month: May 2024

OPTIMIZE FOR vs Forced Plans in SQL Server

Erik Darling makes a comparison:

I often see clients using forced plans or plan guides (yes, even still, to this day) to deal with various SQL Server performance problems with plans changing.

There’s usually an execution plan or two floating around that seems to be a good general idea for a given query, and a couple weird high-end and low-end outliers for very specific populations of values.

Read the whole thing, of course.

In defense of plan guides, the company I used to work for had a few—maybe three or four in total—because of really weird data skew problems on database 106 out of 700 (or so)—because there’s always one customer that makes wildly different use of the system than everyone else. And so a query that worked perfectly fine for 699 databases (or so) flops like a fish out of water for this one database with this one customer’s data in it. So the plan guide was a nicer expediency than optimizing for mediocre on all 700 (or so) databases.

1 Comment

Open Questions on Fabric Administration

Paul Andrew asks some great questions:

Microsoft Fabric is a big product with lots of different data handling capabilities. From a data engineering perspective creating and innovating with Fabric as a unified tool is a great experience, ultimately delivering data insights for the business and adding value, nice! However, as with all new developments, the creativity is the fun part. The governance and movement of code into production is less fun and can become the hard/ugly part if the change management, platform and governance aren’t mature enough.

Paul doesn’t have answers for us, though I do think many of these will eventually have answers most people find reasonable.

Comments closed

T-SQL Tuesday 174 Round-Up

I do a thing:

I thought about doing this in the normal Curated SQL style, where I grab a graf from each post. But instead, you get the second-best Curated SQL format: the hodge-podge bulleted list, but still in present tense and with a touch of commentary and the occasional rant.

Really, it should be “I did a thing” but I can’t go having that past tense nonsense here, now can I?

Thank you again to everyone who contributed. It was a lot of fun reading through all of these.

Comments closed

Using Project Configuration for SSIS Package Deployment

Andy Brownsword shows us the brand new (well, okay, 12 years old) deployment model for Integration Services:

Last week we looked at configuring SSIS packages using package configuration. This week we’ll look at another approach for configuration: Project Configuration

Project Configuration is the standard way to configure projects. This took over from the legacy approach which we looked at last week. This method allows us to share common parameters and connections across an entire project to help managing a number of packages more efficient.

Read on to see how it works.

Comments closed

Removing Elements from a Vector in R

Steven Sanderson wants to leave one of these things out:

Working with vectors is one of the fundamental aspects of R programming. Sometimes, you need to remove specific elements from a vector to clean your data or prepare it for analysis. This post will guide you through several methods to achieve this, using base R, dplyr, and data.table. We’ll look at examples for both numeric and character vectors and explain the code in a straightforward manner. By the end, you’ll have a clear understanding of how to manipulate your vectors efficiently. Let’s dive in!

Read on for three pairs of examples, one for numeric vectors and one for character vectors.

Comments closed

Accuracy is Not Enough for Classification

I have a new video:

In this video, I explain why accuracy is not the be-all, end-all measure for classification. After that, I introduce the confusion matrix, a mechanism for tracking predicted versus actual values. Then, I talk about a variety of measures and how we can derive them from the confusion matrix.

The trickiest part of the confusion matrix measures is just remembering which measures comport to which combinations in the matrix. The second-trickiest part of the confusion matrix is that R and Python invert them, so reading across the top row in R is equivalent to reading down the first column in Python.

Comments closed

Diagnosing DirectQuery Connection Limit Performance Problems

Chris Webb does a bit of sleuthing:

A few months ago I blogged about the new limits available for the Maximum Connections Per Data Source property in Premium and why the number of connections that a DirectQuery semantic model can open to a data source is so important for report performance. At that time, however, there was no way for you to know whether the performance of your reports was being affected by a lack of available connections. The good news is that, with the announcement this week of the new Execution Metrics event in Log Analytics and Profiler, you can now see when this is happening.

Read on for an illustration of the problem and how you can resolve it.

Comments closed

Data Connections in Microsoft Fabric

Soheil Bakhshi takes us through the ins and outs of data connections:

Managing data connections in Microsoft Fabric can be challenging if you’re unsure where to start. This blog post and its detailed YouTube video will help you find, manage, and share the existing data connections, making your workflow more efficient and streamlined. A meaningful use case for this feature is to reuse the existing connections leading to more controlled connections to the data sources. More on this later in this blog.

Click through for the article and link to the video.

Comments closed

Window Functions and DAX

Marco Russo and Alberto Ferrari explain how window functions work in DAX:

Window functions like OFFSET and WINDOW return rows from a table based on the current row. For example, OFFSET (-1) returns the previous row. The main question is: how does OFFSET determine the current row? Intuitively, it searches in the row context and in the filter context for values of columns, and it determines the current row. Unfortunately, that intuitive understanding is missing many details. Although it is easy to intuit the current row in simple queries, things are sometimes more complex. To make sense of the current row in a non-trivial scenario, you need to understand apply semantics.

Read on to learn more, especially when there are partition groups (think the PARTITION BY section in a T-SQL window clause).

Comments closed

An Introduction to pg_timeseries

Samay Sharma and Jason Petersen have an announcement:

We are excited to launch pg_timeseries: a PostgreSQL extension focused on creating a cohesive user experience around the creation, maintenance, and use of time-series tables. You can now use pg_timeseries to create time-series tables, configure the compression and retention of older data, monitor time-series partitions, and run complex time-series analytics functions with a user-friendly syntax. pg_timeseries is open-sourced under the PostgreSQL license and can be added to your existing PostgreSQL installation or tried as a part of the Timeseries Stack on Tembo Cloud.

Read on to learn more about how it works. The syntax and concepts do remind me a good bit of InfluxDB, as well.

Comments closed