
Day: July 7, 2022

Visualizing Kafka Stream Lineage

David Araujo and Julia Peng show off stream lineage in Confluent Cloud:

Stream Lineage is a tool Confluent built to address the lack of data visibility in Kafka and event-driven architectures. Confluent’s Stream Lineage provides an interactive map of all your data flows that enables users to:

1. Understand what data flows are running either now or at any point in the past

2. Trace where each data flow originated from

3. Track how data is transformed along its journey

4. Observe where each data flow ends up

Read on to see how it works.

Comments closed

The Power of Hash Match Joins

Jared Poche looks at hash matching:

When I began working at Microsoft, I was very much a novice at performance troubleshooting. There was a lot to learn, and hash match joins were pointed out to me multiple times as the potential cause for a given issue. So, for a while I had it in my head, “hash match == bad”. But this really isn’t the case.

Hash matches aren’t inefficient; they are the best way to join large result sets together. The caveat is that you have a large result set, and that itself may not be optimal. Should it be returning this many rows? Have you included all the filters you can? Are you returning columns you don’t need?

I might throw in one caveat about hash match joins being the best performers when joining two really large datasets: merge join can be more efficient so long as both sets are guaranteed to be ordered in the same way without an explicit sort operator. That last clause is usually the kicker.
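
To make the trade-off concrete, here is a minimal Python sketch of the two join strategies. It illustrates the general algorithms only, not SQL Server's implementation; the function names and the sample rows are made up for this example. The hash join accepts its inputs in any order, while the merge join assumes both inputs already arrive sorted on the join key.

```python
from collections import defaultdict


def hash_join(build_rows, probe_rows, key):
    """Join two lists of dicts on `key` by hashing one input and probing with the other."""
    table = defaultdict(list)
    for row in build_rows:
        table[row[key]].append(row)
    joined = []
    for row in probe_rows:
        # .get avoids creating empty buckets for keys with no match
        for match in table.get(row[key], []):
            joined.append({**match, **row})
    return joined


def merge_join(left_sorted, right_sorted, key):
    """Join two inputs that are ALREADY sorted on `key` by advancing two cursors."""
    joined = []
    i = j = 0
    while i < len(left_sorted) and j < len(right_sorted):
        left_key = left_sorted[i][key]
        right_key = right_sorted[j][key]
        if left_key < right_key:
            i += 1
        elif left_key > right_key:
            j += 1
        else:
            # Find the run of equal keys on the right side
            j_end = j
            while j_end < len(right_sorted) and right_sorted[j_end][key] == left_key:
                j_end += 1
            # Pair every left row in the run with every right row in the run
            while i < len(left_sorted) and left_sorted[i][key] == left_key:
                for right_row in right_sorted[j:j_end]:
                    joined.append({**left_sorted[i], **right_row})
                i += 1
            j = j_end
    return joined


# Hypothetical sample data, both lists already ordered by "id"
customers = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bo"}, {"id": 3, "name": "Cy"}]
orders = [{"id": 1, "total": 10}, {"id": 1, "total": 15}, {"id": 3, "total": 7}]

hash_result = hash_join(customers, orders, "id")
merge_result = merge_join(customers, orders, "id")
```

The merge join never builds a hash table, which is where its advantage comes from. If either input were not already sorted, it would need a sort first, and that sort is usually what erases the advantage.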


Ordered Columnstore Indexes in SQL Server 2022

Brent Ozar appreciates order:

So essentially, every column has a whole bunch of indexes on it.

But there’s no order whatsoever as to which rows end up in which index.

This isn’t a problem for relatively small tables, but as you get to billion-row data warehouse fact tables where columnstore should really shine, performance gradually degrades. In data warehouses, fact tables often have a commonly filtered column, like SaleDate. However, until SQL Server 2022, even if you wanted a small SaleDate range, your query would likely check hundreds or thousands of row groups, each of which had a huge range of data.

But do read the whole thing, as it seems it’s not working correctly in CTP 2.0 of SQL Server 2022. It is quite useful in Azure Synapse Analytics dedicated SQL pools, at least; that much I can confirm.
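
The row group problem in the quote can be sketched with a toy Python simulation. This is only an illustration of min/max-based elimination, not SQL Server code. The row group size is scaled down to 1,000 rows (real columnstore row groups hold up to 1,048,576 rows), and the data, the day-number encoding of SaleDate, and the function names are assumptions made for the example.

```python
import random


def build_row_groups(values, group_size):
    """Split values into row groups, recording min/max metadata for each group."""
    groups = []
    for start in range(0, len(values), group_size):
        chunk = values[start:start + group_size]
        groups.append({"min": min(chunk), "max": max(chunk), "rows": chunk})
    return groups


def groups_to_scan(groups, low, high):
    """Count row groups whose [min, max] range overlaps the filter range [low, high]."""
    return sum(1 for g in groups if g["max"] >= low and g["min"] <= high)


random.seed(42)
# Hypothetical fact table: 100,000 sales spread over 1,000 days (SaleDate as a day number)
sale_days = [random.randrange(1000) for _ in range(100_000)]

unordered = build_row_groups(sale_days, 1_000)        # rows land in arrival order
ordered = build_row_groups(sorted(sale_days), 1_000)  # rows sorted by SaleDate first

# Filter on a narrow 10-day SaleDate range
unordered_hits = groups_to_scan(unordered, 500, 509)
ordered_hits = groups_to_scan(ordered, 500, 509)
print(f"unordered: {unordered_hits} of {len(unordered)} row groups scanned")
print(f"ordered:   {ordered_hits} of {len(ordered)} row groups scanned")
```

With unordered data, nearly every row group spans the full date range, so the metadata rules almost nothing out. Once the rows are sorted before being grouped, only the handful of row groups covering the requested days need to be read.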


Removing a Data Disk from a Running Azure VM

Joey D’Antoni tightrope walks without a net for fun:

I was working with a client recently, where we had to reconfigure storage within a VM (which is always a messy proposition). In doing so, we were adding and removing disks from the VM. This all happened mostly during a downtime window, so it wasn’t a big deal to down a VM, which is how you can remove a disk from a VM via the portal. However, upon further research, I learned that through the portal you can remove a disk from a running VM.

Read on to see how, though I’d generally still recommend shutting the VM off first just to be sure.


Indexing and Parameter Sensitive Plan Optimization

Erik Darling continues a series on Parameter Sensitive Plan optimization. First up is a post on indexing:

Anyway, let’s use the example that I had started with here, to illustrate that the PSP optimization does work with a computed column, but… like any other column, indexes make all the difference.

I’m using the same example query over and over again, because a lot of the other great examples of parameter sensitivity that I have demo queries written for don’t seem to trigger it.

And sometimes there’s just nothing to do:

After seeing places where the Parameter Sensitive Plan (PSP) optimization quite stubbornly refuses to kick in, it’s somewhat amusing to see it kick in where it can’t possibly have any positive impact.

Even though some parameters are responsible for filtering on columns with highly skewed data, certain other factors may be present that don’t allow for the type of plan quality issues you might run into under normal parameter sensitivity scenarios:

This continues to be a very interesting look into one of the most-anticipated features in SQL Server 2022, as well as a necessary wet blanket for the hype.


Refreshing Power BI Usage Metrics Report Credentials

Stephanie Bruno fixes a credentials issue:

Have you ever had a Usage Metrics Report just stop refreshing? At the bottom you’ll see a little message that tells you to check the credentials. But you don’t have access to the Usage Metrics Report dataset in the list of datasets in the workspace because it’s hidden! So even if you wanted to delete it and nicely ask it to start again, you can’t do it from the workspace. What can you do? Well, you can delete the Usage Metrics Report dataset with the Power BI Rest API.

Read on to see how.
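
As a rough idea of what such a call looks like, here is a minimal Python sketch that builds (but does not send) a delete request against the Power BI REST API's "delete dataset in group" endpoint. It is not taken from the linked post. The workspace ID, dataset ID, and access token below are placeholders, and finding the hidden dataset's ID is the part the post itself covers.

```python
import urllib.request

API_ROOT = "https://api.powerbi.com/v1.0/myorg"


def build_delete_dataset_request(group_id, dataset_id, access_token):
    """Build a DELETE request for one dataset in one workspace (group)."""
    url = f"{API_ROOT}/groups/{group_id}/datasets/{dataset_id}"
    return urllib.request.Request(
        url,
        method="DELETE",
        headers={"Authorization": f"Bearer {access_token}"},
    )


# Placeholder values (assumptions): substitute a real workspace ID, the hidden
# Usage Metrics Report dataset ID, and a valid Azure AD access token.
request = build_delete_dataset_request(
    "WORKSPACE_ID_PLACEHOLDER",
    "DATASET_ID_PLACEHOLDER",
    "ACCESS_TOKEN_PLACEHOLDER",
)
print(request.get_method(), request.full_url)

# Sending is left commented out because deleting a dataset is destructive:
# urllib.request.urlopen(request)
```

Keeping request construction separate from sending makes it easy to inspect the URL before anything is deleted.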
