Press "Enter" to skip to content

Curated SQL Posts

T-SQL Tuesday 170 Roundup

Reitse Eskens writes up a roundup about abandoned projects:

When I thought of this subject last year, I was really on the fence if it would work or not. Part of me was convinced it would elicit some response from the community, part of me was convinced people would be looking for a ‘happy’ start of the year and might not want to think or write about past learnings.

Part of me was right, but I never expected SO MANY of you to jump in and write so many wonderful blogs. It feels a bit unfair to summarise all your hard hard work, so please click the links to read the full stories. Well worth your time!

Click through for plenty of stories on the topic of lessons learned from abandoned projects.

Comments closed

Cross-Database Sync in Azure SQL Database

Jose Manuel Jurado Diaz goes through a lot of work to replicate replication:

This week, we addressed a service request from a customer who wanted to keep two tables synchronized across different databases on the same Azure SQL Database server. Unfortunately, in Azure SQL Database, it’s not possible to directly call the database in the operation, for instance using the command select * from master.sys.sys_databases. For this reason, I’d like to share an alternative to tackle this limitation.

This is a way to do it, and it’s fine if there’s only one entry point for the data and you’re okay with writes taking more than twice as long. But this is a perfect use case for transactional replication. The only problem is, Azure SQL Database doesn’t support being a transactional replication publisher or distributor, only a (push) subscriber.

Comments closed

The Power of Schema Binding

Guy Glantser explains what schema binding is and why it’s important:

In SQL Server, when we use the “WITH SCHEMABINDING” clause in the definition of an object (view or function), we bind the object to the schema of all the underlying tables and views. This means that the underlying tables and views cannot be modified in a way that would affect the definition of the schema-bound object. It also means that the underlying objects cannot be dropped. We can still modify those tables or views, as long as we don’t affect the definition of the schema-bound object (view or function).

At one point, I was a lot more gung-ho about schema binding and would do it even for views that didn’t require it, as I liked the idea of protecting the schema from changing. This turned out to be more of an annoyance than a benefit, though, so I quietly modified that belief a while ago. Nowadays, I’ll typically use schema binding when it’s required and on the rare occasion in which I create a function (assuming the function complies).

Comments closed

Fixing Key Lookups

Erik Darling has some advice:

Key lookups represent a choice made by the optimizer between:

  • Clustered index scan
  • Nonclustered index seek/scan + Clustered index lookup

Lookups can be quite useful to avoid creating quite wide nonclustered index, but the optimizer is heavily biased against them. It does not like random I/O.

Read on for Erik’s thoughts on the matter and how you may be able to improve the performance of a query by removing a key lookup—or sometimes, how you may safely ignore a key lookup because hey, it’s a 1 millisecond difference on a query that runs once a day.

Comments closed

Blocking and Waiting in Power BI Import Mode Refreshes

Chris Webb has some ‘splainin to do:

Following on from my previous post showing how you can visualise the job graph for a Power BI Import mode semantic model refresh, I this post I will look at how you can interpret what the job graph tells you – specifically, explaining the concepts of blocking and waiting. As always, simple examples are the best way of doing this.

Click through for the explanation using a job graph.

Comments closed

An Overview of Clustering Techniques in R

Peter Laurinec gives us an overview:

Clustering is a very popular technique in data science because of its unsupervised characteristic – we don’t need true labels of groups in data. In this blog post, I will give you a “quick” survey of various clustering methods applied to synthetic but also real datasets.

Read on for a quick description of what clustering is and a few use cases. Then, Peter dives into a variety of techniques and important things you should know about them. H/T R-Bloggers.

Comments closed

Benchmarking Cumulative Function Speed in TidyDensity

Steven Sanderson charts performance:

Statistical analysis often involves calculating various measures on large datasets. Speed and efficiency are crucial, especially when dealing with real-time analytics or massive data volumes. The TidyDensity package in R provides a set of fast cumulative functions for common statistical measures like mean, standard deviation, skewness, and kurtosis. But just how fast are these cumulative functions compared to doing the computations directly? In this post, I benchmark the cumulative functions against the base R implementations using the rbenchmark package.

Click through for the functions under test and how they fare.

Comments closed

Canceling a Power BI Dataflow Gen2 Refresh

Sandeep Pawar has a script for us:

At the time of writing this blog, it is not possible to cancel a Dataflow Gen2 (DFg2) refresh using the UI. This is a temporary limitation that I expect will be resolved soon. DFg2 can be resource intensive, and if the refresh takes longer than expected, it may consume a significant amount of CUs. Thankfully, you can use the Power BI Rest API to cancel it. My friend Alex Powers already has a PowerShell script that you can use. You can also use the Power BI VS Code extension by Gerhard Brueckl.

But I would like to show you how you can do this using the PowerBIRestClient in the latest version of Semantic-Link (v0.5.0).

Read on to see what this Python script does and how you can use it.

Comments closed

Create and Connect to a Fabric Data Warehouse

Olivier Van Steenlandt builds a warehouse:

In this data recipe series, Microsoft Fabric – Data Warehouse will be explored. As a starting point, a blank Fabric workspace is used. You can sign up for a free Fabric trial by using the following URL: Data Analytics | Microsoft Fabric

In this data recipe, we will create a brand-new Data Warehouse in Fabric. Once created, we will connect to our Data Warehouse using Azure Data Studio.

Click through for the step-by-step process.

Comments closed

Heps, Clustered Indexes, and Non-Clustered Indexes

Erik Darling starts a new series:

Some of the best questions I get some clients, conference attendees, and random email, are about how to design indexes.

A lot of developers out there have a rather foggy picture of exactly how indexes work. They’re all seen phone books, and drawings of B-Tree indexes, but some common things still escape them.

In this post, I’m going to talk about a few things like I’m speaking to someone who has never created a table before.

The problem with the phone book analogy is that there’s an entire generation of people who haven’t used phone books.

Also, Erik has his own spin on the classic NUSE for cluster indexing.

2 Comments