Curated SQL Posts

Postgres Data Extraction with LATERAL joins and More

Ryan Booz extracts some data:

In our data-hungry world, knowing how to effectively load and transform data from various sources is a highly valued skill. Over the last couple of years, I’ve learned how much the data manipulation functions in PostgreSQL can supercharge your data transformation and analysis process, using just PostgreSQL and SQL.

For the last couple of decades, “Extract Transform Load” (ETL) has been the primary method for manipulating and analyzing data. In most cases, ETL relies on an external toolset to help acquire different forms of data, slicing and dicing it into a form suitable for relational databases, and then inserting the results into your database of choice. Once it’s in the destination table with a relational schema, querying and analyzing it is much easier.

I call out CROSS JOIN LATERAL (or any kind of lateral join) here because it’s the ANSI equivalent of T-SQL’s APPLY operator, and I’ve already pointed out once today that I’m a huge fan of APPLY.
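
If APPLY is new to you, the semantics are easy to model. Below is a minimal Python sketch of what a lateral join means logically, not of how PostgreSQL or SQL Server executes it. For each row on the left, a right-hand expression that can reference that row produces zero or more rows. CROSS JOIN LATERAL (CROSS APPLY in T-SQL) drops left rows that produce nothing, while LEFT JOIN LATERAL ... ON true (OUTER APPLY) keeps them with NULLs, shown here as None. The customers/orders data and the top_n_orders helper are hypothetical.

```python
from typing import Callable, Iterable

Row = dict  # in this sketch a row is a plain column-name -> value mapping


def lateral_join(
    outer: Iterable[Row],
    inner_fn: Callable[[Row], list[Row]],
    keep_unmatched: bool = False,
    inner_columns: tuple[str, ...] = (),
) -> list[Row]:
    """Evaluate inner_fn once per outer row, the way LATERAL / APPLY is defined.

    keep_unmatched=False models CROSS JOIN LATERAL (CROSS APPLY).
    keep_unmatched=True models LEFT JOIN LATERAL ... ON true (OUTER APPLY).
    """
    result: list[Row] = []
    for left in outer:
        matches = inner_fn(left)  # the right side can "see" the current left row
        if matches:
            result.extend({**left, **right} for right in matches)
        elif keep_unmatched:
            # Pad the missing right-hand columns with None (SQL NULL).
            result.append({**left, **{col: None for col in inner_columns}})
    return result


# Hypothetical sample data.
customers = [{"customer_id": 1, "name": "Ann"}, {"customer_id": 2, "name": "Bo"}]
orders = [
    {"customer_id": 1, "order_id": 10, "total": 50},
    {"customer_id": 1, "order_id": 11, "total": 75},
    {"customer_id": 1, "order_id": 12, "total": 20},
]


def top_n_orders(customer: Row, n: int = 2) -> list[Row]:
    """Hypothetical correlated 'subquery': this customer's n largest orders."""
    mine = [o for o in orders if o["customer_id"] == customer["customer_id"]]
    return sorted(mine, key=lambda o: o["total"], reverse=True)[:n]


cross = lateral_join(customers, top_n_orders)  # Bo has no orders, so he drops out
outer = lateral_join(
    customers, top_n_orders, keep_unmatched=True, inner_columns=("order_id", "total")
)  # Bo is kept, with None for the order columns
for row in outer:
    print(row)
```

The "top N per outer row" shape is the classic reason to reach for LATERAL or APPLY, because a plain join has no way to say "the two largest orders for this particular customer."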

Overloading Power BI in Microsoft Fabric

Reitse Eskens pushes the envelope:

In my previous blog on Fabric and load testing, I ended with not really knowing how Power BI would respond to all these rows. After creating and presenting a session on this subject, it’s time to dig into this part of Fabric as well. There were questions and I made promises. So here goes! This blog will only show the F2 experience, as that’s where things went off the road. And, as I’ve shown in the previous blog, the CU count doesn’t change between SKUs; only the number of CUs available changes.
This blog isn’t meant to scold Fabric or make it look silly; I’m the one who’s silly. The goal is to show some limitations and a way you can do some load testing, and to help you find your way around the available metrics.

Read on to see what Reitse has gotten into.

Using the APPLY Operator

Erik Darling gets an auto-link for talking about my favorite operator:

I end up converting a lot of derived joins, particularly those that use windowing functions, to use the apply syntax. Sometimes good indexes are in place to support that; other times they need to be created to avoid an Eager Index Spool.

One of the most common questions I get is when developers should consider using apply over other join syntax.

The short answer is that I start mentally picturing the apply syntax being useful when:

To learn when, you’re going to have to read the whole thing. And, if you want to learn even more about it, I have a talk on the topic that might be of interest.
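
SQLite, which ships with Python, supports neither APPLY nor LATERAL, so this is only an approximation of the rewrite Erik describes, using a hypothetical users/posts schema. The first query is the derived-join pattern: number every post per user with ROW_NUMBER() and keep row 1. The second query does the per-user lookup directly with a correlated ORDER BY ... LIMIT 1 subquery, which stands in for CROSS APPLY (SELECT TOP (1) ... ORDER BY score DESC) in T-SQL. The index on (user_id, score DESC) is the kind that lets each per-user lookup be a cheap seek instead of forcing the engine to build something on the fly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (post_id INTEGER PRIMARY KEY, user_id INTEGER, score INTEGER);
    -- Supports the per-user "seek, then take the top row" access pattern.
    CREATE INDEX ix_posts_user_score ON posts (user_id, score DESC);
    INSERT INTO users VALUES (1, 'Ann'), (2, 'Bo'), (3, 'Cy');
    INSERT INTO posts VALUES (10, 1, 5), (11, 1, 9), (12, 2, 3), (13, 2, 7);
    """
)

# Derived-join form: number every post per user, then keep row 1.
windowed = conn.execute(
    """
    SELECT u.name, p.post_id, p.score
    FROM users AS u
    JOIN (
        SELECT post_id, user_id, score,
               ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY score DESC) AS rn
        FROM posts
    ) AS p ON p.user_id = u.user_id AND p.rn = 1
    ORDER BY u.user_id
    """
).fetchall()

# APPLY-like form: for each user, look up only that user's top post.
# In T-SQL this would be CROSS APPLY (SELECT TOP (1) ... ORDER BY score DESC).
apply_like = conn.execute(
    """
    SELECT u.name, p.post_id, p.score
    FROM users AS u
    JOIN posts AS p ON p.post_id = (
        SELECT p2.post_id
        FROM posts AS p2
        WHERE p2.user_id = u.user_id
        ORDER BY p2.score DESC
        LIMIT 1
    )
    ORDER BY u.user_id
    """
).fetchall()

print(windowed)
print(apply_like)
```

Both return each user's top-scoring post. Cy, who has no posts, drops out of both, which matches CROSS APPLY; OUTER APPLY would keep Cy with NULLs.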

Postgres and NUMA

Annie Ghazali follows up on a Chris Travers webinar:

Q1. At what point do we need to focus on ensuring huge_pages in PostgreSQL?

There are a couple of factors here. The first is that if you’re able to show that you have multiple NUMA domains, it will almost always be a win performance-wise. But it becomes critical at the point where you start seeing that the checkpointer is running at 100 percent CPU load, and none of your queries are running at 100 percent CPU load, especially if you don’t have a lot of I/O wait. That’s a really good indication that you’ve hit a point where it’s now a heavy bottleneck, and that’s the point where you’re going to see a very large win out of it.

Read on to see this full answer, as well as answers to questions around why you might not want to disable NUMA support and what NUMA does to swap space recommendations.
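
The answer above leans on two things you can check from the operating system: whether the machine has more than one NUMA domain, and whether huge pages are actually allocated. This is a minimal, Linux-only Python sketch that assumes the standard sysfs directory /sys/devices/system/node and the HugePages_Total, HugePages_Free, and Hugepagesize fields of /proc/meminfo. It only reports and does not tune anything. On the PostgreSQL side, the huge_pages setting (try, on, off) controls whether the server requests huge pages, and huge_pages = on makes startup fail rather than silently fall back.

```python
import os
import re

NODE_DIR = "/sys/devices/system/node"  # Linux sysfs location for NUMA nodes


def count_numa_nodes(entries: list[str]) -> int:
    """Count directory entries named node0, node1, ... as found in NODE_DIR."""
    return sum(1 for name in entries if re.fullmatch(r"node\d+", name))


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style 'Key:   value [kB]' lines into integers."""
    values: dict[str, int] = {}
    for line in text.splitlines():
        match = re.match(r"^([\w()]+):\s+(\d+)", line)
        if match:
            values[match.group(1)] = int(match.group(2))
    return values


def huge_page_report(meminfo_text: str) -> dict[str, int]:
    """Summarize huge page allocation; missing fields are reported as 0."""
    info = parse_meminfo(meminfo_text)
    return {
        "total": info.get("HugePages_Total", 0),
        "free": info.get("HugePages_Free", 0),
        "size_kb": info.get("Hugepagesize", 0),
    }


if __name__ == "__main__":
    # Only meaningful on Linux; other platforms lack these paths.
    if os.path.isdir(NODE_DIR):
        print("NUMA nodes:", count_numa_nodes(os.listdir(NODE_DIR)))
    if os.path.exists("/proc/meminfo"):
        with open("/proc/meminfo") as f:
            print("Huge pages:", huge_page_report(f.read()))
```

More than one NUMA node plus a HugePages_Total of 0 is the combination the answer suggests looking into first.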

Pulling Samples in R with sample()

Steven Sanderson takes a sample:

The sample() function in R is a powerful tool that allows you to generate random samples from a given dataset or vector. It’s an essential function for tasks such as data analysis, Monte Carlo simulations, and randomized experiments. In this blog post, we’ll explore the sample() function in detail and provide examples to help you understand how to use it effectively.

Read on to see what options are available with sample() and the different ways in which you can use the function.
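
For readers who live in Python rather than R, here is a rough analogue of sample(x, size, replace, prob) built on the standard library's random module. It is a hedged sketch, not a port. r_style_sample is a hypothetical helper, it skips weighted sampling without replacement, and seeding random.Random will not reproduce the numbers R gives after set.seed().

```python
import random
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def r_style_sample(
    x: Sequence[T],
    size: Optional[int] = None,
    replace: bool = False,
    prob: Optional[Sequence[float]] = None,
    rng: Optional[random.Random] = None,
) -> list[T]:
    """Hypothetical rough analogue of R's sample(x, size, replace, prob).

    size=None returns a full permutation of x, as sample(x) does in R.
    """
    rng = rng or random.Random()
    n = len(x) if size is None else size
    if replace:
        # Independent draws; prob (if given) is used as relative weights.
        return rng.choices(x, weights=prob, k=n)
    if prob is not None:
        raise NotImplementedError("weighted sampling without replacement is not sketched here")
    if n > len(x):
        raise ValueError("sample larger than population with replace=False")
    return rng.sample(list(x), n)  # draws without replacement


rng = random.Random(2024)  # seeded for repeatability; streams differ from R's
print(r_style_sample(range(1, 11), rng=rng))  # a permutation of 1..10
print(r_style_sample(range(1, 7), size=10, replace=True, rng=rng))  # ten die rolls
print(r_style_sample(["heads", "tails"], size=5, replace=True, prob=[0.7, 0.3], rng=rng))
```

One R behavior this sketch deliberately leaves out: when x is a single number n of at least 1, R's sample(x) samples from 1:n rather than returning x itself.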

Issues and Projects in GitHub

I have a new video:

In this video, we take a look at what GitHub has for project management, reviewing GitHub Projects and Issues.

The upshot is that GitHub has a fair amount of capability for project management. Its notion of Issues definitely feels fairly well fleshed out, which makes sense considering GitHub’s original purpose as a storehouse for open-source code repositories. By contrast, Projects are a relatively new feature and there’s still some room to grow there, especially if you’re used to project management tools like Jira or Trello.

Hash Match and Stream Aggregate Operators in SQL Server

Andy Brownsword rounds up the usual suspects:

In the last post we looked at how TOP and MAX operators compared. We saw that the execution plan for a MAX function used a Stream Aggregate operator, which is one of the two operators we can use for aggregation.

I wanted to look at the two operators and how they perform the same tasks in different ways. The way they function is key to understanding why the engine may choose to use one over the other and the impact this can have on the performance of a query.

The two operators in question: the Hash Match (Aggregate) and the Stream Aggregate.

Read on for a discussion of how each operator works and when each makes sense for the optimizer to use.
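
The difference between the two operators is easier to see in miniature. Here is a simplified Python model, not SQL Server's implementation: a stream aggregate that relies on key-ordered input and keeps a single running group, and a hash aggregate that builds a hash table over input in any order. Both compute SUM(value) grouped by key, and the row data is made up.

```python
from operator import itemgetter
from typing import Iterable, Iterator

Row = tuple[str, int]  # (group key, value)


def stream_aggregate(sorted_rows: Iterable[Row]) -> Iterator[Row]:
    """SUM(value) GROUP BY key, assuming rows arrive ordered by key.

    Like a Stream Aggregate, it holds only the current group's running
    total and yields each group as soon as the key changes, so output
    comes back in key order.
    """
    current_key = None
    total = 0
    started = False
    for key, value in sorted_rows:
        if started and key != current_key:
            yield (current_key, total)  # previous group is complete
            total = 0
        current_key, started = key, True
        total += value
    if started:
        yield (current_key, total)  # flush the final group


def hash_aggregate(rows: Iterable[Row]) -> list[Row]:
    """SUM(value) GROUP BY key for rows in any order.

    Like a Hash Match (Aggregate), every group stays in a hash table until
    the input is exhausted, so memory grows with the number of distinct keys
    and no group is final until the end. The optimizer makes no ordering
    promise for this output; Python dicts just happen to keep insertion order.
    """
    table: dict[str, int] = {}
    for key, value in rows:
        table[key] = table.get(key, 0) + value
    return list(table.items())


rows = [("b", 2), ("a", 1), ("b", 3), ("c", 7), ("a", 10)]

# Stream aggregation needs ordered input, so sort first (the plan equivalent
# of a Sort operator feeding the Stream Aggregate).
print(list(stream_aggregate(sorted(rows, key=itemgetter(0)))))  # [('a', 11), ('b', 5), ('c', 7)]
# Hash aggregation accepts the rows as they come.
print(hash_aggregate(rows))  # [('b', 5), ('a', 11), ('c', 7)]
```

Feeding the unsorted rows straight into stream_aggregate would split "a" and "b" into several groups, which is why a Stream Aggregate needs sorted input (from an index or an explicit Sort), while a Hash Match pays for its flexibility in memory.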

Tips on Using Subqueries in the SELECT Clause

Erik Darling covers a “sometimes” topic:

I think subqueries in select lists are very neat things. You can use them to skip a lot of additional join logic, which can have all sorts of strange repercussions on query optimization, particularly if you have to use left joins to avoid eliminating results.

Subqueries do have their limitations:

  • They can only return one row
  • They can only return one column

But used in the right way, they can be an excellent method to retrieve results without worrying about what kind of join you’re doing, or how the optimizer might try to rearrange it.

Read on for a dive into this topic and a scenario in which subqueries in the SELECT clause can be faster than alternatives. My personal preference, unless there’s a major performance difference, is to keep the SELECT clause as simple as possible. But sometimes, the difference is stark enough to matter.
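
Here is a small sqlite3 sketch of the two query shapes, using a hypothetical users/badges schema: a scalar subquery in the SELECT list versus the LEFT JOIN plus GROUP BY needed to produce the same rows. Treat it as an illustration of the shapes only, since SQLite's optimizer is not SQL Server's. The MAX() keeps the subquery to one row and one column. SQL Server raises an error if a scalar subquery returns more than one row, whereas SQLite quietly uses the first row, so the aggregate matters here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE badges (user_id INTEGER, badge TEXT, earned TEXT);
    INSERT INTO users VALUES (1, 'Ann'), (2, 'Bo'), (3, 'Cy');
    INSERT INTO badges VALUES
        (1, 'Teacher', '2023-01-05'),
        (1, 'Editor',  '2023-03-09'),
        (2, 'Critic',  '2023-02-14');
    """
)

# Scalar subquery in the SELECT list: one value per outer row.
# Users with no badges simply get NULL, and no user row is eliminated.
subquery_form = conn.execute(
    """
    SELECT u.name,
           (SELECT MAX(b.earned) FROM badges AS b WHERE b.user_id = u.user_id) AS last_badge
    FROM users AS u
    ORDER BY u.user_id
    """
).fetchall()

# Join form: needs LEFT JOIN to keep Cy and GROUP BY to collapse
# Ann's two badge rows into one.
join_form = conn.execute(
    """
    SELECT u.name, MAX(b.earned) AS last_badge
    FROM users AS u
    LEFT JOIN badges AS b ON b.user_id = u.user_id
    GROUP BY u.user_id, u.name
    ORDER BY u.user_id
    """
).fetchall()

print(subquery_form)
print(join_form)
```

The results match, but the join version has to get two extra things right (the outer join and the grouping) that the subquery version gets for free.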

Setting up Replication with dbatools

Jess Pomfret continues a series on replication in dbatools:

This post is focusing on how to set up replication with dbatools. We support all three flavours: snapshot, transactional, and even merge replication!

In this article I’ll be creating a transactional publication, but the steps for setup are very similar no matter which flavour you’re implementing.

I’ll walk through and demonstrate all the steps to set up replication in this article, as dbatools allows us to complete them all. However, I won’t go into a lot of detail on why or how replication works, or provide guidance on best practices. If you need more information on replication as a technology, then I recommend visiting the Microsoft Docs.

Read on for a demonstration of how the cmdlets work for adding a publication, articles, subscriptions, and more.

Migrating Always Encrypted to a Secure Enclave

Pieter Vanhove has an enclave, which is sort of like a Bat-cave:

Always Encrypted is a feature of Azure SQL and SQL Server that allows you to encrypt sensitive data in your database. The data is never exposed in plaintext to the database engine, or anyone who has access to it.

However, Always Encrypted has some limitations. For example, you cannot perform most computations or operations on the encrypted data, such as sorting or range filtering; deterministic encryption supports only equality comparisons. Second, the initial encryption must be done on the application side, which can be time-consuming on a large set of data.

That’s where Always Encrypted with secure enclaves comes in.

Read on to see what secure enclaves give you, as well as how you can enable it and what changes your application might require.
