August 2024 – Curated SQL

Composite Indexes in MySQL

Published 2024-08-30 by Kevin Feasel

Lukas Vileikis needs more than one column:

Indexes in MySQL are one of the primary ways to enhance query performance and they are especially useful when the primary use case of our project refers to reading data stored in a database. We‘ve already told you about the nuanced world of indexes in MySQL – and there we‘ve mentioned that MySQL has multiple types of SQL indexes available for you to choose from.

The primary index type available within MySQL is the B-Tree index which we‘ve already covered in one of our previous articles. If you know your way around MySQL though, you will certainly be aware of other nuances of indexes, too – and one of those nuances has to do with the fact that B-Tree indexes can also consist of multiple columns (commonly referred to as composite indexes). In this example, we’re using MariaDB as our DBMS of choice. Percona Server for MySQL and MySQL Server will act identically.

Find the queries recreating the table structure and composite indexes in the appendix, and let’s get started.

Read on to see how composite indexes work in MySQL. On the whole, it’s quite similar to how they work in SQL Server, though it’s interesting to catch the differences at the edges.

Comments closed

Computed Columns and Wide Index Updates

Published 2024-08-30 by Kevin Feasel

Paul White takes us through a performance scenario:

Update execution plans are not something the T-SQL statement writer has much control over. You can affect the data reading side of the plan with query rewrites and hints, but there’s not nearly as much tooling available to affect the writing side of the plan.

Update processing can be extremely complex and reading data-changing execution plans correctly can also be difficult. Many important details are hidden away in obscure and poorly documented properties, or simply not present at all.

In this X article, I want to you show a particularly bad update plan example. It has value in and of itself, but it will also give me a chance to describe some less well-known SQL Server details and behaviours.

Read on for the full story.

Comments closed

Understanding the EXISTS Keyword in SQL

Published 2024-08-30 by Kevin Feasel

Eric Blinn probably exists:

I’ve seen the EXISTS keyword in Microsoft SQL Server T-SQL code and don’t understand it well. What does it do? How do I use it? Are there best practices around SQL EXISTS?

This SQL tutorial will explain what the keyword EXISTS does and show several different use cases.

Read on to see how you can use EXISTS and its complement, NOT EXISTS, in a variety of use cases. One important part of why EXISTS can be useful compared to other ways of writing a particular query is that the performance profile of an EXISTS clause is a semi-join: we proceed until we find the first result matching our clause. If that happens to be in the first row, we can stop there as we’ve fulfilled the requirement. By contrast, an alternative using IN or something else like using SELECT COUNT(*) would likely need to read more pages of data than EXISTS.

Comments closed

Planning Statistics Updates

Published 2024-08-30 by Kevin Feasel

Andy Mallon makes the case:

Let’s talk about the trade-offs.

Auto-updated stats are triggered by queries needing to access a table with “stale” stats. (It doesn’t matter how we define stale, so lets not think about it.) With the exception of very small tables, auto-updated stats use a random small data sample of the table data; in contrast scheduled jobs that usually have a larger sample size defined—I always say “FULLSCAN if you can!”

Read on for Andy’s thesis.

Comments closed

From Anaconda to Standard Python

Published 2024-08-29 by Kevin Feasel

Rob Zelt switches to standard Python:

While Anaconda provides a comprehensive package management system, particularly useful for data science, many developers prefer the flexibility and lightweight nature of standard Python environments. This guide will help you make the switch without losing your carefully curated package setup.

Read on for Rob’s solution. I used to be a huge proponent of the Anaconda distribution of Python, but have found myself being less of one, especially with the licensing changes a few years back. If you were already using pip for most package installation, and if you’re fairly consistent about using virtual environments, this transition is even easier than in Rob’s scenario.

Comments closed

Displaying Dates without Data in Power BI

Published 2024-08-29 by Kevin Feasel

Alon Ohayon looks for the missing month:

When you’re using Power BI, you probably create line charts that show data by month pretty often. It usually works great, but what happens if some months don’t have any data?

By default, Power BI just skips those months in the chart. That might seem okay, but it can actually be misleading—especially if you’re looking for trends over time, including the months with no activity.

Alon shows us an example of how to do this in DAX and that works. But if you can solve this at the data layer, such as when querying from a SQL Server, that’s even better. This happens to be one of the good uses of a calendar table: giving you a complete set of months (or whatever time period you want) that you can then use to left join to your data, returning either a data point with a value, or a NULL that you can coalesce with 0 to ensure that you have a result for each month.

Comments closed

The Importance of Filtering Columns instead of Tables in DAX

Published 2024-08-29 by Kevin Feasel

Marco Russo and Alberto Ferrari explain it to us:

You have probably heard multiple times the same answer to many questions about DAX and, to be honest, about nearly anything IT-related: “it depends”. Is it fine to create a calculated column, or is it better to avoid it? Yes, no… it depends. Is it better to create one, two, or three fact tables? It depends. Is a composite model the right choice? It depends. However, a few questions have a strong, clear answer, and in this article, we focus on one of those. Is it better to filter a table or a column with CALCULATE? Here, the answer is simple and definite: filter columns, not tables. The same principle is applied to CALCULATETABLE, even though the example in this article only shows CALCULATE.

With that answer in mind, read on for the wherefore.

Comments closed

Discovering Nested References in SQL Server

Published 2024-08-29 by Kevin Feasel

Tom Zika is helping:

The original question lists these requirements:

Information about all the tables referenced by a stored procedure

Also, list tables from cross-DB query

Return DB, Schema and Table names

Read on for a script to set up the example and then a query to build out these references.

Comments closed

Creating Test Classes and Unit Tests with tSQLt

Published 2024-08-29 by Kevin Feasel

Olivier Van Steenlandt continues a series on database testing:

We have set up our tSQLt Database Project in the previous data recipe, Create a SSDT Project Template based on your Database Project. Now it’s time to dive into the wonderful world of tSQLt Unit Testing. In the meantime, I have added my data warehouse to my SSDT Solution and added this project as a Database Reference to my Unit Testing Database Project. If you are unsure how to do this, you can find all the information you need in my previous data cookbook which you can access via the following link: Getting Started With Database Projects & Azure DevOps.

Read on for a walkthrough of how to do this.

Comments closed

Parallelism in Powershell Workflows

Published 2024-08-29 by Kevin Feasel

Chad Callihan shows one method for parallel execution in Powershell:

I’ve mentioned before that running Copy-DbaDatabase worked well in my experience, but the downside that I ran into was utilizing it for larger counts of databases. It should be no surprise that the more databases to copy, the longer the process takes to complete. I did some more research and came across using workflows along with the parallel keyword to speed up the copy process.

Let’s take a look at what a workflow is and how we can apply it to add parallelism to the database copy process.

I’ve previously covered options for parallelism, though apparently I was wrong about workflows being deprecated. Rod Edwards also has a list of options for doing things in parallel and does not include workflows, so you can get an idea of several of the options available to you.

Comments closed

M	T	W	T	F	S	S
			1	2	3	4
5	6	7	8	9	10	11
12	13	14	15	16	17	18
19	20	21	22	23	24	25
26	27	28	29	30	31

Month: August 2024