Press "Enter" to skip to content

Curated SQL Posts

DAX in Review: 2023

Marco Russo gives us a review:

Despite the large number driven by the new INFO functions, we did not see big changes in DAX, but rather a consolidation of the new window functions.

After three years, in 2023 the composite models reached the general availability! More important, the official name is now “composite models”, dropping the previous “DirectQuery for Power BI datasets and Analysis Services” name used in preview. Add to this that “datasets” have been renamed to “semantic models” (a change we fully support at SQLBI), and it seems this year we made peace with the Microsoft naming departments (there should be more than one).

Click through for more on DAX, as well as what Marco and Alberto have been up to this year.

Comments closed

Monitoring Checkpoints and the Background Writer in Postgres

Muhammad Ali keeps an eye on things:

In PostgreSQL, a checkpoint is an operation during which the database flushes/syncs all pending modifications(dirty buffers) inside memory to the actual data files on the disk.

This is important for two primary reasons. Firstly, it guarantees that all committed transactions are permanently stored, thereby safeguarding against data loss in the event of a system failure. Secondly, it works closely with the database recovery mechanism. If a crash occurs, PostgreSQL begins processing WAL logs starting from the last successful checkpoint(It gets this information from the pg_control file located in the PG data directory) during recovery. Additionally, this process allows for the fine-tuning of performance through a variety of parameters, adaptable to specific workload requirements which are discussed below.

Read on to learn more about how checkpoints work in Postgres, how the background writer works, and things to keep in mind.

Comments closed

Dynamic SQL in Stored Procedures

Erik Darling gets serious:

I’ve spent a lot of time on this blog telling you different reasons why you should, when you should, and how you should use dynamic SQL.

In this post, I’m going to go into some of the finer points of how I approach dynamic SQL to avoid issues — not performance issues — more procedural issues.

Things you should use to protect yourself from wonky object names, string truncation, object identification, and more.

Read on for Erik’s tips for handling dynamic SQL.

Comments closed

What’s New in .NET 8

Thao Nguyen keeps us up to date on .NET:

Microsoft has announced .NET 8 recently. It emphasized the cloud, performance, full-stack Blazor and .NET MAUI as major highlights of the latest edition of the company’s free, cross-platform, open-source developer platform.

Of special importance to enterprises, .NET 8 is a long-term support (LTS), which means it will be supported and patched for three years as opposed to 18 months for a standard term support (STS) release.

Click through for the list, as it’s good to keep at least one eye on the developers lest they run around unopposed.

Comments closed

Creating a Time Series in R

Steven Sanderson says it’s time:

The ts() function in R is a fundamental tool for handling time series data. It takes four main arguments:

  1. data: A vector or matrix of time series values.
  2. start: The time of the first observation.
  3. end: The time of the last observation.
  4. frequency: The number of observations per unit of time.

Read on for an example of how this all works, as well as a function in the TidyDensity package to convert data into the R time series format.

Comments closed

Eager Aggregation in SQL Queries

Boris Novikov talks about an uncommon topic:

In this article we discuss one type of query transformation that most optimizers do not use. Because of this, it can be beneficial for you to rewrite a query to help the optimizer order operations in a way that can be beneficial.

An analytical query is supposed to produce some kind of summary generalizing properties of huge amounts of data but at the same time should be compact and easy for humans to understand. In terms of the SQL query language this means that any analytical query extracts and combines large number of rows and then uses aggregate functions with or even without GROUP BY clause. More specifically, we consider queries that contain many JOIN operations followed by aggregation. Usually, queries are written in this way and, surprisingly, the optimizers choose the best order of joins but leave the aggregation as the last step.

Read on for more information, including a minor lamentation that the various relational database optimizers tend not to perform this kind of operation. In SQL Server, I have an example of this pre-aggregation using the APPLY operator (with demo code here) and a simple but realistic example of how drastic the savings can be.

Comments closed

Distributed Data in Postgres

Umair Shahid explains how postgres_fdw works:

Bridging Data Silos and Accelerating Insights

In today’s data-driven world, organizations often grapple with data residing in multiple, disparate databases. This fragmentation can hinder seamless analysis and decision-making. However, PostgreSQL offers a powerful tool to address this challenge: postgres_fdw.

What is postgres_fdw?

postgres_fdw, short for PostgreSQL Foreign Data Wrapper, is a built-in extension that allows you to seamlessly access and query data stored in external PostgreSQL databases as if it were local to your current database. This means you can create views, join tables, and perform complex queries across multiple databases without the need for manual data integration or replication.

Read on for more information about the extension.

Comments closed

Options for Forcing Parallelism

Chad Callihan looks at a pair of options:

Just because something works doesn’t mean it’s the right thing to do. I had that type of challenge to my database morals recently when facing a query that refused to go parallel.

Read on to learn more. Note that neither of these relates to MAXDOP because that doesn’t determine whether a plan will go parallel (though you can use it to prevent a plan from going parallel).

Comments closed

Wrapper Stored Procedures

Erik Darling offers some advice:

Wrapper stored procedures are useful for things like:

  • Transforming declared local variables into parameters
  • Preventing code from compiling when it isn’t used
  • Generating different query plans to deal with parameter sniffing

The upside of using this over dynamic SQL is that you have a convenient object name attached to the code.

Read on for the downside to this, as well as a pair of videos on the topic.

Comments closed

Non-Equi Joins in data.table

John MacKintosh wants to join on a greater than or less than operation:

For day 5, I had to create a function, and I’m writing this up, because it’s an example of a non-equi join between two tables.
In this particular sitation, there are are no common columns between the two tables, so my usual data.table hack of copying the columns of interest, renaming themjoin_col, and then keying them both does not work.

Click through for a working solution.

Comments closed