Press "Enter" to skip to content

Curated SQL Posts

Using TimescaleDB in Postgres

Adron Hall put a database in his database:

I’ve been using TimescaleDB for time-series data on and off for a while now. I recently fired up Postgres.app for local development. It’s one of the cleanest ways to get PostgreSQL running on macOS, and adding TimescaleDB is surprisingly straightforward once you know where to look.

Time-series data is everywhere—sensor readings, application metrics, user events, IoT data. Regular PostgreSQL can handle it, but once you’re dealing with millions of rows, you’ll notice queries slowing down. TimescaleDB solves this by turning your time-series tables into hypertables that automatically partition by time, compress old data, and optimize queries. The best part? It’s still PostgreSQL, so all your existing tools and SQL knowledge work exactly the same.
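
As a taste of what that looks like, converting an ordinary table into a hypertable takes only a couple of statements; a minimal sketch, with an illustrative schema rather than anything from Adron’s post:

    -- Enable the extension (it must already be installed on the server)
    CREATE EXTENSION IF NOT EXISTS timescaledb;

    -- An ordinary Postgres table for sensor readings
    CREATE TABLE sensor_readings (
        recorded_at  timestamptz NOT NULL,
        sensor_id    int         NOT NULL,
        reading      double precision
    );

    -- Turn it into a hypertable, automatically partitioned by time
    SELECT create_hypertable('sensor_readings', 'recorded_at');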

Read on for a demo of how it all works.

Postgres Index Corruption after an OS Upgrade

Laurenz Albe has me concerned:

Most major Linux distributions have upgraded to the GNU C library version 2.28 or later. Therefore, there is a growing awareness that an operating system upgrade can lead to index corruption. However, rebuilding all your indexes can slow down the upgrade process considerably. In this article, I want to discuss how you can avoid rebuilding more indexes than necessary.
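
If you would rather verify suspect indexes than rebuild everything on faith, the amcheck contrib extension can test a B-tree’s ordering, which is exactly what breaks when collation behavior changes; a quick sketch, with a hypothetical index name:

    CREATE EXTENSION IF NOT EXISTS amcheck;

    -- Checks that the index is correctly ordered; heapallindexed also
    -- confirms every heap tuple is present in the index
    SELECT bt_index_check(index => 'my_text_idx'::regclass,
                          heapallindexed => true);

Laurenz’s article covers how to work out which indexes actually need attention in the first place.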

Read the whole thing if you are a Postgres DBA.

Monitoring Replication in Redgate Monitor

Steve Jones shows off some functionality:

The monitoring capabilities in Redgate Monitor were originally fairly limited to a few counters from PerfMon. A few people had written custom metrics on sqlmonitormetrics.com that clients could use, but we’ve had customers asking for more native integrations.

We’ve done it. With version 14.2, we have added an estate view of your replication environment. In the Estate menu, there is a new entry for Replication Monitoring.

I had to check the documentation to see if merge replication was included, and it was. That’s usually the form that people avoid because it’s too hard to implement.

In practice, this doesn’t replace ReplMon or more detailed mechanisms for tracking system behavior, but it does at least provide the ability to alert if things are going wrong with the SQL Agent jobs. That probably catches about 60-70% of issues with replication, with the remaining 30-40% requiring some sort of time-of-flight indicator to see whether data on the subscribers is keeping up to date with data on the publisher.
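
For transactional replication, tracer tokens are one built-in way to get that time-of-flight measurement (they don’t apply to merge replication); a rough sketch, with a hypothetical publication name:

    -- Run at the publisher, in the publication database
    DECLARE @token_id int;
    EXEC sys.sp_posttracertoken
        @publication = N'MyPublication',
        @tracer_token_id = @token_id OUTPUT;

    -- Later, see how long the token took to reach each subscriber
    EXEC sys.sp_helptracertokenhistory
        @publication = N'MyPublication',
        @tracer_id = @token_id;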

Filtering RPC_COMPLETED Extended Events

Kendra Little builds a filter:

The rpc_completed event in Extended Events is useful when troubleshooting SQL Server performance. It captures detailed information about Remote Procedure Calls: that means stored procedure executions, including the calls to sp_executesql often used by applications (including Entity Framework) to run parameterized queries against SQL Server. The output for rpc_completed includes the parameters that were specified along with values provided, and the CPU time, logical reads, and duration used by the query.

It can be frustrating to figure out how to filter this in Extended Events. Struggling with this is one of the primary reasons I sometimes use ye Olde Profiler for initial investigations and to speedily observe something in SQL Server.

Here is my survival guide to filtering rpc_completed, which makes using XEvents suck less.

Kendra digs in deep, including several common gotchas and solid recommendations, including the use of Erik Darling’s sp_humanevents.
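
For reference, a bare-bones session in that spirit might look like the following; the session name and duration threshold are illustrative, and Kendra’s post covers the gotchas around which fields you can actually filter on:

    CREATE EVENT SESSION [rpc_watch] ON SERVER
    ADD EVENT sqlserver.rpc_completed (
        ACTION (sqlserver.client_app_name, sqlserver.database_name)
        -- duration is in microseconds; keep only calls over one second
        WHERE (duration > 1000000)
    )
    ADD TARGET package0.event_file (SET filename = N'rpc_watch');

    ALTER EVENT SESSION [rpc_watch] ON SERVER STATE = START;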

Creating a SQL Agent Job via SSMS

Jim Evans creates a job:

In this article, I’ll show how to create and schedule a SQL Server Agent Job. I’ll also show how to setup an Operator to receive notifications for failed or successful Job completions.

Jim lays out the UI-based approach and has scripting the job as an optional step. I highly recommend scripting the job and understanding the T-SQL it generates. It may look like a mess at first, and the T-SQL it outputs is not idempotent (meaning that if you re-run the script multiple times, you do not end up with the same outcome and a successful run each time). But changing it into an idempotent template that completes successfully on every run will let you store the code in source control and build out future jobs that much faster. It also prevents issues where you have the “same” job but it’s actually set up differently across your different SQL Server instances.
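
The usual trick for making the generated script idempotent is a guard at the top that drops the job if it already exists, so every run starts from a clean slate; a trimmed sketch, with placeholder names:

    USE msdb;
    GO
    -- Re-runnable: drop the job if a prior run already created it
    IF EXISTS (SELECT 1 FROM msdb.dbo.sysjobs WHERE name = N'Nightly ETL')
        EXEC msdb.dbo.sp_delete_job @job_name = N'Nightly ETL';

    EXEC msdb.dbo.sp_add_job @job_name = N'Nightly ETL';
    EXEC msdb.dbo.sp_add_jobstep
        @job_name  = N'Nightly ETL',
        @step_name = N'Run ETL procedure',
        @subsystem = N'TSQL',
        @command   = N'EXEC dbo.RunNightlyEtl;';
    -- Attach the job to the local server so the Agent will run it
    EXEC msdb.dbo.sp_add_jobserver @job_name = N'Nightly ETL';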

Asynchronous Disk I/O in Postgres 18

Josef Machytka gives us the skinny:

PostgreSQL 17 introduced streaming I/O – grouping multiple page reads into a single system call and using smarter posix_fadvise() hints. That alone gave up to ~30% faster sequential scans in some workloads, but it was still strictly synchronous: each backend process would issue a read and then sit there waiting for the kernel to return data before proceeding. Before PG17, PostgreSQL typically read one 8kB page at a time.

PostgreSQL 18 takes the next logical step: a full asynchronous I/O (AIO) subsystem that can keep multiple reads in flight while backends keep doing useful work. Reads become overlapped instead of only serialized. The AIO subsystem is deliberately targeted at operations that know their future block numbers ahead of time and can issue multiple reads in advance:
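
Postgres 18 surfaces this through a new io_method parameter; a hedged sketch of poking at the knobs:

    -- Which AIO implementation is in use: 'sync', 'worker', or
    -- (on Linux) 'io_uring'
    SHOW io_method;

    -- Changing it requires a server restart
    ALTER SYSTEM SET io_method = 'io_uring';

    -- Caps how many reads can be kept in flight at once
    SHOW effective_io_concurrency;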

Read on to see some of the consequences of this change, as well as more detail on how it works.

How Data Leakage Can Hurt Model Performance

Ivan Palomares Carrascosa leaks some data:

In this article, you will learn what data leakage is, how it silently inflates model performance, and practical patterns for preventing it across common workflows.

Topics we will cover include:

  • Identifying target leakage and removing target-derived features.
  • Preventing train–test contamination by ordering preprocessing correctly.
  • Avoiding temporal leakage in time series with proper feature design and splits.

Read on to learn more.
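
On the temporal point, the core rule is that a feature for a given row may only use data available strictly before that row. In SQL, a window frame that stops at 1 PRECEDING is one way to enforce this; a small sketch with hypothetical table and column names:

    -- Rolling average of the prior seven days' sales; the frame ends at
    -- 1 PRECEDING, so a row never leaks into its own feature
    SELECT
        sale_date,
        amount,
        AVG(amount) OVER (
            ORDER BY sale_date
            ROWS BETWEEN 7 PRECEDING AND 1 PRECEDING
        ) AS avg_prior_7
    FROM daily_sales;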

SQL Server 2025 and Vector Data

Tomaz Kastrun continues a series on SQL Server 2025 with several posts on vector data. First up is the new vector data type:

The vector data type is designed to store vector data optimized for operations such as similarity search and machine learning applications. Vectors are stored in an optimized binary format but are exposed as JSON arrays for convenience.

Implicit and explicit conversion from and to the vector type can be done using the varchar, nvarchar, and json types.
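
In practice, that looks something like the following; the table is illustrative rather than from Tomaz’s post:

    -- A table with a three-dimensional vector column
    CREATE TABLE dbo.Embeddings (
        Id  int IDENTITY PRIMARY KEY,
        Vec vector(3) NOT NULL
    );

    -- Written and read back as a JSON array
    INSERT INTO dbo.Embeddings (Vec) VALUES ('[0.1, 0.2, 0.3]');
    SELECT Id, CAST(Vec AS nvarchar(max)) AS VecJson
    FROM dbo.Embeddings;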

Second is information on vector functions:

Yesterday we looked into the vector data type and how to create a table, insert a vector, and read it. With SQL Server 2025, the vector data type also comes equipped with a couple of functions:
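
VECTOR_DISTANCE is the one you will reach for most often; a quick example building on the table above, with an arbitrary probe vector:

    -- Cosine distance between a probe vector and each stored vector
    DECLARE @probe vector(3) = '[0.1, 0.2, 0.3]';
    SELECT TOP (5) Id,
           VECTOR_DISTANCE('cosine', Vec, @probe) AS Dist
    FROM dbo.Embeddings
    ORDER BY Dist;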

And third is how to generate embeddings and store the results in SQL Server:

AI_GENERATE_EMBEDDINGS is a built-in function that creates embeddings (vector arrays) using a pre-created AI model definition stored in the database.

Before running it, we need to register the model: create a master key, a database scoped credential, and an external model.
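
Pieced together, the setup runs roughly along these lines; the endpoint, credential, and deployment names below are placeholders, so treat this as a sketch and check the post for the exact syntax:

    -- One-time security setup
    CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password>';
    CREATE DATABASE SCOPED CREDENTIAL [https://myaccount.openai.azure.com]
        WITH IDENTITY = 'HTTPEndpointHeaders',
             SECRET   = '{"api-key":"<key>"}';

    -- Register the external embedding model
    CREATE EXTERNAL MODEL MyEmbeddingModel
    WITH (
        LOCATION   = 'https://myaccount.openai.azure.com/openai/deployments/text-embedding-ada-002',
        API_FORMAT = 'Azure OpenAI',
        MODEL_TYPE = EMBEDDINGS,
        MODEL      = 'text-embedding-ada-002',
        CREDENTIAL = [https://myaccount.openai.azure.com]
    );

    -- Generate an embedding from text
    SELECT AI_GENERATE_EMBEDDINGS(N'Curated SQL' USE MODEL MyEmbeddingModel);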

DAX Lib: Shared DAX User-Defined Functions

Marco Russo shares some code:

Three months ago, Microsoft introduced User-Defined Functions (UDFs) in the DAX language. From the first day, https://daxlib.org has been available for sharing libraries of functions with the Power BI community. We published DAX Lib with a low profile because we did not have many libraries available at the beginning, but now it is time to spread the word!

Using DAX Lib is fast and simple: copy the function code from a TMDL script in DAX Lib, then paste it into the TMDL view of your Power BI model and apply it. Watch the video to see a complete walkthrough.

Check out that video, as well as the functions available in the “DAX app store.”

Refactoring SQL Code

Steve Jones shares some thoughts:

I was thinking about this when I saw this article on strategies to refactor SQL code. The article seems written more for PostgreSQL, but there are items that relate to T-SQL as well. The main thrust of the article is about rewriting code to be DRY (don’t repeat yourself). The more changes you can make to shrink code, either to make it easier to read or to avoid repeating those copy/paste items, the better off your team will be. It’s easy to think those copies aren’t a big deal, but it’s also easy to update code in one place, because that solves the problem you were given, and forget to fix all the copies.

Strict refactoring—leaving the inputs and outputs alone and only modifying the structure of code beyond the scope of reformatting but without changing its behavior—is somewhat uncommon in T-SQL outside of performance tuning scenarios, at least in my experience. The problem I have with DRY, when it comes to T-SQL, is that you generally need to pay the performance piper. Yes, repeating the contents of a common function in a series of T-SQL queries is repetition and “wasteful” in that regard, but if it makes the queries run literally 3-9x faster just from making these changes, I don’t care. I’ll do it.
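
As a concrete illustration of that trade-off (names are hypothetical): a scalar UDF keeps the logic in one place, but prior to SQL Server 2019’s UDF inlining it executes row by row, whereas repeating the expression in each query lets the optimizer work normally:

    -- DRY version: one definition, but historically executed per row
    CREATE FUNCTION dbo.NetPrice (@price money, @discount decimal(4,2))
    RETURNS money
    AS
    BEGIN
        RETURN @price * (1 - @discount);
    END;
    GO
    SELECT OrderId, dbo.NetPrice(Price, Discount) AS Net FROM dbo.Orders;

    -- Repeat-yourself version: the same expression inlined, often far faster
    SELECT OrderId, Price * (1 - Discount) AS Net FROM dbo.Orders;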

If T-SQL were an idealized implementation of a fourth-generation language, where all viable equivalent queries would have the same execution plan and thus the same performance, then we’d see a lot more code refactoring because the way we write the code would not have a direct impact on how it runs. But in practice, that’s not the case.
