Month: December 2025

Creating a SQL Agent Job via SSMS

Jim Evans creates a job:

In this article, I’ll show how to create and schedule a SQL Server Agent Job. I’ll also show how to set up an Operator to receive notifications for failed or successful Job completions.

Jim lays out the UI-based approach and treats scripting the job as an optional step. I highly recommend scripting the job and understanding the T-SQL it generates. It may look like a mess at first, and the T-SQL it outputs is not idempotent (meaning that you cannot re-run the script multiple times and get the same outcome and a successful run each time). But turning it into an idempotent template that completes successfully every time will allow you to store the code in source control and build out future jobs that much faster. Additionally, it prevents issues where you have the “same” job but it’s actually set up differently across your SQL Server instances.
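
To make the idempotency point concrete, here is a minimal Python sketch. It does not touch SQL Server at all: the `jobs` dictionary is a hypothetical stand-in for the jobs on one instance, and `create_job`, `ensure_job`, and the `NightlyLoad` definition are names I made up for illustration. The first function behaves like a plain "create" script and fails on a second run; the second removes any existing copy before creating the job, so every run ends in the same state.

```python
class JobAlreadyExists(Exception):
    """Raised when a non-idempotent create hits an existing job."""


def create_job(jobs, name, definition):
    # Non-idempotent: works the first time, fails on every re-run.
    if name in jobs:
        raise JobAlreadyExists(name)
    jobs[name] = dict(definition)


def ensure_job(jobs, name, definition):
    # Idempotent: remove any existing copy, then create it from the template.
    jobs.pop(name, None)
    jobs[name] = dict(definition)


# Hypothetical job definition, used only for this illustration.
nightly = {"schedule": "daily 02:00", "step": "EXEC dbo.NightlyLoad;"}

server = {}
ensure_job(server, "NightlyLoad", nightly)
ensure_job(server, "NightlyLoad", nightly)  # safe to re-run, same end state
```

Running `create_job(server, "NightlyLoad", nightly)` at this point raises `JobAlreadyExists`, which is roughly where a freshly generated job script leaves you on its second execution.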

Asynchronous Disk I/O in Postgres 18

Josef Machytka gives us the skinny:

PostgreSQL 17 introduced streaming I/O – grouping multiple page reads into a single system call and using smarter posix_fadvise() hints. That alone gave up to ~30% faster sequential scans in some workloads, but it was still strictly synchronous: each backend process would issue a read and then sit there waiting for the kernel to return data before proceeding. Before PG17, PostgreSQL typically read one 8kB page at a time.

PostgreSQL 18 takes the next logical step: a full asynchronous I/O (AIO) subsystem that can keep multiple reads in flight while backends keep doing useful work. Reads become overlapped instead of only serialized. The AIO subsystem is deliberately targeted at operations that know their future block numbers ahead of time and can issue multiple reads in advance:

Read on to see some of the consequences of this change, as well as more detail on how it works.
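
As a rough analogy for serialized versus overlapped reads, here is a small Python sketch using `asyncio`. It is not how Postgres implements AIO; `FakeDisk`, its latency, and the block numbers are all assumptions made for illustration. The serial reader waits for each block before asking for the next, while the overlapped reader knows its block numbers up front and issues every read before waiting on any of them.

```python
import asyncio


class FakeDisk:
    """Simulated storage that tracks how many reads are in flight at once."""

    def __init__(self, latency=0.01):
        self.latency = latency
        self.in_flight = 0
        self.max_in_flight = 0

    async def read_block(self, block_no):
        self.in_flight += 1
        self.max_in_flight = max(self.max_in_flight, self.in_flight)
        await asyncio.sleep(self.latency)  # stand-in for device latency
        self.in_flight -= 1
        return f"block-{block_no}"


async def read_serial(disk, blocks):
    # Issue one read, wait for it, then issue the next.
    return [await disk.read_block(b) for b in blocks]


async def read_overlapped(disk, blocks):
    # Block numbers are known in advance, so issue every read up front.
    return await asyncio.gather(*(disk.read_block(b) for b in blocks))


blocks = list(range(8))
serial_disk, overlapped_disk = FakeDisk(), FakeDisk()
serial_result = asyncio.run(read_serial(serial_disk, blocks))
overlapped_result = asyncio.run(read_overlapped(overlapped_disk, blocks))
```

Both readers return the same blocks in the same order. The difference shows up in `max_in_flight`, which stays at one for the serial reader and reaches the full batch size for the overlapped one.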

How Data Leakage Can Hurt Model Performance

Ivan Palomares Carrascosa leaks some data:

In this article, you will learn what data leakage is, how it silently inflates model performance, and practical patterns for preventing it across common workflows.

Topics we will cover include:

  • Identifying target leakage and removing target-derived features.
  • Preventing train–test contamination by ordering preprocessing correctly.
  • Avoiding temporal leakage in time series with proper feature design and splits.

Read on to learn more.
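
For a feel of the train–test contamination case, here is a minimal Python sketch using only the standard library. The toy data, the split point, and the helper names `fit_scaler` and `transform` are assumptions for illustration and do not come from Ivan's article. The leaky version learns its scaling statistics from all rows, including the held-out ones; the clean version learns them from the training rows only and then applies them to the test rows.

```python
from statistics import mean, pstdev


def fit_scaler(values):
    """Learn the mean and standard deviation from the given values only."""
    mu = mean(values)
    sigma = pstdev(values) or 1.0  # fall back to 1.0 if there is no spread
    return mu, sigma


def transform(values, scaler):
    """Standardize values using previously learned statistics."""
    mu, sigma = scaler
    return [(v - mu) / sigma for v in values]


# Hypothetical toy feature; the last two rows are the held-out test set.
data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 100.0]
train, test = data[:6], data[6:]

# Leaky: statistics computed before the split, so test rows influence them.
leaky_scaler = fit_scaler(data)

# Clean: split first, fit on the training rows, then reuse on the test rows.
clean_scaler = fit_scaler(train)
clean_train = transform(train, clean_scaler)
clean_test = transform(test, clean_scaler)
```

The outlier in the test rows drags the leaky mean far away from the training mean, which means the training features would already carry information about data the model is not supposed to have seen.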

SQL Server 2025 and Vector Data

Tomaz Kastrun continues a series on SQL Server 2025 with several posts on vector data. First up is the new vector data type:

The vector data type is designed to store vector data optimized for operations such as similarity search and machine learning applications. Vectors are stored in an optimized binary format but are exposed as JSON arrays for convenience.

Implicit and explicit conversion from and to the vector type can be done using varchar, nvarchar, and json types.
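
To illustrate the idea of a vector exposed as a JSON array and used for similarity search, here is a small Python sketch. It is not SQL Server's implementation or API; `parse_vector`, `cosine_distance`, and the sample values are assumptions made for illustration. It parses two JSON-array strings into lists of floats and computes the cosine distance between them.

```python
import json
import math


def parse_vector(text):
    """Parse a JSON array such as '[0.1, 2, 30]' into a list of floats."""
    return [float(x) for x in json.loads(text)]


def cosine_distance(a, b):
    """Return 1 - cosine similarity; 0 means the vectors point the same way."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


# Hypothetical three-dimensional vectors written as JSON arrays.
query = parse_vector("[1, 0, 0]")
same_direction = parse_vector("[2, 0, 0]")
orthogonal = parse_vector("[0, 1, 0]")

near = cosine_distance(query, same_direction)
far = cosine_distance(query, orthogonal)
```

A similarity search amounts to ranking stored vectors by a distance like this one and keeping the closest matches.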

Second is information on vector functions:

Yesterday we looked into the vector data type and how to create a table, insert a vector, and read it. With SQL Server 2025, the vector data type also comes equipped with a couple of functions:

And third is how to generate embeddings and store the results in SQL Server:

AI_GENERATE_EMBEDDINGS is a built-in function that creates embeddings (vector arrays) using a pre-created AI model definition stored in the database.

Before running it, we need to register the model: create the master key, create the database scoped credential, and create the external model.

DAX Lib: Shared DAX User-Defined Functions

Marco Russo shares some code:

Three months ago, Microsoft introduced the User-Defined Functions (UDFs) in the DAX language. From the first day, https://daxlib.org has been available to share libraries of functions with the Power BI community. We published DAX Lib with a low profile because we did not have many libraries available at the beginning, but now it is time to spread the word!

Using DAX Lib is fast and simple: copy the function code from a TMDL script in DAX Lib, then paste it into the TMDL view of your Power BI model and apply it. Watch the video to see a complete walkthrough.

Check out that video, as well as the functions available in the “DAX app store.”

Refactoring SQL Code

Steve Jones shares some thoughts:

I was thinking about this when I saw this article on strategies to refactor SQL code. The article seems written more for PostgreSQL, but there are items that relate to T-SQL as well. The main thrust of the article is about trying to rewrite code to DRY (don’t repeat yourself). The more changes you can make to shrink code, either to make it easier to read or avoid repeating those copy/paste items, the better off your team will be. It’s easy to think those copies aren’t a big deal, but it’s also easy to update code in one place because that solves the problem you were given, and forget to fix all the copies.

Strict refactoring—leaving the inputs and outputs alone and only modifying the structure of code beyond the scope of reformatting but without changing its behavior—is somewhat uncommon in T-SQL outside of performance tuning scenarios, at least in my experience. The problem I have with DRY, when it comes to T-SQL, is that you generally need to pay the performance piper. Yes, repeating the contents of a common function in a series of T-SQL queries is repetition and “wasteful” in that regard, but if it makes the queries run literally 3-9x faster just from making these changes, I don’t care. I’ll do it.

If T-SQL were an idealized implementation of a fourth-generation language, where all viable equivalent queries would have the same execution plan and thus the same performance, then we’d see a lot more code refactoring because the way we write the code would not have a direct impact on how it runs. But in practice, that’s not the case.

What’s New in SSIS for SQL Server 2025

Chunhua Gu says, not much:

Security is a top priority for SSIS 2025, reflecting the broader enterprise’s focus on data protection and compliance. Microsoft.Data.SqlClient provides a modern, secure data access layer. This new provider supports advanced security protocols, including TLS 1.3 for encrypted connections, and integrates seamlessly with Microsoft Entra ID (formerly Azure Active Directory) for robust authentication.

In short, support the new-ish library (that has been around for several years), tie in with Microsoft Fabric, remove functionality that used to be in the product while spinning this as a grand new opportunity for developers to spend money on Fabric, and that’s it. Granted, SSIS hasn’t been a proper focus for the product since 2012 (sorry, Hadoop components in 2016; you’re out of the product now, so you don’t count), so all of this should come as no surprise.

What VACUUM Really Does in Postgres

Radim Marek explains:

There is a common misconception that troubles most developers using PostgreSQL: tune VACUUM or run VACUUM, and your database will stay healthy. Dead tuples will get cleaned up. Transaction IDs recycled. Space reclaimed. Your database will live happily ever after.

But there are a couple of dirty “secrets” people are not aware of. The first of them is that VACUUM is lying to you about your indexes.

Click through to learn more.

Choosing a Vector Database

Joe Sack has some advice:

Vector search has become a standard approach for semantic search and RAG. Whether you’re evaluating a dedicated vector database, SQL Server 2025, a Postgres extension like pgvector, or an in-memory library, there are certain production realities worth planning for.

Admittedly, my vector database decision boiled down to “What can I actually get to work in my non-internet-connected on-premises environment where everything is locked down to the point that bringing in new software is a major hassle?” That quickly narrowed down the set of viable options.
