Day: December 15, 2025

How Data Leakage Can Hurt Model Performance

Ivan Palomares Carrascosa leaks some data:

In this article, you will learn what data leakage is, how it silently inflates model performance, and practical patterns for preventing it across common workflows.

Topics we will cover include:

  • Identifying target leakage and removing target-derived features.
  • Preventing train–test contamination by ordering preprocessing correctly.
  • Avoiding temporal leakage in time series with proper feature design and splits.

Read on to learn more.
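
To make the second and third topics concrete, here is a minimal sketch in plain Python (not taken from the article). It assumes a single numeric feature whose rows are already in time order; the values and the helper names `fit_standardizer` and `apply_standardizer` are hypothetical. The split happens first, and the scaling statistics are computed from the training rows only.

```python
import statistics

def fit_standardizer(values):
    """Compute mean and standard deviation from the given values only."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values) or 1.0  # guard against zero variance
    return mean, std

def apply_standardizer(values, mean, std):
    """Scale values with statistics that were fitted elsewhere."""
    return [(v - mean) / std for v in values]

# Hypothetical feature values, assumed to be in chronological order.
feature = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 100.0, 120.0]

# Split first (chronologically, no shuffling), then fit on training rows only.
split = 6
train, test = feature[:split], feature[split:]

mean, std = fit_standardizer(train)
train_scaled = apply_standardizer(train, mean, std)
test_scaled = apply_standardizer(test, mean, std)

# The leaky alternative fits on all rows, so test-set values shape the scaling.
leaky_mean, leaky_std = fit_standardizer(feature)
```

In the leaky version, the two late values pull the mean far away from what the training rows alone would give, which is exactly the kind of information a model should not see before evaluation.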

SQL Server 2025 and Vector Data

Tomaz Kastrun continues a series on SQL Server 2025 with several posts on vector data. First up is the new vector data type:

The vector data type is designed to store vector data optimized for operations such as similarity search and machine learning applications. Vectors are stored in an optimized binary format but are exposed as JSON arrays for convenience.

Implicit and explicit conversion from and to the vector type can be done using varchar, nvarchar, and json types.
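
As a rough illustration of what similarity search over such vectors computes, the sketch below works out cosine distance in plain Python. It is not SQL Server's implementation; the function name `cosine_distance` and the three-dimensional sample vectors are hypothetical, and the vectors are parsed from JSON-array strings only to mirror the way the type is exposed.

```python
import json
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity for two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

# Hypothetical vectors, written as JSON arrays.
v1 = json.loads("[1.0, 0.0, 0.0]")
v2 = json.loads("[0.0, 1.0, 0.0]")
v3 = json.loads("[2.0, 0.0, 0.0]")

orthogonal = cosine_distance(v1, v2)      # vectors at right angles
same_direction = cosine_distance(v1, v3)  # same direction, different length
```

Cosine distance ignores magnitude, so `v1` and `v3` count as identical in direction while `v1` and `v2` are as far apart as two non-opposing vectors can be.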

Second is information on vector functions:

Yesterday we looked at the vector data type and how to create a table, insert a vector, and read it. With SQL Server 2025, the vector data type also comes equipped with a couple of functions:

And third is how to generate embeddings and store the results in SQL Server:

AI_GENERATE_EMBEDDINGS is a built-in function that creates embeddings (vector arrays) using a pre-created AI model definition stored in the database.

Before running, we need to register the model: creating the master key, the database scoped credential, and the external model.

DAX Lib: Shared DAX User-Defined Functions

Marco Russo shares some code:

Three months ago, Microsoft introduced the User-Defined Functions (UDFs) in the DAX language. From the first day, https://daxlib.org has been available to share libraries of functions with the Power BI community. We published DAX Lib with a low profile because we did not have many libraries available at the beginning, but now it is time to spread the word!

Using DAX Lib is fast and simple: copy the function code from a TMDL script in DAX Lib, then paste it into the TMDL view of your Power BI model and apply it. Watch the video to see a complete walkthrough.

Check out that video, as well as the functions available in the “DAX app store.”

Refactoring SQL Code

Steve Jones shares some thoughts:

I was thinking about this when I saw this article on strategies to refactor sql code. The article seems written more for PostgreSQL, but there are items that relate to T-SQL as well. The main thrust of the article is about trying to rewrite code to DRY (don’t repeat yourself). The more changes you can make to shrink code, either to make it easier to read or avoid repeating those copy/paste items, the better off your team will be. It’s easy to think those copies aren’t a big deal, but it’s easy to update code in one place because that solves the problem you were given, and forget to fix all the copies.

Strict refactoring—leaving the inputs and outputs alone and only modifying the structure of code beyond the scope of reformatting but without changing its behavior—is somewhat uncommon in T-SQL outside of performance tuning scenarios, at least in my experience. The problem I have with DRY, when it comes to T-SQL, is that you generally need to pay the performance piper. Yes, repeating the contents of a common function in a series of T-SQL queries is repetition and “wasteful” in that regard, but if it makes the queries run literally 3-9x faster just from making these changes, I don’t care. I’ll do it.

If T-SQL were an idealized implementation of a fourth-generation language, where all viable equivalent queries would have the same execution plan and thus the same performance, then we’d see a lot more code refactoring because the way we write the code would not have a direct impact on how it runs. But in practice, that’s not the case.

What’s New in SSIS for SQL Server 2025

Chunhua Gu says, not much:

Security is a top priority for SSIS 2025, reflecting the broader enterprise’s focus on data protection and compliance. Microsoft.Data.SqlClient provides a modern, secure data access layer. This new provider supports advanced security protocols, including TLS 1.3 for encrypted connections, and integrates seamlessly with Microsoft Entra ID (formerly Azure Active Directory) for robust authentication.

In short: support the new-ish library (that has been around for several years), tie in with Microsoft Fabric, remove functionality that used to be in the product while spinning this as a grand new opportunity for developers to spend money on Fabric, and that’s it. Granted, SSIS hasn’t been a proper focus for the product since 2012 (sorry, Hadoop components in 2016: you’re out of the product now, so you don’t count), so all of this should come as no surprise.

What VACUUM Really Does in Postgres

Radim Marek explains:

There is a common misconception that troubles most developers using PostgreSQL: tune VACUUM or run VACUUM, and your database will stay healthy. Dead tuples will get cleaned up. Transaction IDs recycled. Space reclaimed. Your database will live happily ever after.

But there are a couple of dirty “secrets” people are not aware of. The first is that VACUUM is lying to you about your indexes.

Click through to learn more.
