Day: December 23, 2025

Vector Search: Negation and Cross-Encoding

Joe Sack digs into a common problem in vector search. First up is a description of the problem:

I embedded two queries: “home with pool” and “home without pool.” The cosine similarity was 0.82. The embedding model treats negated queries as nearly identical to their positive counterparts.

For comparison, completely unrelated queries (“home with pool” vs “quarterly earnings report”) scored 0.13.
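For a sense of what those scores measure, here is a minimal sketch of cosine similarity in plain Python. The three-dimensional vectors are made-up stand-ins rather than real embeddings from any model, so they will not reproduce the 0.82 or 0.13 figures above. They only illustrate that two vectors pointing in nearly the same direction score close to 1.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy "embeddings" (assumption: invented for illustration only).
home_with_pool = [0.9, 0.4, 0.1]
home_without_pool = [0.8, 0.5, 0.2]
quarterly_earnings_report = [-0.1, 0.1, 0.9]

print(cosine_similarity(home_with_pool, home_without_pool))
print(cosine_similarity(home_with_pool, quarterly_earnings_report))
```

The negated query ends up close to its positive counterpart because most of the words, and so most of the direction of the vector, are shared.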

And one imperfect solution:

In my last post, I showed that vector search treats “home with pool” and “home without pool” as nearly identical (0.82 similarity). Bi-encoders struggle with negation.

Cross-encoders can help with this. But there’s a trade-off.

Read on to learn how cross-encoders can help and why they come at a significant cost. Joe also describes a pattern that can minimize the total pain level when using cross-encoders.
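One common way to contain cross-encoder cost is to retrieve a shortlist with the cheap scorer and then re-rank only that shortlist with the expensive one. Whether that matches the pattern in Joe's post is an assumption on my part. The sketch below uses toy scoring functions as stand-ins: `toy_cheap_score` is deliberately blind to negation, the way the bi-encoder in the example is, and `toy_expensive_score` plays the role of a cross-encoder. Neither is a real model, and all of the names are hypothetical.

```python
def retrieve_then_rerank(query, documents, cheap_score, expensive_score,
                         shortlist_size=2, top_k=1):
    """Score everything cheaply, then re-score only the shortlist."""
    shortlist = sorted(documents, key=lambda d: cheap_score(query, d),
                       reverse=True)[:shortlist_size]
    reranked = sorted(shortlist, key=lambda d: expensive_score(query, d),
                      reverse=True)
    return reranked[:top_k]


def _words(text):
    # Treat "without" as "with" to mimic a scorer that ignores negation.
    return {w.replace("without", "with") for w in text.lower().split()}


def toy_cheap_score(query, doc):
    """Hypothetical bi-encoder stand-in: word overlap, negation-blind."""
    q, d = _words(query), _words(doc)
    return len(q & d) / len(q | d)


def toy_expensive_score(query, doc):
    """Hypothetical cross-encoder stand-in: rewards matching negation."""
    query_negated = "without" in query.lower().split()
    doc_negated = "without" in doc.lower().split()
    bonus = 1.0 if query_negated == doc_negated else -1.0
    return toy_cheap_score(query, doc) + bonus


listings = [
    "home with pool",
    "home without pool",
    "condo with garage",
    "quarterly earnings report",
]

print(retrieve_then_rerank("home without pool", listings,
                           toy_cheap_score, toy_expensive_score))
```

The expensive scorer runs only `shortlist_size` times per query instead of once per document, which is where the savings come from. The trade-off is that anything the cheap scorer leaves off the shortlist never gets a second look.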

Thoughts on Parallel Programming in T-SQL

Greg Low shares some thoughts:

Upcoming processors are likely to have even more cores than now. Have you ever tried to write multiprocessor-style code? A friend of mine recently said that he learned some of this style of coding but later when he came back to it, he realised how much he thought he knew but didn’t.

For languages like T-SQL, we don’t have inherent support for multi-threading. In fact, the only trace I can see of this in T-SQL today is the ability to have multiple readers on a service broker queue.

In general, we haven’t needed this because SQL Server systems constantly have many requests thrown at them concurrently anyway and there is a natural style of parallelism happening.

I’d take it one step further. T-SQL, as a reasonable attempt at a 4th-generation programming language, abstracts away the need to define what should or should not be parallel. That’s the job of the database engine. We tell it what the end result should look like and let the engine figure out the details.

I do like the idea that Greg mentions of running stored procedures asynchronously. That’s something we typically need a separate programming language and some calling code to implement. Either that or a larger number of SSMS tabs.

Text Features in SQL Server 2025

Tomaz Kastrun continues an advent of SQL Server 2025. Day 22 looks at the UNISTR() function:

UNISTR() function is a new T-SQL function in SQL Server 2025. It will help you with unicode string literals (e.g.: special characters, emoji, special language alphabets and others) by letting you specify the unicode encoding value of characters in the string.

Difference between NCHAR and UNISTR is that latter will provide more flexibility and ways of handling multiple unicode characters and even escape sequences. You can also define a custom escape character to perform the necessary conversion of Unicode values into a string character set.

Day 23 looks at new operators for string concatenation and compound assignment:

Two new features are available in SQL Server 2025 for string operations; both for string concatenation.

The || and ||= operators are basically + and += for strings, but they bring T-SQL into alignment with ANSI SQL. I’d still recommend using functions like CONCAT() for NULL-safety, or CONCAT_WS() for NULL-safety plus automatic separator addition, but this does fix a long-standing pain point around platform compatibility.
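To illustrate the NULL-safety point, here is a small sketch that runs the || operator in SQLite through Python's built-in sqlite3 module. This is SQLite standing in for SQL Server, and it assumes the operator propagates NULL the same way in both, so treat it as an illustration of the concern rather than a test of SQL Server 2025 itself. The `concat_pair` helper is a hypothetical name, and the COALESCE wrapper approximates the guard that CONCAT() gives you in T-SQL.

```python
import sqlite3


def concat_pair(first, second):
    """Return (raw || result, NULL-guarded result) for two values."""
    conn = sqlite3.connect(":memory:")
    try:
        row = conn.execute(
            "SELECT ? || ?, COALESCE(?, '') || COALESCE(?, '')",
            (first, second, first, second),
        ).fetchone()
    finally:
        conn.close()
    return row


# Python's None is passed to SQLite as NULL.
print(concat_pair("Curated", "SQL"))
print(concat_pair("Curated", None))
```

In the second call, the raw concatenation comes back as NULL (None in Python) because one operand is NULL, while the guarded version keeps the non-NULL part.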

Privilege Escalation via Replication Job

Fabiano Amorim makes note of a security concern:

Privilege escalation in SQL Server isn’t just theory – it can happen through everyday maintenance jobs. This article demonstrates how a user with roles like db_owner or db_ddladmin can exploit replication cleanup processes to gain sysadmin rights, and why monitoring trigger creation and job behavior is critical for security.

Replication is one of those things people tend not to understand very well, including the necessary permissions. It’s a lot easier simply to say, “Here’s sysadmin” because that actually works. The alternative is a cryptic error you can barely troubleshoot, and then only thanks to a Repltalk article from 2009. And heaven help you if you’re looking at merge replication.

But as far as the article goes, I won’t say that it’s much ado about nothing. What I will, however, say is that your account needs to be db_owner or db_ddladmin first, and that does mitigate a fair amount of the risk.

The Impact of Sorting and Filters on Pagination

Aaron Bertrand continues digging into SQL Server pagination performance:

In my previous tip, Pagination Performance in SQL Server, I showed how to make SQL pagination more predictable – turning O(n) into O(1). I materialized and cached row numbers to page through instead of calculating them on every request. It wasn’t the whole story, though; real pagination queries rarely get to sort without filtering. Users always want more control, and filtering can threaten that predictability.

Read on for examples of how to handle a few different scenarios.

Budgeting in Azure

John Morehouse breaks out the envelopes:

When organizations migrate workloads to Azure, the focus is usually on architecture, performance, and security. Cost management should be part of that conversation—but in practice, it’s often treated as an afterthought. One of the most overlooked and underutilized tools in Azure is Budgets, despite the fact that it can prevent unpleasant billing surprises with minimal effort.

Azure budgeting is useful but not great. I think it relies too much on messaging without enough teeth. That means setting up runbooks and having humans constantly react, rather than being able to set stronger rules around scaling down resources before that unexpected and unwelcome bill arrives.
