Press "Enter" to skip to content

Vector Search: Negation and Cross-Encoding

Joe Sack digs into a common problem in vector search. First up is a description of the problem:

I embedded two queries: “home with pool” and “home without pool.” The cosine similarity was 0.82. The embedding model treats negated queries as nearly identical to their positive counterparts.

For comparison, completely unrelated queries (“home with pool” vs “quarterly earnings report”) scored 0.13.

And one imperfect solution:

In my last post, I showed that vector search treats “home with pool” and “home without pool” as nearly identical (0.82 similarity). Bi-encoders struggle with negation.

Cross-encoders can help with this. But there’s a trade-off.

Read on to learn how cross-encoders can help, but they come at a significant cost. Joe also describes a pattern that can minimize the total pain level when using cross-encoders.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.