Joe Sack digs into a common problem in vector search. First up is a description of the problem:
I embedded two queries: “home with pool” and “home without pool.” The cosine similarity was 0.82. The embedding model treats negated queries as nearly identical to their positive counterparts.
For comparison, completely unrelated queries (“home with pool” vs “quarterly earnings report”) scored 0.13.
In my last post, I showed that vector search treats “home with pool” and “home without pool” as nearly identical (0.82 similarity). Bi-encoders struggle with negation.
Read on to learn how cross-encoders can help, though they come at a significant cost. Joe also describes a pattern that minimizes the total pain of using them.
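The usual shape of that pattern is retrieve-then-rerank: a cheap bi-encoder pass narrows the corpus to a few candidates, and the expensive cross-encoder only scores those. Here is a toy sketch of the idea with hypothetical embedding values and a stub in place of a real cross-encoder (a real one reads the query and document together, which is what lets it catch negation):

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Hypothetical precomputed bi-encoder embeddings. Note how close the
# "with pool" and "without pool" vectors are, mirroring the 0.82 score.
CORPUS = [
    ("Home with a heated pool",   [0.90, 0.10, 0.05]),
    ("Home without a pool",       [0.88, 0.14, 0.06]),
    ("Quarterly earnings report", [0.05, 0.02, 0.95]),
]

def bi_encoder_retrieve(query_vec, k=2):
    """Stage 1: cheap vector scan over the whole corpus; keep top-k."""
    scored = [(cosine(query_vec, vec), text) for text, vec in CORPUS]
    scored.sort(reverse=True)
    return scored[:k]

def cross_encoder_score(query, doc):
    """Stage 2 stand-in: fakes a cross-encoder's ability to notice a
    negation mismatch between query and document."""
    score = 1.0
    if "without" in doc.lower() and "without" not in query.lower():
        score -= 0.9  # penalty the bi-encoder stage never applied
    return score

def search(query, query_vec):
    # Only the k survivors of the cheap pass pay the expensive score.
    candidates = bi_encoder_retrieve(query_vec)
    candidates.sort(
        key=lambda pair: cross_encoder_score(query, pair[1]),
        reverse=True,
    )
    return [text for _, text in candidates]

print(search("home with pool", [0.91, 0.11, 0.04]))
```

The cost control comes from `k`: the cross-encoder runs `k` times per query instead of once per document in the corpus.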