Itamar Syn-Hershko looks at the new functionality in the latest version of Elasticsearch:
One fundamental feature of Elasticsearch is scoring, or ranking results by relevance. The part that handles it is a Lucene component called Similarity. ES 5.0 now makes Okapi BM25 the default similarity, and that’s quite an important change. The default has long been tf/idf, which is simpler to understand but easier to fool with rogue results. BM25 is a probabilistic approach to ranking that almost always gives better results than the more vanilla tf/idf. I’ve been recommending that customers use BM25 over tf/idf for a long time now, and we also rely on it at Forter for doing quite a lot of interesting stuff. Overall, a good move by ES, and I can finally retire a piece of advice I’ve been giving for years. Britta Weber has a great talk explaining the difference, and BM25 in particular; definitely a recommended watch.
This is one of several search-related features in the latest version. Looks like a solid release.
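To make the tf/idf versus BM25 difference concrete, here is a minimal Python sketch of the per-term scoring formulas. It uses simplified textbook forms rather than Lucene's exact implementation; the function names are made up for illustration, and the `k1=1.2` and `b=0.75` values are assumed here as the commonly cited BM25 defaults.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.2, b=0.75):
    """Simplified BM25 score for a single term in a single document.

    k1 controls how quickly repeated occurrences stop adding score;
    b controls how strongly long documents are penalised.
    """
    idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    length_norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1) / (tf + length_norm)


def tfidf_term_score(tf, n_docs, doc_freq):
    """Simplified classic tf/idf score: sqrt(tf) times a log-scaled idf."""
    idf = 1 + math.log(n_docs / (doc_freq + 1))
    return math.sqrt(tf) * idf


if __name__ == "__main__":
    # Hypothetical corpus: 1000 documents, the term appears in 10 of them,
    # and the scored document has exactly the average length.
    for tf in (1, 5, 25, 100):
        bm25 = bm25_term_score(tf, doc_len=100, avg_doc_len=100,
                               n_docs=1000, doc_freq=10)
        classic = tfidf_term_score(tf, n_docs=1000, doc_freq=10)
        print(f"tf={tf:>3}  bm25={bm25:.3f}  tf/idf={classic:.3f}")
```

Under these assumptions the BM25 score levels off as the term count grows (it can never exceed `idf * (k1 + 1)`), while the tf/idf score keeps climbing with the square root of the count. That saturation is one reason a document that merely repeats a keyword many times has a harder time dominating the results under BM25.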