Elasticsearch 5.0

Itamar Syn-Hershko looks at the new functionality in the latest version of Elasticsearch:

One fundamental feature of Elasticsearch is scoring – or results ranking by relevance. The part that handles it is a Lucene component called Similarity. ES 5.0 now makes Okapi BM25 the default similarity, and that’s quite an important change. The default has long been tf/idf, which is simpler to understand but easier to fool with rogue results. BM25 is a probabilistic approach to ranking that almost always gives better results than the more vanilla tf/idf. I’ve been recommending that customers use BM25 over tf/idf for a long time now, and we also rely on it at Forter for doing quite a lot of interesting stuff. Overall, a good move by ES, and I can finally archive some years-long advice. Britta Weber has a great talk explaining the difference, and BM25 in particular; definitely a recommended watch.
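To make the tf/idf versus BM25 difference a bit more concrete, here is a rough Python sketch of the per-term score under each model. It is not code from the linked post or from Elasticsearch itself. The BM25 function follows the commonly cited Lucene-style formula with the usual defaults (k1 = 1.2, b = 0.75); the tf/idf function is a deliberately simplified version that ignores length and query normalization. The function names and the sample corpus numbers are made up for illustration.

```python
import math


def bm25_term_score(freq, doc_len, avg_doc_len, doc_count, doc_freq,
                    k1=1.2, b=0.75):
    """Per-term BM25 score (illustrative; k1 and b are the usual defaults)."""
    # Rare terms get a higher inverse document frequency.
    idf = math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))
    # Longer-than-average documents are penalized, controlled by b.
    length_norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    # The term-frequency part saturates: it can never exceed (k1 + 1).
    return idf * freq * (k1 + 1) / (freq + length_norm)


def simple_tf_idf_term_score(freq, doc_count, doc_freq):
    """Simplified tf/idf score (assumption: no length or query normalization)."""
    idf = 1 + math.log(doc_count / (doc_freq + 1))
    # sqrt dampens the term frequency but the score still grows without bound.
    return math.sqrt(freq) * idf


if __name__ == "__main__":
    # Hypothetical corpus: 1000 documents, the term appears in 10 of them,
    # and the scored document has exactly average length.
    for freq in (1, 5, 25, 100):
        bm25 = bm25_term_score(freq, doc_len=100, avg_doc_len=100,
                               doc_count=1000, doc_freq=10)
        tfidf = simple_tf_idf_term_score(freq, doc_count=1000, doc_freq=10)
        print(f"freq={freq:>3}  bm25={bm25:.3f}  tf/idf={tfidf:.3f}")
```

Running it shows the BM25 score flattening out as the term repeats, while the simplified tf/idf score keeps climbing. That saturation is one reason a document stuffed with a single keyword has a harder time fooling BM25.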

This is one of several search-related features in the latest version. Looks like a solid release.


