Using NLP To Find Similar Facebook Posts

The folks at Knoyd put together a word embedding example by scraping a Python Facebook group:

We are going to represent the content of a Facebook post using word embeddings and comparing the transformed posts using word mover’s distance. The combination of both have shown lower k-nearest neighbor-document classification error rates compared to other state of the art techniques.

The advantage of word embeddings is that the words which have similar meanings but don’t have any letters in common will still have similar vectors (be close) in the embedded space (e.g. lion and tiger).

There’s a good high-level discussion of techniques in this post.

Related Posts

Where Machine Learning And Econometrics Collide

Dave Giles shares some thoughts on how machine learning and econometrics relate: What is Machine Learning (ML), and how does it differ from Statistics (and hence, implicitly, from Econometrics)? Those are big questions, but I think that they’re ones that econometricians should be thinking about. And if I were starting out in Econometrics today, I’d […]

Read More

Auto ML With SQL Server 2019 Big Data Clusters

Marco Inchiosa has a model scenario for using Big Data Clusters to scale out a machine learning problem: H2O provides popular open source software for data science and machine learning on big data, including Apache SparkTM integration. It provides two open source python AutoML classes: h2o.automl.H2OAutoML and Both APIs use the same underlying algorithm implementations, […]

Read More


August 2017
« Jul Sep »