Stop Using word2vec

Chris Moody wants you to stop using word2vec:

When I started playing with word2vec four years ago I needed (and luckily had) tons of supercomputer time. But because of advances in our understanding of word2vec, computing word vectors now takes fifteen minutes on a single run-of-the-mill computer with standard numerical libraries. Word vectors are awesome but you don’t need a neural network – and definitely don’t need deep learning – to find them. So if you’re using word vectors and aren’t gunning for state of the art or a paper publication then stop using word2vec.

Chris has a follow-up post on word tensors as well:

There’s only three steps to computing word tensors. Counting word-word-document skipgrams, normalizing those counts to form the PMI-like M tensor and then factorizing M into smaller matrices.

But to actually perform the factorization we’ll need to generalize the SVD to higher rank tensors 1. Unfortunately, tensor algebra libraries aren’t very common 2. We’ve written one for non-negative sparse tensor factorization, but because the PMI can be both positive and negative it isn’t applicable here. Instead, for this application I’d recommend HOSVD as implemented in scikit-tensor. I’ve also heard good things about tensorly.

I’m going to keep using word2vec for now, but it’s a good pair of posts.

Related Posts

Bias Correction In Standard Deviation Estimates

John Mount explains how to perform bias correction and explains why it happens so rarely in practice: The bias in question is falling off at a rate of 1/n (where n is our sample size). So the bias issue loses what little gravity it ever may have ever had when working with big data. Most sources of noise will […]

Read More

Explaining Neural Networks With H2O

Shirin Glander explains some of the concepts behind neural networks using H2O as a guide: Before, when describing the simple perceptron, I said that a result is calculated in a neuron, e.g. by summing up all the incoming data multiplied by weights. However, this has one big disadvantage: such an approach would only enable our neural net […]

Read More


October 2017
« Sep Nov »