Luba Belokon asked Vadim Smolyakov to explain Bayesian Nonparametric models and here’s the result:

Bayesian Nonparametrics are a class of models for which the number of parameters grows with data. A simple example is non-parametric K-means clustering [1]. Instead of fixing the number of clusters K, we let data determine the best number of clusters. By letting the number of model parameters (cluster means and covariances) grow with data, we are better able to describe the data as well as generate new data given our model.

Of course, to avoid over-fitting, we penalize the number of clusters K via a regularization parameter which controls the rate at which new clusters are created.

This is an interesting discussion of the Dirichlet process, particularly as applied to K-mean clustering. It helps you figure out your best choice for K, no small task.

Kevin Feasel

2017-10-13

Data Science