K-Means Clustering With Python

Kevin Feasel

2016-07-04

Python

David Crook discusses k-means clustering and how to implement it using Python:

K-Means takes in an unlabeled data set and a positive integer, k, the number of centroids (clusters) you wish to find. If you do not know how many clusters there should be, it is possible to do some pre-processing to determine that more automatically; however, that is out of the scope of this article. Once you have a data set and have chosen k, K-Means begins its iterative process. It starts by picking k initial centroids. It then assigns each data point to its nearest centroid, moves each centroid to the average of the data assigned to it, and repeats these two steps until the groupings stop changing.

This is a big and detailed post, and worth reading in its totality.
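The assign-then-update loop described in the quote (often called Lloyd's algorithm) can be sketched in plain Python. This is not Crook's implementation. The function name `kmeans`, its `max_iters` and `seed` parameters, and the choice to initialize by sampling k data points are all assumptions made for illustration. It uses only the standard library (`math.dist` needs Python 3.8+).

```python
import math
import random


def kmeans(points, k, max_iters=100, seed=0):
    """Hypothetical minimal k-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Assumption: start from k distinct data points chosen at random.
    centroids = rng.sample(points, k)
    assignments = None
    for _ in range(max_iters):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignments


# Two well-separated groups of 2-D points.
data = [(1.0, 1.0), (1.5, 2.0), (1.2, 0.8), (8.0, 8.0), (8.5, 9.0), (9.0, 8.2)]
centroids, labels = kmeans(data, k=2)
# The first three points share one label and the last three share the other.
```

Because the result depends on the starting centroids, real libraries typically run several initializations (or a smarter seeding scheme) and keep the best outcome; this sketch uses a single random start to keep the loop easy to follow.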
