Archive for the 'kMeans clustering' Category

Speeding up K-means Clustering with Algebra and Sparse Vectors « LingPipe Blog

k-means and other EM-like algorithms are trivial to parallelize because all the heavy computations in the inner loops are independent.
via Speeding up K-means Clustering with Algebra and Sparse Vectors « LingPipe Blog.
This is exactly what Apache Mahout does. We have parallelized versions of a number of clustering algorithms, including k-means.
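
To make the independence point concrete, here is a minimal sketch of one k-means iteration split into a map step and a reduce step. It is not Mahout's code; the function names (nearest, map_chunk, reduce_partials, kmeans_iteration) and the chunking scheme are assumptions made for illustration. A thread pool stands in for the cluster, so it shows the structure of the computation rather than a real speedup.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centroids[j])),
    )


def map_chunk(chunk, centroids):
    """Map step: per-cluster partial sums and counts for one chunk of points."""
    dim = len(centroids[0])
    partial = defaultdict(lambda: ([0.0] * dim, 0))
    for point in chunk:
        j = nearest(point, centroids)
        total, count = partial[j]
        partial[j] = ([t + p for t, p in zip(total, point)], count + 1)
    return dict(partial)


def reduce_partials(partials, centroids):
    """Reduce step: merge the partial sums and return the updated centroids."""
    sums = {}
    for partial in partials:
        for j, (total, count) in partial.items():
            if j in sums:
                old_total, old_count = sums[j]
                merged = [a + b for a, b in zip(old_total, total)]
                sums[j] = (merged, old_count + count)
            else:
                sums[j] = (total, count)
    # A cluster that received no points keeps its previous centroid.
    return [
        [t / sums[j][1] for t in sums[j][0]] if j in sums else list(centroids[j])
        for j in range(len(centroids))
    ]


def kmeans_iteration(chunks, centroids, workers=4):
    """One iteration: each chunk is mapped independently, then reduced once."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda ch: map_chunk(ch, centroids), chunks))
    return reduce_partials(partials, centroids)
```

Each call to map_chunk reads only its own chunk and the shared, read-only centroids, so the chunks can be processed in any order or on different machines. The only coordination is the final merge of per-cluster sums and counts, which is small compared with the data itself.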

Mahout: k-means Clustering

I committed a first crack at k-means clustering to Mahout last night, thanks again to Jeff Eastman’s excellent work. Mahout now has two clustering algorithms designed to run on Hadoop’s MapReduce framework, so it should be able to scale up to very large data sets.
To learn more about k-means, see the Mahout [...]