Speeding up K-means Clustering with Algebra and Sparse Vectors « LingPipe Blog
k-means and other EM-like algorithms are trivial to parallelize because all the heavy computations in the inner loops are independent.
via Speeding up K-means Clustering with Algebra and Sparse Vectors « LingPipe Blog.
This is exactly what Apache Mahout does. We have parallelized versions of a bunch of clustering algorithms, including k-means.
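To make the "independent inner loops" point concrete, here is a minimal single-machine sketch of one k-means iteration in plain Java. It is not Mahout's code or API: the class name `ParallelKMeansStep` and its methods are made up for illustration, it assumes dense `double[]` vectors and squared Euclidean distance, and it uses a parallel stream where a distributed version would spread the points across machines.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

// Hypothetical illustration class; not part of Mahout or LingPipe.
public class ParallelKMeansStep {

    // Squared Euclidean distance between two dense vectors of equal length.
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Index of the centroid closest to the point (ties go to the lowest index).
    static int nearest(double[] point, double[][] centroids) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int k = 0; k < centroids.length; k++) {
            double d = squaredDistance(point, centroids[k]);
            if (d < bestDist) {
                bestDist = d;
                best = k;
            }
        }
        return best;
    }

    // Assignment step: each point only reads the shared centroids,
    // so every point can be processed independently and in parallel.
    static int[] assign(double[][] points, double[][] centroids) {
        return IntStream.range(0, points.length)
                .parallel()
                .map(i -> nearest(points[i], centroids))
                .toArray();
    }

    // Update step: each new centroid is the mean of its assigned points.
    // An empty cluster keeps its previous centroid (an assumption of this sketch).
    static double[][] update(double[][] points, int[] labels, double[][] centroids) {
        int dim = centroids[0].length;
        double[][] sums = new double[centroids.length][dim];
        int[] counts = new int[centroids.length];
        for (int i = 0; i < points.length; i++) {
            counts[labels[i]]++;
            for (int j = 0; j < dim; j++) {
                sums[labels[i]][j] += points[i][j];
            }
        }
        for (int k = 0; k < centroids.length; k++) {
            for (int j = 0; j < dim; j++) {
                sums[k][j] = counts[k] == 0 ? centroids[k][j] : sums[k][j] / counts[k];
            }
        }
        return sums;
    }

    public static void main(String[] args) {
        double[][] points = {{0, 0}, {0, 1}, {10, 10}, {10, 11}};
        double[][] centroids = {{0, 0}, {10, 10}};
        int[] labels = assign(points, centroids);
        double[][] updated = update(points, labels, centroids);
        System.out.println(Arrays.toString(labels));
        System.out.println(Arrays.deepToString(updated));
    }
}
```

The distance computations in `assign` dominate the cost and share no mutable state, which is why they split cleanly across threads or machines. The update step is just per-cluster sums and counts, so partial results computed on separate chunks of the data can be added together afterwards.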

Is there a release of Mahout? I don’t see any javadoc or actual release linked from the home page:
http://lucene.apache.org/mahout/
It looks like I could just check out a version from the subversion archive.
PS: Thanks for the link; whenever a more popular blog links to ours we see a huge uptick in traffic.
We are in the process of voting on the 0.1 release as I type. From that, we will publish javadocs, etc.