Archive for the 'Map Reduce' Category

SummerOfCode2008 – Looking for a summer project in Machine Learning?

SummerOfCode2008 – General Wiki: Check out the Apache Summer of Code page (link above) to see how you can spend the summer developing large-scale machine learning algorithms and helping out the Mahout project. We’d love to have a few students put together some projects implementing one or more machine learning algorithms using Hadoop. [...]

Mahout: k-means Clustering

I committed a first crack at k-means clustering to Mahout last night, thanks again to Jeff Eastman’s excellent work. Mahout now has two clustering algorithms designed to run on Hadoop’s MapReduce framework, so it should be able to scale up to very large data sets. To learn more about k-means, see the [...]
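To illustrate why k-means fits MapReduce, here is a minimal single-process sketch of one common way to split a k-means iteration into map and reduce steps. It is plain Python, not Hadoop's API and not the code that was committed to Mahout; the function names (`kmeans_map`, `kmeans_reduce`, `kmeans_iteration`) are hypothetical. The mapper keys each point by its nearest centroid, a dictionary stands in for Hadoop's shuffle, the reducer averages each group, and a driver reruns the job until the centroids stop moving.

```python
import math
from collections import defaultdict

def nearest(point, centroids):
    """Return the index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def kmeans_map(point, centroids):
    """Map step (hypothetical): key each point by its nearest centroid.

    The value carries a count so a combiner could pre-sum points before the reducer.
    """
    return nearest(point, centroids), (point, 1)

def kmeans_reduce(cluster_id, values):
    """Reduce step (hypothetical): average all points assigned to one centroid."""
    dim = len(values[0][0])
    totals = [0.0] * dim
    count = 0
    for point, n in values:
        for d in range(dim):
            totals[d] += point[d]
        count += n
    return cluster_id, tuple(t / count for t in totals)

def kmeans_iteration(points, centroids):
    """One map/shuffle/reduce pass, simulated in a single process."""
    groups = defaultdict(list)  # stands in for the shuffle: group values by key
    for p in points:
        key, value = kmeans_map(p, centroids)
        groups[key].append(value)
    new_centroids = list(centroids)  # a centroid with no points keeps its position
    for key, values in groups.items():
        cluster_id, centroid = kmeans_reduce(key, values)
        new_centroids[cluster_id] = centroid
    return new_centroids

def kmeans(points, centroids, max_iter=20, tol=1e-9):
    """Driver: rerun the job until no centroid moves more than `tol`."""
    for _ in range(max_iter):
        new_centroids = kmeans_iteration(points, centroids)
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tol:
            break
    return centroids

points = [(1, 1), (1.5, 2), (0.5, 1.5), (8, 8), (9, 8.5), (8.5, 9)]
print(kmeans(points, [(0, 0), (10, 10)]))  # [(1.0, 1.5), (8.5, 8.5)]
```

Because each mapper only needs the current centroids and its own slice of points, the map step parallelizes across however many machines hold the data; the cost is one full pass over the data set per iteration.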

Yahoo Search Wants to Be More Like Google, Embraces Hadoop

Hadoop is an open-source implementation of Google’s MapReduce software and file system. It takes all the links on the Web found by a search engine’s crawlers and “reduces” them to a map of the Web so that ranking algorithms can be run against them. Ahem, [...]

Mahout’s First Commit

I have committed Mahout’s first Hadoop-based machine learning code (https://issues.apache.org/jira/browse/MAHOUT-3). The code is an initial implementation of Canopy clustering. It is a start, and it is great to see others jump right in and start adding code! Great work by Jeff Eastman, who contributed the initial implementation! Now we can start building more goodness in [...]

Yahoo! Launches World’s Largest Hadoop Production Application (Hadoop and Distributed Computing at Yahoo!)

Hadoop at large scale! Wish I had access to some of those machines!

Good Math, Bad Math: Databases are hammers; MapReduce is a screwdriver.

A well-stated response to a criticism of MapReduce. To add my own two cents: I once used Hadoop, a free, open-source implementation of MapReduce (M/R), in a proof-of-concept implementation to automatically translate (as in machine translation) a large (in [...]