Archive for the 'Hadoop' Category
SummerOfCode2008 – General Wiki
Check out the Apache Summer of Code page (link above) to see how you can spend the summer developing large-scale machine learning algorithms and helping out the Mahout project. We’d love to have a few students put together some projects implementing one or more machine learning algorithms using Hadoop. [...]
March 12th, 2008 | Posted in Apache, Hadoop, Java, machine learning, Mahout, Map Reduce | No Comments
I committed a first crack at k-means clustering to Mahout last night, thanks again to Jeff Eastman’s excellent work. Mahout now has two clustering algorithms designed to run on Hadoop’s MapReduce framework, so they should be able to scale up to very large data sets. To learn more about k-means, see the [...]
March 1st, 2008 | Posted in Apache, clustering, Hadoop, Java, kMeans clustering, machine learning, Mahout, Map Reduce | 1 Comment
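For readers wondering how k-means fits the map/reduce shape, here is a rough single-machine sketch in plain Python. It is not Mahout's code and does not use Hadoop; the function names (nearest, map_phase, reduce_phase, kmeans) are made up for illustration. The map step tags each point with its nearest centroid, and the reduce step averages the points under each tag to get the next set of centroids.

```python
from collections import defaultdict


def nearest(point, centroids):
    """Return the index of the centroid closest to point (squared Euclidean)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def map_phase(points, centroids):
    """Mapper: emit (centroid index, point) for every input point."""
    for point in points:
        yield nearest(point, centroids), point


def reduce_phase(pairs, centroids):
    """Reducer: average the points grouped under each centroid index."""
    groups = defaultdict(list)
    for idx, point in pairs:
        groups[idx].append(point)
    # A centroid that attracted no points keeps its old position.
    new_centroids = list(centroids)
    for idx, members in groups.items():
        new_centroids[idx] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return new_centroids


def kmeans(points, centroids, iterations=10):
    """Run a fixed number of map/reduce rounds (hypothetical driver loop)."""
    for _ in range(iterations):
        centroids = reduce_phase(map_phase(points, centroids), centroids)
    return centroids


if __name__ == "__main__":
    data = [(0, 0), (0, 2), (10, 10), (10, 12)]
    print(kmeans(data, [(0, 0), (10, 10)]))
```

On a real cluster each round would be one job, with the mappers spread across the data and the current centroids shipped to every mapper; the sketch only shows the division of labor.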
FeatherCast » Blog Archive » Episode 43: Lucene
I did a FeatherCast today with Rich Bowen. Dang, he is quick at editing…
February 21st, 2008 | Posted in Apache, ApacheCon, feathercast, Hadoop, Java, Lucene, machine learning, Mahout, Nutch, Performance, Search, Tika | No Comments
Yahoo Search Wants to Be More Like Google, Embraces Hadoop
Hadoop is an open-source implementation of Google’s MapReduce software and file system. It takes all the links on the Web found by a search engine’s crawlers and “reduces” them to a map of the Web so that ranking algorithms can be run against them. Ahem, [...]
February 20th, 2008 | Posted in Apache, Hadoop, Java, machine learning, Mahout, Map Reduce, Performance | No Comments
I have committed Mahout’s first Hadoop-based machine learning code (https://issues.apache.org/jira/browse/MAHOUT-3). The code is an initial implementation of Canopy clustering. It is a start, and it is great to see others jump right in and start adding code! Great work by Jeff Eastman, who contributed the initial implementation! Now, we can start building more goodness in [...]
February 19th, 2008 | Posted in Apache, canopy clustering, clustering, Hadoop, Java, machine learning, Mahout, Map Reduce | 2 Comments
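As a quick illustration of what canopy clustering does, here is a small in-memory sketch in Python. It follows the general textbook description of the algorithm, not the MAHOUT-3 patch, and the function name canopy and the thresholds t1 and t2 are chosen here for illustration. Pick a point as a canopy center, put every point within the loose distance t1 into that canopy, drop every point within the tight distance t2 from further consideration, and repeat until no points are left.

```python
import math


def canopy(points, t1, t2):
    """Group points into (possibly overlapping) canopies.

    t1 is the loose threshold, t2 the tight one; t1 must be larger than t2.
    Returns a list of (center, members) tuples.
    """
    if t1 <= t2:
        raise ValueError("t1 must be greater than t2")
    remaining = list(points)
    canopies = []
    while remaining:
        center = remaining.pop(0)  # take the next unclaimed point as a center
        members = [center]
        still_remaining = []
        for p in remaining:
            d = math.dist(center, p)
            if d < t1:
                members.append(p)  # loosely close: joins this canopy
            if d >= t2:
                still_remaining.append(p)  # not tightly close: may seed or join others
        remaining = still_remaining
        canopies.append((center, members))
    return canopies


if __name__ == "__main__":
    for center, members in canopy([(0, 0), (1, 0), (3, 0), (10, 0)], t1=4, t2=2):
        print(center, members)
```

Because a point between t2 and t1 of a center stays in the pool, canopies can overlap. That cheap, overlapping first pass is why canopy output is often used to seed a more expensive clusterer such as k-means.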
Yahoo! Launches World’s Largest Hadoop Production Application (Hadoop and Distributed Computing at Yahoo!)
Hadoop at large scale! Wish I had access to some of those machines!
February 19th, 2008 | Posted in Apache, Hadoop, Java, Map Reduce | No Comments
How Rackspace Now Uses MapReduce and Hadoop to Query Terabytes of Data | High Scalability
Nice article on how the Lucene/Hadoop/Solr stack was used to solve a really big problem. Someday, I hope (when we have actual code), they can add Mahout to the equation and do even more interesting things with the data.
February 1st, 2008 | Posted in Apache, database, Hadoop, Indexing, Java, Lucene, Mahout, Search, Solr | No Comments
The Two Flavors of Google
Nice article on Hadoop at Business Week, with some good quotes from Lucene Java creator Doug Cutting. Hadoop holds a lot of promise for the future of large-scale computing, although I don’t want to burden it with that kind of claim, either. In the end, it makes solving a [...]
December 26th, 2007 | Posted in Hadoop, Java, Lucene, machine learning, Mahout, Performance | No Comments