Archive for February, 2008

FeatherCast » Blog Archive » Episode 43: Lucene

I did a FeatherCast today with Rich Bowen.  Dang, he is quick at editing…

Yahoo Search Wants to Be More Like Google, Embraces Hadoop

Hadoop is an open-source implementation of Google’s MapReduce software and file system. It takes all the links on the Web found by a search engine’s crawlers and “reduces” them to a map of the Web so that ranking algorithms can be run against them.
Ahem, Hadoop [...]
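
For readers who have not seen the MapReduce model that the quoted description refers to, here is a toy, single-process sketch in Python. It is only an illustration of the general pattern (a map step emits key/value pairs, the pairs are grouped by key, and a reduce step collapses each group), not how Hadoop or any search engine actually implements it. The function names `map_links`, `reduce_links`, and `run_job`, and the tiny `crawl` dictionary, are all made up for this example.

```python
from collections import defaultdict


def map_links(page, outlinks):
    """Map step: for each link page -> target, emit (target, page)."""
    for target in outlinks:
        yield target, page


def reduce_links(target, sources):
    """Reduce step: collapse all sources that link to one target."""
    return target, sorted(set(sources))


def run_job(crawl):
    """Run map, group by key (the 'shuffle'), then reduce, all in one process.

    A real cluster would spread these phases over many machines; this
    hypothetical driver only shows the data flow.
    """
    grouped = defaultdict(list)
    for page, outlinks in crawl.items():
        for key, value in map_links(page, outlinks):
            grouped[key].append(value)
    return dict(reduce_links(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical crawl result: page -> pages it links to.
    crawl = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
    print(run_job(crawl))
```

The result maps each page to the pages that link to it, which is the kind of inverted link table a ranking algorithm could then be run against.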

Mahout’s First Commit

I have committed Mahout’s first Hadoop-based machine learning code: https://issues.apache.org/jira/browse/MAHOUT-3
The code is an initial implementation of Canopy clustering. It is a start, and it is great to see others jump right in and start adding code! Great work by Jeff Eastman, who contributed the initial implementation!
Now, we can start building more goodness in order to [...]
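
For anyone curious what Canopy clustering does, here is a minimal single-machine sketch in Python. It is not the code from MAHOUT-3 (that is Java running on Hadoop); it just follows the usual textbook description with a loose threshold `t1` and a tight threshold `t2`, where `t1 > t2`. The function names, the Euclidean distance, and the sample points are assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Distance measure used for the sketch; any cheap metric would do."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def canopy_cluster(points, t1, t2, distance=euclidean):
    """Group points into (possibly overlapping) canopies.

    t1 is the loose threshold: points closer than t1 to a center join
    its canopy. t2 is the tight threshold: points closer than t2 are
    also removed from the candidate list, so they cannot start a new canopy.
    """
    if t1 <= t2:
        raise ValueError("t1 must be greater than t2")
    remaining = list(points)
    canopies = []
    while remaining:
        # Take the next unassigned point as a new canopy center.
        center = remaining.pop(0)
        members = [center]
        still_remaining = []
        for point in remaining:
            d = distance(center, point)
            if d < t1:
                members.append(point)
            if d >= t2:
                # Not tightly bound to this center, so keep it as a candidate.
                still_remaining.append(point)
        remaining = still_remaining
        canopies.append((center, members))
    return canopies


if __name__ == "__main__":
    sample = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10)]
    for center, members in canopy_cluster(sample, t1=3.0, t2=1.5):
        print(center, members)
```

Canopies are typically used as a cheap first pass, with the resulting centers seeding a more expensive algorithm such as k-means.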

Yahoo! Launches World’s Largest Hadoop Production Application (Hadoop and Distributed Computing at Yahoo!)

Hadoop at large scale! Wish I had access to some of those machines!

How Rackspace Now Uses MapReduce and Hadoop to Query Terabytes of Data | High Scalability

Nice article on how the Lucene/Hadoop/Solr stack was used to solve a really big problem. Someday (when we have actual code), I hope they can add Mahout to the equation and do even more interesting things with the data.