Yahoo Search Wants to Be More Like Google, Embraces Hadoop

Hadoop is an open-source implementation of Google’s MapReduce software and file system. It takes all the links on the Web found by a search engine’s crawlers and “reduces” them to a map of the Web so that ranking algorithms can be run against them.

Ahem, Hadoop knows nothing about links on the web.  Yahoo’s use of Hadoop takes all the links on the web and does some fancy math on them.

Hadoop is a library for doing distributed computing. That’s why we are using it for Mahout (which doesn’t care about links). Just trying to clarify so that people don’t pigeonhole Hadoop based on Y!’s usage of it in one particular application.
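
To make the point concrete, here is a toy, single-process sketch of the map/group/reduce pattern in Python. This is not Hadoop's API; `map_reduce`, `word_mapper`, `link_mapper`, and `sum_reducer` are names made up for illustration, and the input data is invented. The thing to notice is that the generic driver never mentions links: only one of the two jobs plugged into it does.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Toy in-memory driver: map each record, group values by key, reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


def sum_reducer(key, values):
    """Shared reducer: add up all values emitted for a key."""
    return sum(values)


# Job 1: word count. Nothing here involves links.
def word_mapper(line):
    for word in line.lower().split():
        yield word, 1


# Job 2: inbound link count. Links appear only in this mapper,
# not in the driver above.
def link_mapper(edge):
    source, target = edge
    yield target, 1


word_counts = map_reduce(
    ["hadoop is a library", "mahout uses hadoop"],
    word_mapper,
    sum_reducer,
)

inbound_links = map_reduce(
    [("a.com", "b.com"), ("c.com", "b.com"), ("b.com", "a.com")],
    link_mapper,
    sum_reducer,
)

print(word_counts)
print(inbound_links)
```

The same driver runs both jobs unchanged. A real framework adds distribution, fault tolerance, and a file system around this idea, but the application-specific part still lives in the functions you hand it.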
