Archive for the 'Mahout' Category
Yahoo Search Wants to Be More Like Google, Embraces Hadoop
Hadoop is an open-source implementation of Google’s MapReduce software and file system. It takes all the links on the Web found by a search engine’s crawlers and “reduces” them to a map of the Web so that ranking algorithms can be run against them.
Ahem, Hadoop [...]
February 20th, 2008 | Posted in Apache, Hadoop, Java, Mahout, Map Reduce, Performance, machine learning | No Comments
I have committed Mahout’s first Hadoop-based machine learning code: https://issues.apache.org/jira/browse/MAHOUT-3
The code is an initial implementation of Canopy clustering. It is a start, and it is great to see others jump right in and start adding code! Great work by Jeff Eastman, who contributed the initial implementation!
Now, we can start building more goodness in order to [...]
February 19th, 2008 | Posted in Apache, Hadoop, Java, Mahout, Map Reduce, canopy clustering, clustering, machine learning | 2 Comments
How Rackspace Now Uses MapReduce and Hadoop to Query Terabytes of Data | High Scalability
Nice article on how the Lucene/Hadoop/Solr stack was used to solve a really big problem. Someday, I hope (when we have actual code), they can add Mahout to the equation and do even more interesting things with the data.
February 1st, 2008 | Posted in Apache, Hadoop, Indexing, Java, Lucene, Mahout, Search, Solr, database | No Comments
Good Math, Bad Math: Databases are hammers; MapReduce is a screwdriver.
A well-stated response to a criticism of Map Reduce. Adding my own two cents, I once used Hadoop, a free, open-source implementation of Map Reduce (M/R), in a proof-of-concept implementation to automatically translate (as in machine translation) a large (in my [...]
January 26th, 2008 | Posted in Apache, Mahout, Map Reduce, database, machine learning | No Comments
Apache Mahout - Overview
It’s official! Mahout is now an official subproject of Lucene at the Apache Software Foundation. Mahout’s goal is to create a suite of practical, scalable machine learning libraries.
January 25th, 2008 | Posted in Apache, Java, Mahout, machine learning | No Comments
The Two Flavors of Google
Nice article on Hadoop in Business Week, with some good quotes from Lucene Java creator Doug Cutting. Hadoop holds a lot of promise for the future of large-scale computing, although I don’t want to burden it with that kind of claim, either. In the end, it makes solving a good [...]
December 26th, 2007 | Posted in Hadoop, Java, Lucene, Mahout, Performance, machine learning | No Comments