Archive for the 'Hadoop' Category
Wow! Mahout has just got me pumped up. I feel like we’ve got a lot of positive momentum and that we are starting to get the various pieces of our suite of machine learning libraries in place. Various news items include:
Ted Dunning is now a committer! Welcome Ted!
I put up a patch for a map-reduce [...]
May 6th, 2008 | Posted in Hadoop, Java, Mahout, Map Reduce | No Comments
Manning: Taming Text
Scary… I guess it is real!
April 28th, 2008 | Posted in Hadoop, Lucene, Mahout, Manning, Solr, Taming Text, machine learning | 3 Comments
BarCamp wiki / BarCampRDU
Threw my name in the ring for BarCamp RDU today. Haven’t been to BarCamp before, but Erik Hatcher suggested I go and check it out.
Also put in a Proposed Session of “Apache Mahout and Hadoop - Having fun with Map Reduce and distributed computing”. Figure we’ll talk about the basics of M/R, Hadoop [...]
April 23rd, 2008 | Posted in Apache, BarCampRDU, Hadoop, Java, Lucene, Mahout, Map Reduce, machine learning | No Comments
It’s been an interesting few months over in Mahout land. First off, I am psyched about the response the project has been getting. Seems like there is a pent-up demand for large-scale machine learning these days. I figured we would do all right in the early months, but I [...]
April 20th, 2008 | Posted in Apache, ApacheCon, Hadoop, Java, Lucene, Mahout, Map Reduce, machine learning | No Comments
Jeff Eastman’s Marvelous Cloud Computing Adventure
Mahout’s newest committer, Jeff Eastman, has a new blog on Mahout and Hadoop…
March 28th, 2008 | Posted in Apache, Hadoop, Java, Lucene, Mahout, Map Reduce, clustering, machine learning | No Comments
SummerOfCode2008 - General Wiki
Check out the Apache Summer of Code page (link above) to see how you can spend the summer developing large-scale machine learning algorithms and help out the Mahout project. We’d love to have a few students put together some projects implementing one or more machine learning algorithms using Hadoop. So, [...]
March 12th, 2008 | Posted in Apache, Hadoop, Java, Mahout, Map Reduce, machine learning | No Comments
I committed a first crack at k-means clustering to Mahout last night, thanks again to Jeff Eastman’s excellent work. Mahout now has two clustering algorithms designed to run on Hadoop’s MapReduce framework, so it should be able to scale up to very large data sets.
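For anyone wondering how k-means fits the map/reduce mold, here is a rough single-machine sketch in Python. It is not Mahout's code: the function names (`map_point`, `reduce_cluster`, `kmeans_iteration`) and the sample data are made up for illustration, and a plain dictionary stands in for Hadoop's shuffle. The idea is that the map step assigns each point to its nearest centroid, and the reduce step averages the points that landed on each centroid.

```python
from collections import defaultdict
from math import dist


def map_point(point, centroids):
    """Map step: emit (index of nearest centroid, point)."""
    nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
    return nearest, point


def reduce_cluster(points):
    """Reduce step: average the points assigned to one centroid."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans_iteration(points, centroids):
    """Run one map/shuffle/reduce pass and return the updated centroids."""
    groups = defaultdict(list)  # hypothetical stand-in for the shuffle-by-key phase
    for point in points:
        key, value = map_point(point, centroids)
        groups[key].append(value)
    # A centroid that attracted no points keeps its old position.
    return [
        reduce_cluster(groups[i]) if i in groups else centroids[i]
        for i in range(len(centroids))
    ]


# Illustrative data only: two obvious clusters and two rough starting centroids.
points = [(1.0, 1.0), (1.0, 3.0), (9.0, 9.0), (11.0, 9.0)]
centroids = [(0.0, 0.0), (10.0, 10.0)]
new_centroids = kmeans_iteration(points, centroids)
print(new_centroids)  # [(1.0, 2.0), (10.0, 9.0)]
```

In a real job you would repeat this pass, feeding the new centroids back in, until they stop moving. Because each point is mapped independently, the map work can be spread across as many machines as you have.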
To learn more about k-means, see the Mahout [...]
March 1st, 2008 | Posted in Apache, Hadoop, Java, Mahout, Map Reduce, clustering, kMeans clustering, machine learning | No Comments
FeatherCast » Blog Archive » Episode 43: Lucene
I did a FeatherCast today with Rich Bowen. Dang, he is quick at editing…
February 21st, 2008 | Posted in Apache, ApacheCon, Hadoop, Java, Lucene, Mahout, Nutch, Performance, Search, Tika, feathercast, machine learning | No Comments
Yahoo Search Wants to Be More Like Google, Embraces Hadoop
Hadoop is an open-source implementation of Google’s MapReduce software and file system. It takes all the links on the Web found by a search engine’s crawlers and “reduces” them to a map of the Web so that ranking algorithms can be run against them.
Ahem, Hadoop [...]
February 20th, 2008 | Posted in Apache, Hadoop, Java, Mahout, Map Reduce, Performance, machine learning | No Comments
I have committed Mahout’s first Hadoop-based machine learning code: https://issues.apache.org/jira/browse/MAHOUT-3
The code is an initial implementation of Canopy clustering. It is a start, and it is great to see others jump right in and start adding code! Great work by Jeff Eastman, who contributed the initial implementation!
Now, we can start building more goodness in order to [...]
February 19th, 2008 | Posted in Apache, Hadoop, Java, Mahout, Map Reduce, canopy clustering, clustering, machine learning | 2 Comments