Archive for the 'Hadoop' Category

Apache Mahout talk at Triangle Java User’s Group

For those who live in the Triangle, I’ll be giving an intro talk on Mahout next Monday.  See Welcome to the Triangle Java Users Group for more details.  Do note that the location is no longer in RTP, but at the Red Hat campus at NCSU.
Hope to see you there!

SF Bay Area Lucene/Solr Meetup

Just wanted to follow up on last night’s Lucene/Solr Meetup in San Francisco.
First off, special thanks to all the speakers (Jason Rutherglen, Michael Busch, Erik Hatcher and everyone who gave a lightning talk).  We had a lot of excellent talks, ranging from low-level Lucene details on payloads and real-time search to high-level discussions on [...]

Hadoop, Analytical Software, Finds Uses Beyond Search – NYTimes.com

Hadoop, Analytical Software, Finds Uses Beyond Search – NYTimes.com.
Nice writeup on Hadoop in the NYT today.  Of course, Hadoop is often used to power machine learning, too, which is the premise behind using it in Apache Mahout.

Surprise and Coincidence – musings from the long tail: Real-time decision making using map-reduce

Ted Dunning has a nice blurb on “scale free” development and Mahout/Hadoop/Map Reduce that is worth the quick read:
Surprise and Coincidence – musings from the long tail: Real-time decision making using map-reduce

ApacheCon Goodness this Week

Lots of goodness this week at ApacheCon, at least when it comes to Lucene, Solr, Mahout, Tika and Hadoop (i.e. the Lucene ecosystem).  There are two full days on Hadoop, with lots of coverage of all the pieces that go into Hadoop.  There’s also a full day of Lucene-related talks, plus Erik and I are [...]

ZooKeeper/Tao – Hadoop Wiki

ZooKeeper/Tao – Hadoop Wiki
I like ZooKeeper already, and I just started looking at it…  Hopefully the code lives up to the Tao.

BarCamp wiki / BarCampRDU

BarCamp wiki / BarCampRDU
I’ll be at BarCampRDU tomorrow.  I proposed two sessions, one on Hadoop and Mahout and one on Lucene and Solr.  I don’t think I really want to do both, but I would like to do at least one, so we’ll see what other people are interested in.
If you’re around and you want [...]

HP, Intel and Yahoo To Research Cloud Computing – Yahoo News

HP, Intel and Yahoo To Research Cloud Computing – Yahoo News
Boy, this could really come in handy in Open Source, especially projects like Mahout, Nutch and distributed Solr.  I find my biggest personal challenge on Mahout is access to computing resources.  I personally don’t have the financial backing to buy much time on Amazon EC2.  [...]

Apache Hadoop Wins Terabyte Sort Benchmark (Hadoop and Distributed Computing at Yahoo!)

Apache Hadoop Wins Terabyte Sort Benchmark (Hadoop and Distributed Computing at Yahoo!)
Congrats to the Hadoop team!  Score one for Open Source!

Mahout News

Wow!  Mahout has just got me pumped up.  I feel like we’ve got a lot of positive momentum and that we are starting to get the various pieces of our suite of machine learning libraries in place.  Various news items include:

Ted Dunning is now a committer!  Welcome Ted!
I put up a patch for a map-reduce [...]