Archive for the 'Java' Category

MySQL, Solr and “Communications link failure”

So, I was indexing 10+ million records from MySQL into Solr and kept coming across the following odd MySQL exception: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure Last packet sent to the server was 4467745 ms ago … com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:1074) at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2985) at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2871) at In my code, I loop over a JDBC ResultSet and add the records [...]

Apache Hadoop Wins Terabyte Sort Benchmark (Hadoop and Distributed Computing at Yahoo!)

Congrats to the Hadoop team! Score one for Open Source!

Open Source Search Engine Relevance

For a while now, I have been trying to get my hands on TREC data for the Lucene project.  For those who aren’t familiar, TREC is an annual competition for search engines that provides a common set of documents to index, queries to execute and judgments to check your answers to see how good an [...]

Mahout News

Wow!  Mahout has just got me pumped up.  I feel like we’ve got a lot of positive momentum and that we are starting to get the various pieces of our suite of machine learning libraries in place.  Various news items include: Ted Dunning is now a committer!  Welcome Ted! I put up a patch for [...]

BarCampRDU

Threw my name in the ring for BarCamp RDU today. Haven’t been to BarCamp before, but Erik Hatcher suggested I go and check it out. Also put in a Proposed Session of “Apache Mahout and Hadoop – Having fun with Map Reduce and distributed computing”. Figure we’ll talk about the basics of [...]

Mahout Machine Learning Fun

It’s been an interesting few months over in Mahout land. First off, I am psyched about the response the project has been getting. Seems like there is a pent-up demand for large-scale machine learning these days. I figured we would do all right in the early months, but I didn’t think we would [...]

Jeff Eastman’s Marvelous Cloud Computing Adventure

Mahout’s newest committer, Jeff Eastman, has a new blog on Mahout and Hadoop…

SummerOfCode2008 – Looking for a summer project in Machine Learning?

SummerOfCode2008 – General Wiki Check out the Apache Summer of Code page (link above) to see how you can spend the summer developing large-scale machine learning algorithms and help out the Mahout project. We’d love to have a few students put together some projects implementing one or more machine learning algorithms using Hadoop. [...]

Mahout: k-means Clustering

I committed a first crack at k-means clustering to Mahout last night, thanks again to Jeff Eastman’s excellent work. Mahout now has two clustering algorithms designed to run on Hadoop’s MapReduce framework, so it should be able to scale up to very large data sets. To learn more about k-means, see the [...]
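For readers new to the idea, here is a minimal sketch of how a single k-means iteration splits into a map-like step (assign each point to its nearest centroid) and a reduce-like step (average the points in each group). It runs in a single JVM with plain Java collections and is not Mahout's implementation or the Hadoop API; the class name KMeansSketch and its methods are hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a single-machine imitation of the map/reduce split,
// not code from Mahout or Hadoop.
public class KMeansSketch {

    // "Map" step: return the index of the centroid closest to the point
    // (squared Euclidean distance is enough for comparison).
    static int nearest(double[] point, double[][] centroids) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int i = 0; i < centroids.length; i++) {
            double dist = 0.0;
            for (int j = 0; j < point.length; j++) {
                double diff = point[j] - centroids[i][j];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = i;
            }
        }
        return best;
    }

    // "Reduce" step: the new centroid is the mean of the points in a group.
    static double[] mean(List<double[]> members) {
        double[] sum = new double[members.get(0).length];
        for (double[] p : members) {
            for (int j = 0; j < sum.length; j++) {
                sum[j] += p[j];
            }
        }
        for (int j = 0; j < sum.length; j++) {
            sum[j] /= members.size();
        }
        return sum;
    }

    // One full iteration: group points by nearest centroid, then average.
    static double[][] iterate(double[][] points, double[][] centroids) {
        Map<Integer, List<double[]>> groups = new HashMap<>();
        for (double[] p : points) {
            groups.computeIfAbsent(nearest(p, centroids), k -> new ArrayList<>()).add(p);
        }
        double[][] next = new double[centroids.length][];
        for (int i = 0; i < centroids.length; i++) {
            List<double[]> members = groups.get(i);
            // A centroid that attracted no points is left where it was.
            next[i] = (members == null) ? centroids[i].clone() : mean(members);
        }
        return next;
    }

    public static void main(String[] args) {
        double[][] points = {{0, 0}, {0, 2}, {10, 10}, {10, 12}};
        double[][] centroids = {{1, 1}, {9, 9}};
        System.out.println(Arrays.deepToString(iterate(points, centroids)));
    }
}
```

In a distributed setting the grouping by centroid index would be handled by the framework's shuffle between the two steps, and the iteration would be repeated until the centroids stop moving.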

FeatherCast » Blog Archive » Episode 43: Lucene

I did a FeatherCast today with Rich Bowen. Dang, he is quick at editing…