Archive for the 'machine learning' Category
“k-means and other EM-like algorithms are trivial to parallelize because all the heavy computations in the inner loops are independent.” Via Speeding up K-means Clustering with Algebra and Sparse Vectors « LingPipe Blog. This is exactly what Apache Mahout does. We have parallelized versions of a bunch of clustering algorithms, including k-means.
March 18th, 2009 | Posted in clustering, kMeans clustering, machine learning, Mahout | 2 Comments
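To make the independence claim in the k-means post above concrete, here is a minimal sketch of one k-means iteration split into a per-chunk "map" and a merging "reduce". This is not Mahout's implementation; the function names (`nearest`, `partial_sums`, `kmeans_step`) and the chunked input layout are assumptions made for illustration. A thread pool stands in for the workers only to show the structure; in CPython it would not speed up this CPU-bound loop.

```python
from concurrent.futures import ThreadPoolExecutor


def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    def dist2(c):
        return sum((p - q) ** 2 for p, q in zip(point, c))
    return min(range(len(centroids)), key=lambda i: dist2(centroids[i]))


def partial_sums(chunk, centroids):
    """'Map' step: per-cluster coordinate sums and counts for one chunk.

    Each chunk needs only its own points and the current centroids,
    so chunks can be processed independently of one another.
    """
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for point in chunk:
        k = nearest(point, centroids)
        counts[k] += 1
        for d, value in enumerate(point):
            sums[k][d] += value
    return sums, counts


def kmeans_step(chunks, centroids):
    """One iteration: map over chunks in parallel, then merge ('reduce')."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda ch: partial_sums(ch, centroids), chunks))
    new_centroids = []
    for k, old in enumerate(centroids):
        total = sum(counts[k] for _, counts in partials)
        if total == 0:
            new_centroids.append(list(old))  # keep an empty cluster's centroid
            continue
        new_centroids.append(
            [sum(sums[k][d] for sums, _ in partials) / total
             for d in range(len(old))]
        )
    return new_centroids
```

The only data that crosses chunk boundaries is the small table of per-cluster sums and counts, which is why the pattern maps naturally onto a framework like Hadoop.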
Hadoop, Analytical Software, Finds Uses Beyond Search – NYTimes.com. Nice writeup on Hadoop in the NYT today. Of course, Hadoop is often used to power machine learning, too, which is the premise behind using it in Apache Mahout.
March 17th, 2009 | Posted in Hadoop, machine learning, Mahout | No Comments
Lucid Imagination » Add our Lucene Ecosystem Search Engine to Firefox Mark Miller shows how to add Lucid’s Lucene ecosystem search as a Firefox plugin. Now you can search all the Lucene project (and subproject) archives, websites, and wikis from the comfort of your browser plugin.
March 3rd, 2009 | Posted in Lucene Boot Camp, machine learning, Mahout, North Carolina, Tika, wpSearch | No Comments
SummerOfCode2009 – General Wiki It’s that time of year again. Time for students to sign up for Google Summer of Code. Gist of it: Get paid to work in Open Source for the summer. I’ve signed up to mentor for Apache Mahout. We are looking for students interested in implementing cutting-edge machine learning algorithms, optionally [...]
February 18th, 2009 | Posted in Apache, Google Summer of Code, Lucene, machine learning, Mahout, Solr, Tika | No Comments
Congratulations to Apache Tika (never mind the incubator address, it’s still in the process of migrating) for graduating from Incubation! And welcome to the Lucene project! Tika is a content extraction framework that wraps many other content extraction libraries such as PDFBox, POI, and others into a single, easy-to-use framework that makes it easy [...]
November 13th, 2008 | Posted in Apache, clustering, Java, Lucene, machine learning, Mahout, Manning, OpenNLP, Search, Solr, Taming Text, Tika | 3 Comments
My intro to Mahout slides are available here.
November 8th, 2008 | Posted in Apache, ApacheCon, Lucene, machine learning, Mahout | 1 Comment
Charlotte JUG » October Slides Available – Search & Analysis Had a lot of fun at my recent talk at the Charlotte JUG. They’ve got a good core of people and there was a lot of good discussion about the topic. Even managed to give away some free eBooks of “Taming Text”. Wish I would [...]
October 24th, 2008 | Posted in Charlotte, Java, Lucene, machine learning, Mahout, Manning, Taming Text | No Comments
I’ve had a chance recently to work on some things in Solr that I think can, in the right circumstances, really enhance Solr. First off is SOLR-651, which implements what I am calling a Term Vector Component. The basic gist of it is that Solr can now serve up term vectors from Lucene. For [...]
October 23rd, 2008 | Posted in Apache, clustering, Java, Lucene, machine learning, Mahout, Manning, Search, Solr, spell checking, Taming Text, term vectors, tokenization | 1 Comment
Kudos to Dr. Ted Pedersen for finally saying out loud (in the latest issue of Computational Linguistics, thanks to Bob Carpenter for the pointer) what I’ve long thought about academic publications on topics like information retrieval and machine learning: namely, publications of empirical results in software systems without publishing the software is a disservice to [...]
September 18th, 2008 | Posted in Apache, machine learning, Mahout, Taming Text | 5 Comments
BarCamp wiki / BarCampRDU I’ll be at BarCampRDU tomorrow. I proposed two sessions, one on Hadoop and Mahout and one on Lucene and Solr. I don’t think I really want to do both, but I would like to do at least one, so we’ll see what other people are interested in. If you’re around and [...]
August 1st, 2008 | Posted in Apache, BarCampRDU, Hadoop, Java, Lucene, machine learning, Mahout, Map Reduce, Nutch, Raleigh, Triangle | 5 Comments