Archive for the 'Taming Text' Category
It’s been a while since I reported anything on Mahout (here’s why), but I thought I would give an update. I know it’s been promised before, but the committers have been diligently working on a 0.1 release, which should be out very soon. I think I have all the Maven release stuff in place and am [...]
February 9th, 2009 | Posted in Apache, Java, Mahout, Solr, Taming Text, clustering | No Comments
Congratulations to Apache Tika (never mind the incubator address, it’s still in the process of migrating) for graduating from Incubation! And welcome to the Lucene project! Tika is a content extraction framework that wraps many other content extraction libraries, such as PDFBox and POI, into a single, easy-to-use framework that makes it easy [...]
November 13th, 2008 | Posted in Apache, Java, Lucene, Mahout, Manning, OpenNLP, Search, Solr, Taming Text, Tika, clustering, machine learning | 3 Comments
Charlotte JUG » October Slides Available – Search & Analysis
Had a lot of fun at my recent talk at the Charlotte JUG. They’ve got a good core of people and there was a lot of good discussion about the topic. Even managed to give away some free eBooks of “Taming Text”. Wish I would have [...]
October 24th, 2008 | Posted in Charlotte, Java, Lucene, Mahout, Manning, Taming Text, machine learning | No Comments
I’ve had a chance recently to work on some things in Solr that I think can, in the right circumstances, really enhance Solr.
First off is SOLR-651, which implements what I am calling a Term Vector Component. The basic gist of it is that Solr can now serve up term vectors from Lucene. For those [...]
October 23rd, 2008 | Posted in Apache, Java, Lucene, Mahout, Manning, Search, Solr, Taming Text, clustering, machine learning, spell checking, term vectors, tokenization | 1 Comment
Charlotte JUG » OCT 15TH – 6PM – Search and Text Analysis
I will be speaking at the Charlotte Java Users Group on Oct. 15th, covering things like Lucene, Solr, OpenNLP and Mahout, amongst other things. Basically, a high-level talk on my book.
October 1st, 2008 | Posted in Charlotte, Lucene, Mahout, Manning, North Carolina, Solr, Taming Text | No Comments
Kudos to Dr. Ted Pedersen for finally saying out loud (in the latest issue of Computational Linguistics, thanks to Bob Carpenter for the pointer) what I’ve long thought about academic publications on topics like information retrieval and machine learning: namely, publications of empirical results in software systems without publishing the software is a disservice to [...]
September 18th, 2008 | Posted in Apache, Mahout, Taming Text, machine learning | 5 Comments
Manning: Taming Text
Scary… I guess it is real!
April 28th, 2008 | Posted in Hadoop, Lucene, Mahout, Manning, Solr, Taming Text, machine learning | 3 Comments