Archive for November, 2008
Congratulations to Apache Tika (never mind the incubator address, it’s still in the process of migrating) for graduating from Incubation! And welcome to the Lucene project! Tika is a content extraction framework that wraps many other content extraction libraries such as PDFBox, POI, and others into a single, easy-to-use framework that makes it easy [...]
November 13th, 2008 | Posted in Apache, Java, Lucene, Mahout, Manning, OpenNLP, Search, Solr, Taming Text, Tika, clustering, machine learning | 1 Comment
My intro to Mahout slides are available here.
November 8th, 2008 | Posted in Apache, ApacheCon, Lucene, Mahout, machine learning | 1 Comment
I often explain search and relevance at talks and training classes for Lucene and Solr. In doing so, I usually discuss the concept of search term weighting and its typical instantiation via term frequency and inverse document frequency (abbreviated as TF-IDF), either in light of the vector space model or in terms of determining relevance.
The [...]
November 8th, 2008 | Posted in Lucene, Search, Solr, relevance | 2 Comments
What’s new with Apache Solr.
My latest article on Apache Solr, titled “What’s New with Apache Solr”, is now available over at IBM developerWorks. It covers some of the new features like spell checking, Data Import Handler, distributed search, editorial results placement (a.k.a. “paid placement”), SolrJ and a variety of other pieces.
Hope it is helpful… Feel [...]
November 5th, 2008 | Posted in Indexing, Java, Lucene, Search, Solr, spell checking | 1 Comment
Lots of goodness this week at ApacheCon, at least when it comes to Lucene, Solr, Mahout, Tika and Hadoop (i.e., the Lucene ecosystem). There are two full days on Hadoop, with lots of coverage of all the pieces that go into Hadoop. There’s also a full day of Lucene-related talks, plus Erik and I are [...]
November 1st, 2008 | Posted in ApacheCon, Hadoop, Lucene, Mahout, Solr | No Comments