Archive for November, 2008

Congrats to Tika and Welcome to the Lucene Stack!

Congratulations to Apache Tika (never mind the incubator address; it’s still in the process of migrating) for graduating from Incubation! And welcome to the Lucene project! Tika is a content extraction framework that wraps many other content extraction libraries, such as PDFBox and POI, into a single, easy-to-use framework that makes it easy [...]

Intro to Mahout slides available

My intro to Mahout slides are available here.

Tao and the Art of Search: Yin Yang and TF-IDF

I often explain search and relevance at talks and training classes for Lucene and Solr. In doing so, I discuss search term weighting and its typical instantiation via term frequency and inverse document frequency (abbreviated as TF-IDF), either in light of the vector space model or in terms of determining relevance.
The [...]

“What’s new with Apache Solr” now available at IBM developerWorks

My latest article on Apache Solr, titled “What’s New with Apache Solr,” is now available over at IBM developerWorks. It covers some of the new features, such as spell checking, the Data Import Handler, distributed search, editorial results placement (a.k.a. “paid placement”), SolrJ, and a variety of other pieces.
Hope it is helpful…  Feel [...]

ApacheCon Goodness this Week

Lots of goodness this week at ApacheCon, at least when it comes to Lucene, Solr, Mahout, Tika and Hadoop (i.e. the Lucene ecosystem). There are two full days on Hadoop, with lots of coverage of all the pieces that go into Hadoop. There’s also a full day of Lucene-related talks, plus Erik and I are [...]