Archive for the 'Search' Category

Lucid Imagination » Getting Started with Payloads

I just posted a brief intro on getting started with Apache Lucene payloads on Lucid’s blog for those who are interested.  Here’s the teaser: Like Spans, payloads involve the position of terms, but go one step further. Namely, a Payload in Apache Lucene is an arbitrary byte array stored at a specific position (i.e. a [...]
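
To make the idea of "a byte array stored at a specific position" concrete, here is a minimal toy sketch in Python. It is not Lucene's API: the class `ToyPositionalIndex` and its methods are hypothetical names invented for illustration, and the meaning of each payload byte is an assumption left to the caller.

```python
from collections import defaultdict


class ToyPositionalIndex:
    """Hypothetical toy index, NOT Lucene: maps term -> doc -> [(position, payload)]."""

    def __init__(self):
        # term -> doc_id -> list of (position, payload bytes)
        self.postings = defaultdict(lambda: defaultdict(list))

    def add(self, doc_id, tokens):
        """Index a document given as (term, payload) pairs; payload is bytes and may be empty."""
        for position, (term, payload) in enumerate(tokens):
            self.postings[term][doc_id].append((position, payload))

    def occurrences(self, term, doc_id):
        """Return every (position, payload) pair for a term in one document."""
        return list(self.postings.get(term, {}).get(doc_id, []))


index = ToyPositionalIndex()
index.add(1, [("apache", b""), ("lucene", b"\x02"), ("payloads", b"\x05"), ("lucene", b"\x01")])
lucene_hits = index.occurrences("lucene", 1)
```

Here each occurrence of "lucene" keeps its own payload, so the two positions in document 1 can carry different bytes.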

Congrats to Tika and Welcome to the Lucene Stack!

Congratulations to Apache Tika (never mind the incubator address; it’s still in the process of migrating) on graduating from Incubation! And welcome to the Lucene project! Tika is a content extraction framework that wraps many other content extraction libraries, such as PDFBox and POI, into a single, easy-to-use framework that makes it easy [...]

Tao and the Art of Search: Yin Yang and TF-IDF

I often explain search and relevance at talks and training classes for Lucene and Solr. In doing so, I usually discuss the concepts of search term weighting and their typical instantiations via term frequency and inverse document frequency (abbreviated as TF-IDF), either in light of the vector space model or in terms of determining relevance. [...]
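
As a rough illustration of term weighting, the sketch below computes TF-IDF weights in Python. It assumes one common textbook variant (raw term count times log(N / df)) and whitespace tokenization; it is not Lucene's or Solr's exact scoring formula, and the function name `tf_idf` is made up for this example.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumption: TF is the raw count and IDF is log(N / df), a textbook
    variant rather than the formula any particular search engine uses.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: the number of documents containing each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return weights


docs = ["lucene search library", "solr search server", "lucene payloads"]
weights = tf_idf(docs)
```

In this tiny collection, "search" appears in two of the three documents and so receives a lower weight than "library", which appears in only one.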

“What’s new with Apache Solr” now available at IBM developerWorks

My latest article on Apache Solr, titled “What’s New with Apache Solr,” is now available over at IBM developerWorks. It covers some of the new features, such as spell checking, the Data Import Handler, distributed search, editorial results placement (a.k.a. “paid placement”), SolrJ, and a variety of other pieces. Hope it is [...]

Lucene Boot Camp at ApacheCon US 2008

A quick reminder that there is just over one week left before Lucene Boot Camp at this year’s ApacheCon. This year, it is a two-day training, but those who want to can sign up for the first day of Lucene Boot Camp and then attend Solr Boot Camp on the second [...]

Some New Features in Solr

I’ve recently had a chance to work on some things in Solr that I think can, in the right circumstances, really enhance Solr. First off is SOLR-651, which implements what I am calling a Term Vector Component. The basic gist of it is that Solr can now serve up term vectors from Lucene. For [...]

Lucene Boot Camp at ApacheCon US

Lucene Boot Camp (ApacheCon site): Lucene Boot Camp (http://www.lucenebootcamp.com) is scheduled this year for ApacheCon US on November 3rd and 4th in New Orleans. This year, I am doing a two-day event, as I felt the one-day event was just not enough time to get in all the goodness that is Lucene (not [...]

wpSearch – Lucene search for WordPress

Code Fury: The author of this nice plugin for WordPress contacted me today about his Lucene-based WordPress plugin, so I thought I would give it a try, as I’m obviously a big fan of Lucene and also never much cared for MySQL’s search (in)capabilities. The plugin is easy enough to install; the only thing that [...]

Realtime Search for Lucene

[#LUCENE-1313] Ocean Realtime Search – ASF JIRA: Jason Rutherglen has been up to some interesting things with Lucene lately concerning real-time search. Real-time search is one of those capabilities that some people have long needed in Lucene, but it has never reached the critical mass at which someone tackles it. Looks like [...]

Open Source Search Relevance Follow Up

Jeff’s Search Engine Caffè: Copyright and distribution issues. Let’s say for a minute that a web search track is interesting. A major barrier to improvements in academic and open source web search is the lack of large-scale (hundreds of millions or even billions of pages) test collections that evolve over time. GOV2 is a static [...]