Archive for the 'relevance' Category

SF Bay Area Lucene/Solr Meetup

Just wanted to follow up on last night’s Lucene/Solr Meetup in San Francisco.
First off, special thanks to all the speakers (Jason Rutherglen, Michael Busch, Erik Hatcher, and everyone who gave a lightning talk). We had a lot of excellent talks, ranging from low-level Lucene details on payloads and real-time search to high-level discussions on [...]

Copying TREC is the Wrong Track for the Enterprise | The Noisy Channel

Daniel Tunkelang has written up an interesting post on the new Open Relevance Project that a few other Lucene people and I are starting up, and I thought I would respond here:
Little late to the conversation, but I think maybe we should back [...]

Tao and the Art of Search: Yin Yang and TF-IDF

I often explain search and relevance in talks and training classes for Lucene and Solr.  In doing so, I discuss search term weighting and its typical instantiation as term frequency and inverse document frequency (abbreviated as TF-IDF), either in light of the vector space model or in terms of determining relevance.
The [...]
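
For readers who have not seen TF-IDF written out, here is a minimal sketch of the textbook weighting in Python. It is an illustration only and not Lucene's or Solr's actual scoring formula: the `tf_idf` function, the whitespace tokenizer, and the three sample documents are all assumptions made up for this example.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document (textbook TF-IDF, hypothetical helper)."""
    # Naive tokenization: lowercase and split on whitespace (an assumption for brevity).
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw term frequency within this document
        weights.append(
            {term: count * math.log(n_docs / df[term]) for term, count in tf.items()}
        )
    return weights


# Made-up sample documents, not taken from any real collection.
docs = ["lucene search search", "solr search", "lucene index"]
weights = tf_idf(docs)
```

In this toy collection, "solr" appears in only one document while "search" appears in two, so within the second document "solr" ends up with the higher weight. That is the balance the term weighting is meant to capture: frequent within a document pushes a term up, frequent across the collection pulls it down.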

Open Source Search Relevance Follow Up

Jeff’s Search Engine Caffè
Copyright and distribution issues
Let’s say for a minute that a web search track is interesting. A major barrier to improvements in academic and open source web search is the lack of large-scale (hundreds of millions or even billions of pages) test collections that evolve over time. GOV2 is a static crawl of [...]

Open Source Search Engine Relevance

For a while now, I have been trying to get my hands on TREC data for the Lucene project.  For those who aren’t familiar, TREC is an annual competition for search engines that provides a common set of documents to index, queries to execute and judgments to check your answers to see how good an [...]