Archive for the 'relevance' Category
Just wanted to follow up on last night’s Lucene/Solr Meetup in San Francisco.
First off, special thanks to all the speakers (Jason Rutherglen, Michael Busch, Erik Hatcher and all the lightning talk presenters). We had a lot of excellent talks, ranging from low-level Lucene details on payloads and real-time search to high-level discussions on [...]
June 4th, 2009 | Posted in Droids, Hadoop, Java, Latent Dirichlet Allocation, Lucene, Lucid Imagination, Mahout, Open Relevance, Real Time Search, Solr, Tika, canopy clustering, machine learning, relevance | No Comments
Copying TREC is the Wrong Track for the Enterprise | The Noisy Channel.
Daniel Tunkelang has written up an interesting post on the new Open Relevance Project that a few other Lucene people and I are starting up, and I thought I would respond here:
Little late to the conversation, but I think maybe we should back [...]
May 18th, 2009 | Posted in Apache, Lucene, Mahout, Open Relevance, Performance, Solr, machine learning, relevance | 2 Comments
I often explain search and relevance at talks and training classes for Lucene and Solr. In doing so, I often discuss the concepts of search term weighting and their typical instantiations via term frequency and inverse document frequency (abbreviated as TF-IDF), either in light of the vector space model or in terms of determining relevance.
The [...]
November 8th, 2008 | Posted in Lucene, Search, Solr, relevance | 3 Comments
Jeff’s Search Engine Caffè
Copyright and distribution issues
Let’s say for a minute that a web search track is interesting. A major barrier to improvements in academic and open source web search is the lack of large-scale (hundreds of millions or even billions of pages) test collections that evolve over time. GOV2 is a static crawl of [...]
May 22nd, 2008 | Posted in Lucene, Performance, Search, TREC, queries, relevance | No Comments
For a while now, I have been trying to get my hands on TREC data for the Lucene project. For those who aren’t familiar, TREC is an annual competition for search engines that provides a common set of documents to index, queries to execute and judgments to check your answers to see how good an [...]
May 18th, 2008 | Posted in Apache, Java, Lucene, Nutch, Performance, Search, Solr, TREC, relevance | 9 Comments