Archive for May, 2008
Jeff’s Search Engine Caffè
Copyright and distribution issues
Let’s say for a minute that a web search track is interesting. A major barrier to improvements in academic and open source web search is the lack of large-scale (hundreds of millions or even billions of pages) test collections that evolve over time. GOV2 is a static crawl of [...]
May 22nd, 2008 | Posted in Lucene, Performance, Search, TREC, queries, relevance | No Comments
For a while now, I have been trying to get my hands on TREC data for the Lucene project. For those who aren’t familiar, TREC is an annual competition for search engines that provides a common set of documents to index, queries to execute and judgments to check your answers to see how good an [...]
May 18th, 2008 | Posted in Apache, Java, Lucene, Nutch, Performance, Search, Solr, TREC, relevance | 8 Comments
I haven’t tried it yet (pesky day job :-) ) but I see that Taste is now committed to Mahout. In fact, I think Sean has already started on some parallelization efforts! Very cool.
May 15th, 2008 | Posted in Mahout, Map Reduce | No Comments
Wow! Mahout has just got me pumped up. I feel like we’ve got a lot of positive momentum and that we are starting to get the various pieces of our suite of machine learning libraries in place. Various news items include:
Ted Dunning is now a committer! Welcome Ted!
I put up a patch for a map-reduce [...]
May 6th, 2008 | Posted in Hadoop, Java, Mahout, Map Reduce | No Comments
Lucid Imagination
Well, the cat is out of the bag. In case you haven’t heard, a few Lucene/Solr/Mahout committers (Erik Hatcher and Yonik Seeley) and I have teamed up with some other longtime search veterans (Marc Krellenstein, formerly of Northern Light and former CTO of Reed Elsevier, amongst others) to build a company around providing product, [...]
May 2nd, 2008 | Posted in Lucene, Lucid Imagination, Mahout, Solr | No Comments