Archive for May, 2008

Open Source Search Relevance Follow Up

Jeff’s Search Engine Caffè
Copyright and distribution issues
Let’s say for a minute that a web search track is interesting. A major barrier to improvements in academic and open source web search is the lack of large-scale (hundreds of millions or even billions of pages) test collections that evolve over time. GOV2 is a static crawl of [...]

Open Source Search Engine Relevance

For a while now, I have been trying to get my hands on TREC data for the Lucene project. For those who aren’t familiar, TREC is an annual competition for search engines that provides a common set of documents to index, queries to execute, and judgments to check your answers to see how good an [...]

Taste is now committed

I haven’t tried it yet (pesky day job :-) ) but I see that Taste is now committed to Mahout. In fact, I think Sean has already started on some parallelization efforts! Very cool.

Mahout News

Wow!  Mahout has just got me pumped up.  I feel like we’ve got a lot of positive momentum and that we are starting to get the various pieces of our suite of machine learning libraries in place.  Various news items include:

Ted Dunning is now a committer!  Welcome Ted!
I put up a patch for a map-reduce [...]

What I’ve been up to lately: Lucid Imagination

Well, the cat is out of the bag. In case you haven’t heard, a few Lucene/Solr/Mahout committers (Erik Hatcher and Yonik Seeley) and I have teamed up with some other long-time search veterans (Marc Krellenstein from Northern Light and former CTO of Reed Elsevier, amongst others) to build a company around providing product, [...]