Open Source Search Engine Relevance

For a while now, I have been trying to get my hands on TREC data for the Lucene project.  For those who aren’t familiar, TREC is an annual competition for search engines that provides a common set of documents to index, queries to execute and judgments to check your answers to see how good an [...]

Apache Mahout - Overview

It’s official!  Mahout is now an official subproject of Lucene at the Apache Software Foundation.  Mahout’s goal is to create a suite of practical, scalable machine learning libraries.

Coderspiel / The right tool for the slob

This guy’s comment system wasn’t working at the moment, so I will leave my comment here. This won’t make much sense without reading the post first:
It’s funny you mention Wikipedia as an example, since they are running Lucene. So are Technorati and the Internet Archive. [...]

Lucene goodness

Lots of good things have been happening in Lucene land lately, all of which should benefit users with faster indexing and searching.  Most notably, Lucene 2.3 (hopefully released this quarter) has some major changes in indexing memory management and performance.  I have personally clocked indexing using release 2.2 at about 400 rec/s (single threaded, Mac Pro [...]