Archive for the 'Indexing' Category

Why Lucene Isn’t That Good | Javalobby

Patches welcome…  I know that is an old saw, but that is the only way it’s going to get better.
There are some good points in here, and some stuff that is a bit dramatic.
We do try to keep adapting Lucene and make it better, but in some respects we [...]

How Rackspace Now Uses MapReduce and Hadoop to Query Terabytes of Data | High Scalability

Nice article on how the Lucene/Hadoop/Solr stack was used to solve a really big problem.  Someday, I hope (when we have actual code),  they can add Mahout to the equation and do even more interesting things with the data.

Coderspiel / January 2008

I hardly think Lucene is creating an isolationist culture, nor do we think our project is perfect.  What we do agree on is that our time is better spent on figuring out how to make Lucene better, not how to spend our time doing UNIX administration in a virtual server environment.  As [...]

Coderspiel / The right tool for the slob

This guy’s comment system wasn’t working at the moment, so I will leave my comment here. This won’t make much sense without reading the post first:
It’s funny you mention Wikipedia as an example, since they are running Lucene. So are Technorati and the Internet Archive. [...]

Open Source Search Engine Comparison

Interesting comparison of open source search engines available at http://wrg.upf.edu/WRG/dctos/Middleton-Baeza.pdf. While it reflects OK on Lucene (hey, we can’t be perfect at everything), I am interested in finding out more details about what settings were used for indexing. If they just used the out-of-the-box settings, then I would argue that they need [...]

New Lucene Boot Camp site

I have set up a new site to support my Lucene Boot Camp training. Check it out at http://lucenebootcamp.com. From there, you can download training setup information, read the class outline, etc.

Lucene goodness

Lots of good things happening in Lucene land lately, all of which should benefit users with faster indexing and searching capabilities.  Most notably, Lucene 2.3 (hopefully released this quarter) has some major changes in indexing memory management and performance.  I have personally clocked indexing using release 2.2 at about 400 rec/s (single threaded, Mac Pro [...]

Reminder: Lucene Boot Camp at ApacheCon US

Just a friendly reminder: I am giving my Lucene Boot Camp training at ApacheCon Atlanta this year (in November). There is still plenty of time to sign up. Details on the class are here. Also, feel free to email me with any questions or things you would like to see. My apache.org email is gsingers. I will be [...]

Triangle Java Users Group talk on Lucene and Solr

Welcome to the Triangle Java Users Group
I will be speaking on November 19, 2007 at the Triangle Java Users Group on Lucene and Solr. The talk will be an introduction to the features and capabilities of both Lucene and Solr, along with a basic comparison of the two.

Lucene and Solr at ApacheCon

Looks like they have put up the ApacheCon Atlanta schedule. As usual, there look to be several very good talks covering Lucene and Solr, including talks by Chris Hostetter, Ken Krugler, Michael Busch and yours truly. My talk is at 3pm on November 16; details are here.
I will also be leading my “Lucene [...]