Archive for the 'Performance' Category

Yahoo Search Wants to Be More Like Google, Embraces Hadoop

Hadoop is an open-source implementation of Google’s MapReduce software and file system. It takes all the links on the Web found by a search engine’s crawlers and “reduces” them to a map of the Web so that ranking algorithms can be run against them. Ahem, [...]
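The quoted description is loose about what MapReduce actually does. As a rough illustration of the map/shuffle/reduce pattern itself, here is a toy, single-process Python sketch that counts inbound links per page from a list of crawled (source, target) link pairs. It is not Hadoop's Java API; the function names and sample links are hypothetical and exist only for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical crawl output: (source_page, target_page) link pairs.
LINKS: List[Tuple[str, str]] = [
    ("a.html", "b.html"),
    ("a.html", "c.html"),
    ("b.html", "c.html"),
    ("c.html", "a.html"),
]


def map_phase(links: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, int]]:
    # Emit one (target, 1) pair per link: a single "vote" for the target page.
    for _source, target in links:
        yield target, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Group intermediate values by key, as the framework does between phases.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    # Sum the votes for each target page to get its inbound link count.
    return {key: sum(values) for key, values in groups.items()}


def inbound_link_counts(links: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    return reduce_phase(shuffle(map_phase(links)))


if __name__ == "__main__":
    print(inbound_link_counts(LINKS))
```

In a real Hadoop job the framework distributes the map and reduce calls across machines and performs the shuffle itself; this sketch only shows the data flow on one machine.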

The Two Flavors of Google — Nice article on Hadoop

“The Two Flavors of Google” is a nice article on Hadoop at Business Week, with some good quotes from Lucene Java creator Doug Cutting. Hadoop holds a lot of promise for the future of large-scale computing, although I don’t want to burden it with that kind of claim, either. In the end, it makes solving a [...]

ApacheCon EU 2008

The schedule is out for ApacheCon Europe 2008. I will be doing my Lucene Boot Camp training and a Lucene Performance talk. Erik Hatcher will also be doing a Solr Boot Camp and a Lucene/Solr talk. There will be some Hadoop talks as well.

Open Source Search Engine Comparison

There is an interesting comparison of open source search engines available at http://wrg.upf.edu/WRG/dctos/Middleton-Baeza.pdf. While it reflects OK on Lucene (hey, we can’t be perfect at everything), I am interested in finding out more details about which settings were used for indexing. If they just used the out-of-the-box settings, then I would argue that they need [...]

Lucene goodness

There are lots of good things happening in Lucene land lately, all of which should benefit users with faster indexing and searching. Most notably, Lucene 2.3 (hopefully released this quarter) has some major changes in indexing memory management and performance. I have personally clocked indexing using release 2.2 at about 400 records/s (single threaded, Mac Pro [...]

Triangle Java Users Group talk on Lucene and Solr

On November 19, 2007, I will be speaking at the Triangle Java Users Group on Lucene and Solr. The talk will be an introduction to the features and capabilities of both Lucene and Solr, as well as a basic comparison of the two.

Lucene and Solr at ApacheCon

It looks like they have put up the ApacheCon Atlanta schedule. As usual, there look to be several very good talks covering Lucene and Solr, including talks by Chris Hostetter, Ken Krugler, Michael Busch and yours truly. My talk is at 3pm on November 16; details are here. I will also be leading my “Lucene Boot [...]

ImproveIndexingSpeed – Lucene-java Wiki

People might find the indexing speed tips on the ImproveIndexingSpeed page of the Lucene-java Wiki useful.

Lucene Indexing Performance: Managing RAM while Indexing follow up

There is a nice discussion of tuning the new RAM-based indexing in Lucene available here. And I thought Lucene was already fast… Beware, though: this fix isn’t officially released yet, so you will need to use the trunk version.

Part 2 of IBM developerWorks article on Solr

Part 2 of my two-part series on Apache Solr is now up on IBM developerWorks. You can read it here. This article covers some of the things that make Solr great for the enterprise, like caching, replication and easy administration.