Archive for the 'Performance' Category
Yahoo Search Wants to Be More Like Google, Embraces Hadoop Hadoop is an open-source implementation of Google’s MapReduce software and file system. It takes all the links on the Web found by a search engine’s crawlers and “reduces” them to a map of the Web so that ranking algorithms can be run against them. Ahem, [...]
February 20th, 2008 | Posted in Apache, Hadoop, Java, machine learning, Mahout, Map Reduce, Performance | No Comments
The Two Flavors of Google Nice article on Hadoop at Business Week and some good quotes from Lucene Java creator Doug Cutting. Hadoop holds a lot of promise for the future of large-scale computing, although I don’t want to burden it with that kind of claim, either. In the end, it makes solving a [...]
December 26th, 2007 | Posted in Hadoop, Java, Lucene, machine learning, Mahout, Performance | No Comments
ApacheCon EU 2008 Schedule is out for ApacheCon Europe. I will be doing my Lucene Boot Camp training and a Lucene Performance talk. Erik Hatcher will also be doing a Solr Boot Camp and a Lucene/Solr talk. There will also be some Hadoop talks.
December 4th, 2007 | Posted in ApacheCon, Europe, Java, Lucene, Performance, Solr | No Comments
Interesting comparison of open source search engines available at http://wrg.upf.edu/WRG/dctos/Middleton-Baeza.pdf. While it reflects OK on Lucene (hey, we can’t be perfect at everything), I am interested in finding out more details about what settings were used for indexing. If they just used the out-of-the-box settings, then I would argue that they need [...]
December 4th, 2007 | Posted in Indexing, Java, Lucene, Performance, Search | 2 Comments
Lots of good things happening in Lucene land lately, all of which should benefit users with faster indexing and searching capabilities. Most notably, Lucene 2.3 (hopefully released this quarter) has some major changes in indexing memory management and performance. I have personally clocked indexing using release 2.2 at about 400 rec/s (single threaded, Mac Pro [...]
November 2nd, 2007 | Posted in Indexing, Java, Lucene, Performance, Search, term vectors | No Comments
Welcome to the Triangle Java Users Group I will be speaking November 19, 2007 at the Triangle Java Users Group on Lucene and Solr. The talk will be an introduction to the features and capabilities of both Lucene and Solr, as well as some basic compare and contrast information.
September 7th, 2007 | Posted in Cary, Chapel Hill, Durham, Indexing, Java, Lucene, North Carolina, payloads, Performance, Raleigh, Search, Solr, Triangle | No Comments
Looks like they have put up the ApacheCon Atlanta schedule. As usual, there look to be several very good talks covering Lucene and Solr, including talks by Chris Hostetter, Ken Krugler, Michael Busch and yours truly. My talk is at 3pm on November 16; details are here. I will also be leading my “Lucene Boot [...]
August 7th, 2007 | Posted in ApacheCon, Indexing, Java, Lucene, Performance, Search, Solr | No Comments
ImproveIndexingSpeed – Lucene-java Wiki People might find the indexing speed tips here useful.
July 19th, 2007 | Posted in Indexing, Lucene, Performance | No Comments
Nice discussion on tuning the new RAM based indexing in Lucene available here. And I thought Lucene was already fast… Beware, though, this fix isn’t officially released, so you will need to use the trunk version.
July 13th, 2007 | Posted in Indexing, Java, Lucene, Performance | No Comments
Part 2 of my two-part series on Apache Solr is now up on IBM developerWorks. You can read it here. This article covers some of the things that make Solr great for the enterprise, like caching, replication and easy administration.
June 6th, 2007 | Posted in Java, Lucene, Performance, Solr | 2 Comments