Archive for the 'Java' Category
How Rackspace Now Uses MapReduce and Hadoop to Query Terabytes of Data | High Scalability
Nice article on how the Lucene/Hadoop/Solr stack was used to solve a really big problem. Someday, I hope (when we have actual code), they can add Mahout to the equation and do even more interesting things with the data.
February 1st, 2008 | Posted in Apache, Hadoop, Indexing, Java, Lucene, Mahout, Search, Solr, database | No Comments
Apache Mahout - Overview
It’s official! Mahout is now an official subproject of Lucene at the Apache Software Foundation. Mahout’s goal is to create a suite of practical, scalable machine learning libraries.
January 25th, 2008 | Posted in Apache, Java, Mahout, machine learning | No Comments
Coderspiel / January 2008
I hardly think Lucene is creating an isolationist culture, nor do we think our project is perfect. What we do agree on is that our time is better spent on figuring out how to make Lucene better, not how to spend our time doing UNIX administration in a virtual server environment. As [...]
January 21st, 2008 | Posted in Indexing, Java, Lucene, Search | No Comments
Coderspiel / The right tool for the slob
This guy’s comment system wasn’t working at the moment, so I will leave my comment here. This won’t make much sense without reading the post first:
It’s funny you mention Wikipedia as an example, since they are running Lucene. As are Technorati and the Internet Archive. [...]
January 19th, 2008 | Posted in Apache, Indexing, Java, Lucene, Nutch, Search, Solr | 2 Comments
Lucene: A Tacit Admission of Fail? : ob.blog
If ob.blog gave this post a moment of thought before sharing it with the world, he would quickly realize that hosting a live, 24/7, high-traffic site takes more than just some code that one can execute. Perhaps ob.blog is volunteering to donate computing resources, support staff, [...]
January 18th, 2008 | Posted in Java, Lucene, Search | No Comments
The Two Flavors of Google
Nice article on Hadoop at Business Week and some good quotes from Lucene Java creator Doug Cutting. Hadoop holds a lot of promise for the future of large-scale computing, although I don’t want to burden it with that kind of claim, either. In the end, it makes solving a good [...]
December 26th, 2007 | Posted in Hadoop, Java, Lucene, Mahout, Performance, machine learning | No Comments
ApacheCon EU 2008
Schedule is out for ApacheCon Europe. I will be doing my Lucene Boot Camp training and a Lucene Performance talk. Erik Hatcher will also be doing a Solr Boot Camp and a Lucene/Solr talk. There will also be some Hadoop talks.
December 4th, 2007 | Posted in ApacheCon, Europe, Java, Lucene, Performance, Solr | No Comments
Interesting comparison of open source search engines available at http://wrg.upf.edu/WRG/dctos/Middleton-Baeza.pdf. While it reflects OK on Lucene (hey, we can’t be perfect at everything), I am interested in finding out more details about what settings were used for indexing. If they just used the out-of-the-box settings, then I would argue that they need [...]
December 4th, 2007 | Posted in Indexing, Java, Lucene, Performance, Search | 2 Comments
I have set up a new site to support my Lucene Boot Camp training. Check it out at http://lucenebootcamp.com. From there, you can download training setup information, read the class outline, etc.
November 6th, 2007 | Posted in ApacheCon, Indexing, Java, Lucene, Search | No Comments
Lots of good things happening in Lucene land lately, all of which should benefit users with faster indexing and searching capabilities. Most notably, Lucene 2.3 (hopefully released this quarter) has some major changes in indexing memory management and performance. I have personally clocked indexing using release 2.2 at about 400 rec/s (single threaded, Mac Pro [...]
November 2nd, 2007 | Posted in Indexing, Java, Lucene, Performance, Search, term vectors | No Comments