Archive for the 'Indexing' Category
I just posted a brief intro on getting started with Apache Lucene payloads on Lucid’s blog for those who are interested. Here’s the teaser: Like Spans, payloads involve the position of terms, but go one step further. Namely, a Payload in Apache Lucene is an arbitrary byte array stored at a specific position (i.e. a [...]
August 5th, 2009 | Posted in Apache, Indexing, Lucene, Search, Solr | No Comments
What’s new with Apache Solr. My latest article on Apache Solr, titled “What’s New with Apache Solr”, is now available over at IBM developerWorks. It covers some of the new features, such as spell checking, the Data Import Handler, distributed search, editorial results placement (a.k.a. “paid placement”), SolrJ, and a variety of other pieces. Hope it is [...]
November 5th, 2008 | Posted in Indexing, Java, Lucene, Search, Solr, spell checking | 1 Comment
Just a quick reminder that there is just over one week left before Lucene Boot Camp at this year’s ApacheCon. This year, it is a two-day training, but those who want to can sign up for the first day of Lucene Boot Camp and then attend Solr Boot Camp on the second [...]
October 23rd, 2008 | Posted in Apache, ApacheCon, Indexing, Java, Lucene, Lucene Boot Camp, Search, Solr | 4 Comments
Lucene Boot Camp (ApacheCon site) Lucene Boot Camp (http://www.lucenebootcamp.com) is scheduled this year for ApacheCon US on November 3rd and 4th in New Orleans. This year, I am doing a two-day event, as I felt the one-day event was just not enough time to get in all the goodness that is Lucene (not [...]
August 20th, 2008 | Posted in Apache, ApacheCon, Indexing, Java, Lucene, Lucene Boot Camp, Lucid Imagination, Search | No Comments
Code Fury The author of this nice plugin for WordPress contacted me today about his Lucene-based WordPress plugin, so I thought I would give it a try, as I’m obviously a big fan of Lucene and also never much cared for MySQL’s search (in)capabilities. The plugin is easy enough to install, only thing that [...]
August 7th, 2008 | Posted in Indexing, Lucene, MySQL, Search, spell checking, wpSearch | No Comments
So, I was indexing 10+ million records from MySQL into Solr and kept coming across the following odd MySQL exception: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure Last packet sent to the server was 4467745 ms ago … com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:1074) at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2985) at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2871) at In my code, I loop over a JDBC ResultSet and add the records [...]
July 16th, 2008 | Posted in database, Indexing, Java, JDBC, Lucene, MySQL, Solr | 6 Comments
Why Lucene Isn’t That Good | Javalobby Patches welcome… I know that is an old saw, but that is the only way it’s going to get better. There are some good points in here, and some stuff that is a bit dramatic. We do try to keep adapting Lucene and make it better, but in [...]
March 28th, 2008 | Posted in Apache, Indexing, Lucene, Search | No Comments
How Rackspace Now Uses MapReduce and Hadoop to Query Terabytes of Data | High Scalability Nice article on how the Lucene/Hadoop/Solr stack was used to solve a really big problem. Someday, I hope (when we have actual code), they can add Mahout to the equation and do even more interesting things with the data.
February 1st, 2008 | Posted in Apache, database, Hadoop, Indexing, Java, Lucene, Mahout, Search, Solr | No Comments
Coderspiel / January 2008 I hardly think Lucene is creating an isolationist culture, nor do we think our project is perfect. What we do agree on is that our time is better spent on figuring out how to make Lucene better, not how to spend our time doing UNIX administration in a virtual server environment. [...]
January 21st, 2008 | Posted in Indexing, Java, Lucene, Search | No Comments
Coderspiel / The right tool for the slob This guy’s comment system wasn’t working at the moment, so I will leave my comment here. This won’t make much sense without reading the post first: It’s funny you mention Wikipedia as an example, since they are running Lucene. As are Technorati and the Internet Archive. As [...]
January 19th, 2008 | Posted in Apache, Indexing, Java, Lucene, Nutch, Search, Solr | 3 Comments