Archive for the 'Indexing' Category

Lucid Imagination » Getting Started with Payloads

I just posted a brief intro on getting started with Apache Lucene payloads on Lucid’s blog for those who are interested.  Here’s the teaser: Like Spans, payloads involve the position of terms, but go one step further. Namely, a Payload in Apache Lucene is an arbitrary byte array stored at a specific position (i.e. a [...]

“What’s new with Apache Solr” now available at IBM developerWorks

My latest article on Apache Solr, titled “What’s New with Apache Solr,” is now available over at IBM developerWorks. It covers some of the new features, such as spell checking, the Data Import Handler, distributed search, editorial results placement (a.k.a. “paid placement”), SolrJ, and a variety of other pieces. Hope it is [...]

Lucene Boot Camp at ApacheCon US 2008

Just a quick reminder that there is just over one week left before Lucene Boot Camp at this year’s ApacheCon. This year it is a two-day training, but those who want to can sign up for the first day of Lucene Boot Camp and then attend Solr Boot Camp on the second [...]

Lucene Boot Camp at ApacheCon US

Lucene Boot Camp (ApacheCon site): Lucene Boot Camp (http://www.lucenebootcamp.com) is scheduled this year for ApacheCon US on November 3 and 4 in New Orleans. This year I am doing a two-day event, as I felt the one-day event was just not enough time to get in all the goodness that is Lucene (not [...]

wpSearch – Lucene search for WordPress

Code Fury: The author of this nice WordPress plugin contacted me today about his Lucene-based plugin, so I thought I would give it a try, as I’m obviously a big fan of Lucene and have never much cared for MySQL’s search (in)capabilities. The plugin is easy enough to install; the only thing that [...]

MySQL, Solr and “Communications link failure”

So, I was indexing 10+ million records from MySQL into Solr and kept coming across the following odd MySQL exception:
    com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
    Last packet sent to the server was 4467745 ms ago
    …
    com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:1074)
    at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2985)
    at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2871)
    at
In my code, I loop over a JDBC ResultSet and add the records [...]

Why Lucene Isn’t That Good | Javalobby

Patches welcome… I know that is an old saw, but that is the only way it’s going to get better. There are some good points in here, and some stuff that is a bit dramatic. We do try to keep adapting Lucene and making it better, but in [...]

How Rackspace Now Uses MapReduce and Hadoop to Query Terabytes of Data | High Scalability

Nice article on how the Lucene/Hadoop/Solr stack was used to solve a really big problem. Someday, I hope (when we have actual code), they can add Mahout to the equation and do even more interesting things with the data.

Coderspiel / January 2008

I hardly think Lucene is creating an isolationist culture, nor do we think our project is perfect. What we do agree on is that our time is better spent figuring out how to make Lucene better than doing UNIX administration in a virtual server environment. [...]

Coderspiel / The right tool for the slob

This guy’s comment system wasn’t working at the moment, so I will leave my comment here. This won’t make much sense without reading the post first: It’s funny you mention Wikipedia as an example, since they are running Lucene. As are Technorati and the Internet Archive. As [...]