MySQL, Solr and “Communications link failure”

So, I was indexing a 10+ million records in MySQL into Solr and kept coming across the following odd MySQL exception:

com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications
link failure
Last packet sent to the server was 4467745 ms ago
...

com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:1074)
	at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2985) 	at
com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2871) 	at

In my code, I loop over a JDBC ResultSet and add the records to Solr per the Solr field schema, mapping columns to fields, etc.  This would happen after getting through something like 9M+ records.  After some tracking down, hypothesizing, talking with others, we came to the conclusion that the issue was a combination of having the autocommit value in Solr set and MySQL timing out the ResultSet, such that when Lucene had to do a large merge (even in the background), Solr had to wait for said merge to finish, thus keeping the ResultSet open too long w/o activity.  Now, these large merges can take some time.  They can happen in the background, but Solr can’t refresh it’s IndexReader until the merge finishes, AIUI.  Thus, we’re stuck in the middle of a ResultSet loop, holding the cursor open past MySQL’s default setting (600 seconds, more on that later), causing MySQL to kill the connection, and rightfully so.  On the MySQL side of things, we are streaming the results, since it’s JDBC driver does not support setFetchSize() (ugh!).  As it turns out MySQL has a Streaming timeout value named netTimeoutForStreamingResults (see http://dev.mysql.com/doc/refman/5.0/en/connector-j-reference-configuration-properties.html) which defaults to 600 seconds.

Long story short, I have at least two options:

  1. Turn off autocommit, meaning user’s won’t be able to see documents as soon as they may like
  2. Increase the netTimoutForStreamingResults value.  This is great for MySQL and I have verified it works, but is not a generic value for other DBs, which our code supports

I am still deciding on what to do, and also thinking of some other options that can decouple DB retrieval from the indexing process.  At any rate, I wanted to post the cause of my seeing this exception, because I did not see anyone else with this exception whose cause was due to a timeout during ResultSet processing and hopefully it will save them some time.

5 Responses to “MySQL, Solr and “Communications link failure””

  1. I avoid this by running my queries with limits while I reindex. So, you might query 100 rows at a time from your db, index them, then query the next 100 rows. You could also split up the indexing work across many machines with this approach then merge your index together after each separate process is finished. Or maybe use the new federated index features from solr. Happy indexing!

  2. We use a queue (MSMQ) between database and Lucene, as well as between web crawlers and Lucene, that way indexing is asynchronous and you never get any such time out errors. Load up the queue from the database, and let SOLR index from the queue. You only need a small service to watch for new items in queue and submit to SOLR.

  3. This is one of the reasons DataImportHandler re-opens connections inactive for more than 10 seconds because there is no JDBC driver-independent way of managing time-outs.

  4. grant_ingersoll on July 18th, 2008 at 11:55 am

    Bob, I am looking into the queue approach, as that seems most useful. Shalin, you are definitely right on connection timeouts, but this was on the ResultSet, so you’d have to reexecute the query (or automatically page, as Archie suggests) I think.

  5. I ran in to the same problem with LuSql.

    Originally I solved it with paging, taking something like 5000 records at a time.
    However, for large ResultSets, MySQL starts taking longer and longer with each subsequent query as it seems to run through the records on the server end. After ~500,000 this starts being 10s of seconds…

    I settled on counting the records as they go by, catching the exception when it is thrown, then re-issuing the original query, with a limit offset to the last count.

Leave a Reply

*
To prove that you're not a bot, enter this code
Anti-Spam Image


proxy surf proxy mp3 site mp3 sites Allofmp3 Mp3 fiesta Mp3fiesta buy mp3 music vpn usa vpn uk vpn vpn account watch usa tv