<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Grant's Grunts: Lucene Edition &#187; JDBC</title>
	<atom:link href="http://lucene.grantingersoll.com/category/jdbc/feed/" rel="self" type="application/rss+xml" />
	<link>http://lucene.grantingersoll.com</link>
	<description>Thoughts on Apache Lucene, Mahout, Solr, Tika and Nutch</description>
	<lastBuildDate>Thu, 08 Jul 2010 17:23:22 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
		<item>
		<title>MySQL, Solr and &#8220;Communications link failure&#8221;</title>
		<link>http://lucene.grantingersoll.com/2008/07/16/mysql-solr-and-communications-link-failure/</link>
		<comments>http://lucene.grantingersoll.com/2008/07/16/mysql-solr-and-communications-link-failure/#comments</comments>
		<pubDate>Wed, 16 Jul 2008 20:02:19 +0000</pubDate>
		<dc:creator>grant_ingersoll</dc:creator>
				<category><![CDATA[Indexing]]></category>
		<category><![CDATA[JDBC]]></category>
		<category><![CDATA[Java]]></category>
		<category><![CDATA[Lucene]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Solr]]></category>
		<category><![CDATA[database]]></category>

		<guid isPermaLink="false">http://lucene.grantingersoll.com/?p=87</guid>
		<description><![CDATA[So, I was indexing 10+ million records from MySQL into Solr and kept coming across the following odd MySQL exception: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure Last packet sent to the server was 4467745 ms ago ... com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:1074) at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2985) at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2871) at In my code, I loop over a JDBC ResultSet and add the records [...]]]></description>
			<content:encoded><![CDATA[<p>So, I was indexing 10+ million records from MySQL into Solr and kept coming across the following odd MySQL exception:</p>
<pre>com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
Last packet sent to the server was 4467745 ms ago
...
com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:1074)
	at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2985)
	at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2871)
	at</pre>
<p>In my code, I loop over a JDBC ResultSet and add each record to Solr, mapping columns to fields per the Solr schema.  The failure would show up only after getting through something like 9M+ records.  After some tracking down, hypothesizing, and talking with others, we concluded that the issue was a combination of having autocommit enabled in Solr and MySQL timing out the ResultSet: when Lucene had to do a large merge (even in the background), Solr had to wait for that merge to finish, which kept the ResultSet open too long without activity.</p>
<p>These large merges can take some time.  They can happen in the background, but Solr can&#8217;t refresh its IndexReader until the merge finishes, AIUI.  Thus, we&#8217;re stuck in the middle of a ResultSet loop, holding the cursor open past MySQL&#8217;s default timeout (600 seconds, more on that later), causing MySQL to kill the connection, and rightfully so.</p>
<p>On the MySQL side of things, we are streaming the results, since its JDBC driver does not support setFetchSize() in the usual way (ugh!).  As it turns out, MySQL&#8217;s JDBC driver has a streaming timeout property named <strong>netTimeoutForStreamingResults</strong> (see <a href="http://dev.mysql.com/doc/refman/5.0/en/connector-j-reference-configuration-properties.html">http://dev.mysql.com/doc/refman/5.0/en/connector-j-reference-configuration-properties.html</a>) which defaults to 600 seconds.</p>
<p>Long story short, I have at least two options:</p>
<ol>
<li>Turn off autocommit, meaning users won&#8217;t be able to see documents as soon as they might like</li>
<li>Increase the netTimeoutForStreamingResults value.  This works for MySQL and I have verified it, but it is not a generic setting for the other DBs that our code supports</li>
</ol>
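<p>For the second option, here is a rough sketch of how the property could be supplied, either through a Properties object or on the JDBC URL.  The class name, host, database name, credentials and the 7200-second value are all placeholders I made up for illustration; only the property name comes from the Connector/J documentation linked above.</p>

```java
import java.util.Properties;

public class StreamingTimeoutConfig {

    // Connector/J connection property discussed above; the value is in seconds.
    static final String TIMEOUT_PROPERTY = "netTimeoutForStreamingResults";

    /** Builds connection properties that raise the streaming timeout. */
    static Properties withStreamingTimeout(String user, String password, int seconds) {
        Properties props = new Properties();
        props.setProperty("user", user);
        props.setProperty("password", password);
        props.setProperty(TIMEOUT_PROPERTY, Integer.toString(seconds));
        return props;
    }

    /** Appends the same setting to a JDBC URL as a query parameter. */
    static String withStreamingTimeout(String jdbcUrl, int seconds) {
        String separator = jdbcUrl.contains("?") ? "&" : "?";
        return jdbcUrl + separator + TIMEOUT_PROPERTY + "=" + seconds;
    }

    public static void main(String[] args) {
        // Placeholder URL; substitute your own host and database.
        String url = "jdbc:mysql://localhost:3306/mydb";
        System.out.println(withStreamingTimeout(url, 7200));

        // The Properties form would be passed to
        // DriverManager.getConnection(url, props) when opening the connection.
        Properties props = withStreamingTimeout("indexer", "secret", 7200);
        System.out.println(props.getProperty(TIMEOUT_PROPERTY));
    }
}
```

<p>Whatever value you pick has to be longer than the worst stall you expect on the indexing side, so this moves the limit rather than removing it.</p>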
<p>I am still deciding what to do, and am also thinking about other options that decouple DB retrieval from the indexing process.  At any rate, I wanted to post the cause of this exception because I did not find anyone else reporting it as the result of a timeout during ResultSet processing; hopefully this will save someone some time.</p>
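<p>One way the decoupling idea might look, purely as a sketch and not something I have settled on: drain the rows to a local spool file first, then index from that file, so a slow commit or merge never leaves the database cursor waiting.  RowSource and Indexer are hypothetical stand-ins for the ResultSet loop and the Solr add call, and the sketch assumes each row can be serialized to a single line of text.</p>

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class SpoolThenIndex {

    /** Hypothetical stand-in for the ResultSet loop; returns null when exhausted. */
    interface RowSource { String nextRow() throws Exception; }

    /** Hypothetical stand-in for adding one document to Solr. */
    interface Indexer { void add(String row) throws Exception; }

    /** Phase 1: drain the source to a local file without waiting on the indexer. */
    static int spool(RowSource source, Path spoolFile) throws Exception {
        int count = 0;
        try (BufferedWriter out = Files.newBufferedWriter(spoolFile, StandardCharsets.UTF_8)) {
            String row;
            while ((row = source.nextRow()) != null) {
                out.write(row);   // assumes the row contains no line breaks
                out.newLine();
                count++;
            }
        }
        return count;
    }

    /** Phase 2: index from the file; stalls here no longer touch the database. */
    static int indexFrom(Path spoolFile, Indexer indexer) throws Exception {
        int count = 0;
        try (BufferedReader in = Files.newBufferedReader(spoolFile, StandardCharsets.UTF_8)) {
            String row;
            while ((row = in.readLine()) != null) {
                indexer.add(row);
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        // In-memory rows stand in for the database so the sketch runs on its own.
        String[] rows = {"1\tfirst", "2\tsecond", "3\tthird"};
        int[] cursor = {0};
        Path file = Files.createTempFile("rows", ".tsv");

        int spooled = spool(() -> cursor[0] == rows.length ? null : rows[cursor[0]++], file);

        StringBuilder seen = new StringBuilder();
        int indexed = indexFrom(file, row -> seen.append(row).append('\n'));

        System.out.println(spooled + " spooled, " + indexed + " indexed");
        Files.delete(file);
    }
}
```

<p>The trade-off is extra disk space and a second pass over the data, and documents only become searchable after the spool phase has finished.</p>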
]]></content:encoded>
			<wfw:commentRss>http://lucene.grantingersoll.com/2008/07/16/mysql-solr-and-communications-link-failure/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
	</channel>
</rss>
