<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Grant's Grunts: Lucene Edition &#187; Solr</title>
	<atom:link href="http://lucene.grantingersoll.com/category/solr/feed/" rel="self" type="application/rss+xml" />
	<link>http://lucene.grantingersoll.com</link>
	<description>Thoughts on Apache Lucene, Mahout, Solr, Tika and Nutch</description>
	<lastBuildDate>Wed, 18 Jan 2012 13:33:40 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Berlin Buzzwords 2012</title>
		<link>http://lucene.grantingersoll.com/2012/01/18/berlin-buzzwords-2012/</link>
		<comments>http://lucene.grantingersoll.com/2012/01/18/berlin-buzzwords-2012/#comments</comments>
		<pubDate>Wed, 18 Jan 2012 13:33:40 +0000</pubDate>
		<dc:creator>grant_ingersoll</dc:creator>
				<category><![CDATA[Lucene]]></category>
		<category><![CDATA[Mahout]]></category>
		<category><![CDATA[Solr]]></category>

		<guid isPermaLink="false">http://lucene.grantingersoll.com/?p=467</guid>
		<description><![CDATA[In case you haven&#8217;t heard, and are in Europe this June (or want to be), you should check out the Berlin Buzzwords conference.  It&#8217;s a great conference for all things related to Lucene, Solr, Hadoop, Mahout, NoSQL and scaling in general.  The call for papers (CFP) is open now through March 11.]]></description>
			<content:encoded><![CDATA[<p>In case you haven&#8217;t heard, and are in Europe this June (or want to be), you should check out the <a href="http://www.berlinbuzzwords.de">Berlin Buzzwords</a> conference.  It&#8217;s a great conference for all things related to Lucene, Solr, Hadoop, Mahout, NoSQL and scaling in general.  The call for papers (CFP) is open now through March 11.</p>
]]></content:encoded>
			<wfw:commentRss>http://lucene.grantingersoll.com/2012/01/18/berlin-buzzwords-2012/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Taming Text Update</title>
		<link>http://lucene.grantingersoll.com/2011/12/27/taming-text-update/</link>
		<comments>http://lucene.grantingersoll.com/2011/12/27/taming-text-update/#comments</comments>
		<pubDate>Tue, 27 Dec 2011 13:45:39 +0000</pubDate>
		<dc:creator>grant_ingersoll</dc:creator>
				<category><![CDATA[Lucene]]></category>
		<category><![CDATA[OpenNLP]]></category>
		<category><![CDATA[Solr]]></category>
		<category><![CDATA[Taming Text]]></category>

		<guid isPermaLink="false">http://lucene.grantingersoll.com/?p=460</guid>
		<description><![CDATA[Drew, Tom and I are feverishly working on finishing up Taming Text.  We are currently addressing the feedback from our final review and should have updates posted soon.  I have also posted all of the book&#8217;s source code to GitHub under the Taming Text user.  The source includes, [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft" title="Taming Text book cover" src="http://manning.com/ingersoll/ingersoll_cover150.jpg" alt="" width="150" height="188" /></p>
<p>Drew, Tom and I are feverishly working on finishing up <a href="http://www.manning.com/affiliate/idevaffiliate.php?id=1069_148">Taming Text</a>.  We are currently addressing the feedback from our final review and should have updates posted soon.  I have also posted all of the book&#8217;s source code to GitHub under the <a href="http://www.github.com/tamingtext">Taming Text user</a>.  The source includes, amongst other things, a simple question answering system using Solr and OpenNLP, as well as Lucene analyzers that use OpenNLP for sentence detection, part-of-speech tagging and named entity recognition.  As with most books, these examples are meant to be just that: examples.</p>
]]></content:encoded>
			<wfw:commentRss>http://lucene.grantingersoll.com/2011/12/27/taming-text-update/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Mahout and Other News</title>
		<link>http://lucene.grantingersoll.com/2011/08/05/mahout-and-other-news/</link>
		<comments>http://lucene.grantingersoll.com/2011/08/05/mahout-and-other-news/#comments</comments>
		<pubDate>Fri, 05 Aug 2011 20:41:35 +0000</pubDate>
		<dc:creator>grant_ingersoll</dc:creator>
				<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Lucene]]></category>
		<category><![CDATA[Mahout]]></category>
		<category><![CDATA[Solr]]></category>

		<guid isPermaLink="false">http://lucene.grantingersoll.com/?p=438</guid>
		<description><![CDATA[After some time away, I&#8217;m happy to have recently found the time to work on Mahout again.  There&#8217;s a lot of goodness happening all over the project that I&#8217;ll leave to others to explain, while I focus on a few things I&#8217;ve been doing recently. First off, I was doing a fair amount of work [...]]]></description>
			<content:encoded><![CDATA[<p>After some time away, I&#8217;m happy to have recently found the time to work on Mahout again.  There&#8217;s a lot of goodness happening all over the project that I&#8217;ll leave to others to explain, while I focus on a few things I&#8217;ve been doing recently.</p>
<p>First off, I was doing a fair amount of work calculating document similarities across whole collections, at first using the RowSimilarityJob and later using the VectorDistanceSimilarityJob, a map-side simplification I wrote that uses the distributed cache.  Both of these come in handy when one wants to calculate pairwise similarity between all (or most) items in a collection.  The original Mahout implementation was focused on providing recommendations, but as outlined in the <a href="http://www.umiacs.umd.edu/~jimmylin/publications/Elsayed_etal_ACL2008_short.pdf">Elsayed, Lin and Oard paper</a>, it is quite useful for text as well, in cases where one wants to precompute &#8220;more like this&#8221; for all documents.  As for why there are two similar approaches, see the discussion at <a href="http://www.lucidimagination.com/search/document/40c4f124795c6b5/rowsimilarity_s#42ab816c27c6a9e7">http://www.lucidimagination.com/search/document/40c4f124795c6b5/rowsimilarity_s#42ab816c27c6a9e7</a>.  In essence, I didn&#8217;t need a fully generic implementation that was a bit slower on larger matrices, since I mainly wanted to compare all of my vectors in HDFS against a subset of &#8220;core&#8221; vectors that fit into memory.  That being said, <a href="http://ssc.io/rowsimilarityjob-on-steroids/">Sebastian</a> is already hard at work on making the more generic version perform better when certain distance measures are used, while still offering the full suite of capabilities of the existing RowSimilarityJob.  See <a href="https://issues.apache.org/jira/browse/MAHOUT-767">MAHOUT-767</a> for more info on that work.</p>
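<p>To make the map-side idea a little more concrete, here is a minimal, single-process Python sketch of comparing every document vector against a small set of in-memory &#8220;core&#8221; vectors and keeping the best matches. It is not the code of Mahout&#8217;s VectorDistanceSimilarityJob or RowSimilarityJob; it assumes sparse vectors stored as Python dicts, cosine similarity as the measure and a top-k cutoff, and the function names and sample data are hypothetical.</p>

```python
import heapq
import math

# Hypothetical sketch, not Mahout code: sparse vectors are dicts that map a
# term id to a weight. The "core" dict stands in for the small set of vectors
# a map-side job would load into every mapper's memory.


def cosine(a, b):
    """Cosine similarity of two sparse vectors (0.0 if either has zero norm)."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector when computing the dot product
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def map_side_similarity(doc_id, vector, core, k=10):
    """Emulate one mapper call: score one streamed document against every
    in-memory core vector and keep the k most similar (doc, core, score) rows."""
    scored = ((cosine(vector, core_vec), core_id) for core_id, core_vec in core.items())
    top = heapq.nlargest(k, scored)
    return [(doc_id, core_id, round(score, 3)) for score, core_id in top]


if __name__ == "__main__":
    # Hypothetical sample data: two core vectors and two streamed documents.
    core = {"c1": {0: 1.0, 1: 1.0}, "c2": {2: 1.0}}
    docs = {"d1": {0: 2.0, 1: 2.0}, "d2": {1: 1.0, 2: 1.0}}
    for doc_id, vec in docs.items():
        print(map_side_similarity(doc_id, vec, core))
```

<p>In a Hadoop setting, the core dict would presumably be loaded once per mapper (for example, from the distributed cache) and map_side_similarity would run once per vector read from HDFS. Only the small core set has to fit in memory, which is the trade-off against the fully generic all-pairs approach described above.</p>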
<p>Now I&#8217;m looking into some more pruning techniques via <a href="https://issues.apache.org/jira/browse/MAHOUT-688">MAHOUT-688</a>.  After that quick patch, I think I&#8217;m going to dig a bit more into recommendations, as well as run some tests on the ASF mail archives I posted a while back (see below for an update).</p>
<p>Also, I&#8217;ve switched to using Git and GitHub for managing my Mahout changes (as well as other work), so if you want to see what I&#8217;m up to, <a href="https://github.com/gsingers/">check out my GitHub</a> account.</p>
<p>It&#8217;s not complete yet, but the ASF Public Mail archive I put up on Amazon AWS <a href="https://s3.amazonaws.com/asf-mail-archives/index.html">last September</a> is getting a fresh version.  The interim solution is available at <a href="https://s3.amazonaws.com/asf-mail-archives-7-18-2011/index.html">https://s3.amazonaws.com/asf-mail-archives-7-18-2011/index.html</a>, but look for it to become a <a href="http://aws.amazon.com/datasets">Public Data Set</a> hosted by Amazon soon.  The September version of this data contained roughly 6.7M emails sent to the public mailing lists at the Apache Software Foundation, so I suspect this version contains upwards of 7M items, but I haven&#8217;t counted them.  At any rate, I hope it is useful to people.</p>
<p>Finally, on a personal note, I&#8217;m back at <a href="http://www.lucidimagination.com">Lucid Imagination</a> after a brief move elsewhere, this time in a new role as Chief Scientist.  Lucid is a company I co-founded and helped build over the past four years.  I&#8217;m looking forward to working closely with Lucene and Solr again, alongside a <a href="http://www.lucidimagination.com/why-lucid/leadership">top-notch technical team</a>.  I&#8217;m also looking forward to working on Mahout more, as well as other technologies like Hadoop, Pig, HBase and the like, especially as they relate to search and recommendations.</p>
]]></content:encoded>
			<wfw:commentRss>http://lucene.grantingersoll.com/2011/08/05/mahout-and-other-news/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Stump the Chump</title>
		<link>http://lucene.grantingersoll.com/2011/04/24/stump-the-chump/</link>
		<comments>http://lucene.grantingersoll.com/2011/04/24/stump-the-chump/#comments</comments>
		<pubDate>Sun, 24 Apr 2011 06:09:39 +0000</pubDate>
		<dc:creator>grant_ingersoll</dc:creator>
				<category><![CDATA[Lucene]]></category>
		<category><![CDATA[Solr]]></category>

		<guid isPermaLink="false">http://lucene.grantingersoll.com/?p=428</guid>
		<description><![CDATA[I&#8217;m on the hot seat for &#8220;Stump the Chump&#8221; this year at Lucene Revolution, so if you have questions you want me to tackle, please either show up at my talk or email them to info@lucenerevolution.org.  See Session Abstracts &#124; Day 1 &#124; www.lucenerevolution.org for more information.]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m on the hot seat for &#8220;Stump the Chump&#8221; this year at Lucene Revolution, so if you have questions you want me to tackle, please either show up at my talk or email them to info@lucenerevolution.org.  See <a href="http://lucenerevolution.org/2011/sessions-day-1#stump-ingersoll">Session Abstracts | Day 1 | www.lucenerevolution.org</a> for more information.</p>
]]></content:encoded>
			<wfw:commentRss>http://lucene.grantingersoll.com/2011/04/24/stump-the-chump/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Apache Lucene 3.1.0 and Apache Solr 3.1.0</title>
		<link>http://lucene.grantingersoll.com/2011/03/31/apache-lucene-3-1-0-and-apache-solr-3-1-0/</link>
		<comments>http://lucene.grantingersoll.com/2011/03/31/apache-lucene-3-1-0-and-apache-solr-3-1-0/#comments</comments>
		<pubDate>Thu, 31 Mar 2011 18:37:35 +0000</pubDate>
		<dc:creator>grant_ingersoll</dc:creator>
				<category><![CDATA[Apache]]></category>
		<category><![CDATA[Lucene]]></category>
		<category><![CDATA[Solr]]></category>

		<guid isPermaLink="false">http://lucene.grantingersoll.com/?p=424</guid>
		<description><![CDATA[I just sent the official release announcements for Lucene and Solr 3.1.0 to the mailing lists and also posted the release announcement over on the Lucid Imagination site: Lucid Imagination » Apache Lucene 3.1.0 and Apache Solr 3.1.0. Thanks to everyone for all their hard work in making these releases happen.]]></description>
			<content:encoded><![CDATA[<p>I just sent the official release announcements for Lucene and Solr 3.1.0 to the mailing lists and also posted the release announcement over on the Lucid Imagination site:</p>
<p><a href="http://www.lucidimagination.com/blog/2011/03/31/apache-lucene-3-1-0-and-apache-solr-3-1-0/">Lucid Imagination » Apache Lucene 3.1.0 and Apache Solr 3.1.0</a>.</p>
<p>Thanks to everyone for all their hard work in making these releases happen.</p>
]]></content:encoded>
			<wfw:commentRss>http://lucene.grantingersoll.com/2011/03/31/apache-lucene-3-1-0-and-apache-solr-3-1-0/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Lucene, Solr and SXSW</title>
		<link>http://lucene.grantingersoll.com/2011/03/17/lucene-solr-and-sxsw/</link>
		<comments>http://lucene.grantingersoll.com/2011/03/17/lucene-solr-and-sxsw/#comments</comments>
		<pubDate>Thu, 17 Mar 2011 12:16:42 +0000</pubDate>
		<dc:creator>grant_ingersoll</dc:creator>
				<category><![CDATA[Lucene]]></category>
		<category><![CDATA[Mahout]]></category>
		<category><![CDATA[Solr]]></category>
		<category><![CDATA[SXSW]]></category>

		<guid isPermaLink="false">http://lucene.grantingersoll.com/?p=418</guid>
		<description><![CDATA[Finally back in the saddle from SXSW, where a good time was had by all, AFAICT.  It was my first time, so it was a bit overwhelming.  So many people, so much hype, so much to do and see.  Suffice it to say, most of the hype was about social media.  Can you say Bubble?  [...]]]></description>
			<content:encoded><![CDATA[<p>Finally back in the saddle from <a href="http://www.sxsw.com">SXSW</a>, where a good time was had by all, AFAICT.  It was my first time, so it was a bit overwhelming.  So many people, so much hype, so much to do and see.  Suffice it to say, most of the hype was about social media.  Can you say Bubble?  Sure, there are some winners in the bunch, but there are a whole lot of people whose only idea is to copy some other venture-funded company that has yet to make a profit, or to create Yet Another Social Media Monitoring Solution (YAMMS).  From the engineering side of these companies, this stuff is incredibly cool and I totally get the excitement.  What&#8217;s not to like about working on massively scalable problems with incredible update rates using cool new technology like <a href="http://hadoop.apache.org">Hadoop</a>, <a href="http://cassandra.apache.org">Cassandra</a>, <a href="http://mahout.apache.org">Mahout</a>, etc.?  From the business side, color me cynical.</p>
<p>Regardless of the hype, I had a good time.  Obviously I&#8217;m biased, but I think the talk RC Johnson and I gave on using Solr in NoSQL situations went well, was well attended and <a href="http://twitter.com/#search?q=%23solrnosql">well received</a>.  RC and I split duties during the talk: I covered the big picture of Solr as a NoSQL solution, and RC covered how <a href="http://www.bazaarvoice.com/">Bazaarvoice</a> uses Solr to serve billions of searches/lookups per month.  A good chunk of the audience was already familiar with NoSQL and Lucene/Solr, so it was a pretty technical crowd, which is exactly what we hoped for.  We also got some good questions both during and after the talk.  The <a href="http://portal.sliderocket.com/ANYSX/SXSW-2011-Solr-Nosql">slides are up on SlideRocket</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://lucene.grantingersoll.com/2011/03/17/lucene-solr-and-sxsw/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Lucid Imagination » Introducing Unlimited Developer Access to LucidWorks Enterprise</title>
		<link>http://lucene.grantingersoll.com/2010/12/15/lucid-imagination-%c2%bb-introducing-unlimited-developer-access-to-lucidworks-enterprise/</link>
		<comments>http://lucene.grantingersoll.com/2010/12/15/lucid-imagination-%c2%bb-introducing-unlimited-developer-access-to-lucidworks-enterprise/#comments</comments>
		<pubDate>Wed, 15 Dec 2010 14:36:06 +0000</pubDate>
		<dc:creator>grant_ingersoll</dc:creator>
				<category><![CDATA[Lucene]]></category>
		<category><![CDATA[Solr]]></category>

		<guid isPermaLink="false">http://lucene.grantingersoll.com/?p=407</guid>
		<description><![CDATA[Lucid Imagination » Introducing Unlimited Developer Access to LucidWorks Enterprise. Our latest version of LucidWorks Enterprise is now available for general access.]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.lucidimagination.com/blog/2010/12/14/introducing-unlimited-developer-access-to-lucidworks-enterprise/">Lucid Imagination » Introducing Unlimited Developer Access to LucidWorks Enterprise</a>.</p>
<p>Our latest version of LucidWorks Enterprise is now available for general access.</p>
]]></content:encoded>
			<wfw:commentRss>http://lucene.grantingersoll.com/2010/12/15/lucid-imagination-%c2%bb-introducing-unlimited-developer-access-to-lucidworks-enterprise/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Next up, ApacheCon!</title>
		<link>http://lucene.grantingersoll.com/2010/10/09/next-up-apachecon/</link>
		<comments>http://lucene.grantingersoll.com/2010/10/09/next-up-apachecon/#comments</comments>
		<pubDate>Sat, 09 Oct 2010 11:30:04 +0000</pubDate>
		<dc:creator>grant_ingersoll</dc:creator>
				<category><![CDATA[Apache]]></category>
		<category><![CDATA[Lucene]]></category>
		<category><![CDATA[Mahout]]></category>
		<category><![CDATA[Solr]]></category>

		<guid isPermaLink="false">http://lucene.grantingersoll.com/?p=395</guid>
		<description><![CDATA[If you missed the Lucene Revolution conference, especially the training on Apache Solr and Lucene, don&#8217;t worry.  You won&#8217;t have to wait long for the next round: both Erik Hatcher (co-author of Lucene in Action) and I will be teaching training classes at ApacheCon in Atlanta in November. To learn more and sign up, see: Lucene [...]]]></description>
			<content:encoded><![CDATA[<p>If you missed the <a href="http://www.lucenerevolution.com">Lucene Revolution</a> conference, especially the training on Apache Solr and Lucene, don&#8217;t worry.  You won&#8217;t have to wait long for the next round: both Erik Hatcher (co-author of <a href="http://www.manning.com/affiliate/idevaffiliate.php?id=1069_147">Lucene in Action</a>) and I will be teaching training classes at <a href="http://www.apachecon.com">ApacheCon</a> in Atlanta in November.</p>
<p>To learn more and sign up, see:</p>
<ol>
<li><a href="http://na.apachecon.com/c/acna2010/sessions/615">Lucene Boot Camp</a></li>
<li><a href="http://na.apachecon.com/c/acna2010/sessions/641">Solr Application Development</a></li>
</ol>
<p>There will also be almost two days of talks on Lucene, Solr, Mahout and friends.</p>
]]></content:encoded>
			<wfw:commentRss>http://lucene.grantingersoll.com/2010/10/09/next-up-apachecon/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Upcoming Apache Lucene and Solr public trainings</title>
		<link>http://lucene.grantingersoll.com/2010/08/30/upcoming-apache-lucene-and-solr-public-trainings/</link>
		<comments>http://lucene.grantingersoll.com/2010/08/30/upcoming-apache-lucene-and-solr-public-trainings/#comments</comments>
		<pubDate>Mon, 30 Aug 2010 15:33:02 +0000</pubDate>
		<dc:creator>grant_ingersoll</dc:creator>
				<category><![CDATA[Apache]]></category>
		<category><![CDATA[Lucene]]></category>
		<category><![CDATA[Lucene Boot Camp]]></category>
		<category><![CDATA[Solr]]></category>

		<guid isPermaLink="false">http://lucene.grantingersoll.com/?p=387</guid>
		<description><![CDATA[Erik Hatcher and I are once again offering our Lucene and Solr training classes, but this time there are two opportunities to participate.  The first will be at Lucene Revolution on October 5 and 6.  The second is on November 1 and 2 at ApacheCon NA 2010.  Both classes are designed to get people up [...]]]></description>
			<content:encoded><![CDATA[<p>Erik Hatcher and I are once again offering our Lucene and Solr training classes, but this time there are two opportunities to participate.  The first will be at <a href="http://www.lucenerevolution.com">Lucene Revolution</a> on October 5 and 6.  The second is on November 1 and 2 at <a href="http://www.apachecon.com">ApacheCon NA 2010</a>.  Both classes are designed to get people up to speed on either Lucene or Solr as quickly as possible.  If you have any questions, feel free to email me at trainer@lucenebootcamp.com.</p>
]]></content:encoded>
			<wfw:commentRss>http://lucene.grantingersoll.com/2010/08/30/upcoming-apache-lucene-and-solr-public-trainings/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Lucid Imagination » Free Webinar: How Cisco’s Pulse uses Lucene/Solr to put Social Networks to Work</title>
		<link>http://lucene.grantingersoll.com/2010/06/18/lucid-imagination-%c2%bb-free-webinar-how-cisco%e2%80%99s-pulse-uses-lucenesolr-to-put-social-networks-to-work/</link>
		<comments>http://lucene.grantingersoll.com/2010/06/18/lucid-imagination-%c2%bb-free-webinar-how-cisco%e2%80%99s-pulse-uses-lucenesolr-to-put-social-networks-to-work/#comments</comments>
		<pubDate>Fri, 18 Jun 2010 12:25:34 +0000</pubDate>
		<dc:creator>grant_ingersoll</dc:creator>
				<category><![CDATA[Lucene]]></category>
		<category><![CDATA[Lucid Imagination]]></category>
		<category><![CDATA[Solr]]></category>

		<guid isPermaLink="false">http://lucene.grantingersoll.com/?p=374</guid>
		<description><![CDATA[Lucid Imagination » Free Webinar: How Cisco’s Pulse uses Lucene/Solr to put Social Networks to Work. &#8216;Nuff said.]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.lucidimagination.com/blog/2010/06/11/free-webinar-how-ciscos-pulse-uses-lucenesolr-to-put-social-networks-to-work/">Lucid Imagination » Free Webinar: How Cisco’s Pulse uses Lucene/Solr to put Social Networks to Work</a>.</p>
<p>&#8216;Nuff said.</p>
]]></content:encoded>
			<wfw:commentRss>http://lucene.grantingersoll.com/2010/06/18/lucid-imagination-%c2%bb-free-webinar-how-cisco%e2%80%99s-pulse-uses-lucenesolr-to-put-social-networks-to-work/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

