<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Grant's Grunts: Lucene Edition &#187; Lucene</title>
	<atom:link href="http://lucene.grantingersoll.com/category/lucene/feed/" rel="self" type="application/rss+xml" />
	<link>http://lucene.grantingersoll.com</link>
	<description>Thoughts on Apache Lucene, Mahout, Solr, Tika and Nutch</description>
	<lastBuildDate>Wed, 18 Jan 2012 13:33:40 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Berlin Buzzwords 2012</title>
		<link>http://lucene.grantingersoll.com/2012/01/18/berlin-buzzwords-2012/</link>
		<comments>http://lucene.grantingersoll.com/2012/01/18/berlin-buzzwords-2012/#comments</comments>
		<pubDate>Wed, 18 Jan 2012 13:33:40 +0000</pubDate>
		<dc:creator>grant_ingersoll</dc:creator>
				<category><![CDATA[Lucene]]></category>
		<category><![CDATA[Mahout]]></category>
		<category><![CDATA[Solr]]></category>

		<guid isPermaLink="false">http://lucene.grantingersoll.com/?p=467</guid>
		<description><![CDATA[In case you haven&#8217;t heard, and are in Europe this June (or want to be), you should check out the Berlin Buzzwords conference.  It&#8217;s a great conference for all things related to Lucene, Solr, Hadoop, Mahout, NoSQL and scaling in general.  The CFP is open now through March 11.]]></description>
			<content:encoded><![CDATA[<p>In case you haven&#8217;t heard, and are in Europe this June (or want to be), you should check out the <a href="http://www.berlinbuzzwords.de">Berlin Buzzwords</a> conference.  It&#8217;s a great conference for all things related to Lucene, Solr, Hadoop, Mahout, NoSQL and scaling in general.  The CFP is open now through March 11.</p>
]]></content:encoded>
			<wfw:commentRss>http://lucene.grantingersoll.com/2012/01/18/berlin-buzzwords-2012/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Taming Text Update</title>
		<link>http://lucene.grantingersoll.com/2011/12/27/taming-text-update/</link>
		<comments>http://lucene.grantingersoll.com/2011/12/27/taming-text-update/#comments</comments>
		<pubDate>Tue, 27 Dec 2011 13:45:39 +0000</pubDate>
		<dc:creator>grant_ingersoll</dc:creator>
				<category><![CDATA[Lucene]]></category>
		<category><![CDATA[OpenNLP]]></category>
		<category><![CDATA[Solr]]></category>
		<category><![CDATA[Taming Text]]></category>

		<guid isPermaLink="false">http://lucene.grantingersoll.com/?p=460</guid>
		<description><![CDATA[Drew, Tom and I are feverishly working away on finishing up Taming Text.  We are currently addressing the feedback we got from our final review and should have updates up soon.  I have also posted all of the book&#8217;s source code to GitHub under the Taming Text user.  The source includes, [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft" title="Taming Text book cover" src="http://manning.com/ingersoll/ingersoll_cover150.jpg" alt="" width="150" height="188" /></p>
<p>Drew, Tom and I are feverishly working away on finishing up <a href="http://www.manning.com/affiliate/idevaffiliate.php?id=1069_148">Taming Text</a>.  We are currently addressing the feedback we got from our final review and should have updates up soon.  I have also posted all of the book&#8217;s source code to GitHub under the <a href="http://www.github.com/tamingtext">Taming Text user</a>.  The source includes, amongst other things, a simple Question Answering system using Solr and OpenNLP, as well as analyzers for Lucene that use OpenNLP for sentence detection, part-of-speech tagging and named entity recognition.  As with most books, these examples are meant to be just that: examples.</p>
]]></content:encoded>
			<wfw:commentRss>http://lucene.grantingersoll.com/2011/12/27/taming-text-update/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Lucid Imagination » Flexible ranking in Lucene 4</title>
		<link>http://lucene.grantingersoll.com/2011/09/12/lucid-imagination-%c2%bb-flexible-ranking-in-lucene-4/</link>
		<comments>http://lucene.grantingersoll.com/2011/09/12/lucid-imagination-%c2%bb-flexible-ranking-in-lucene-4/#comments</comments>
		<pubDate>Mon, 12 Sep 2011 21:28:15 +0000</pubDate>
		<dc:creator>grant_ingersoll</dc:creator>
				<category><![CDATA[Lucene]]></category>

		<guid isPermaLink="false">http://lucene.grantingersoll.com/?p=450</guid>
		<description><![CDATA[For those who have wanted other scoring models in Lucene/Solr (Okapi and others), details on the new flexible ranking support in Lucene 4 can be found on Lucid&#8217;s blog: Lucid Imagination » Flexible ranking in Lucene 4.]]></description>
			<content:encoded><![CDATA[<p>For those who have wanted other scoring models in Lucene/Solr (Okapi and others), details on the new flexible ranking support in Lucene 4 can be found on Lucid&#8217;s blog: <a href="http://www.lucidimagination.com/blog/2011/09/12/flexible-ranking-in-lucene-4/">Lucid Imagination » Flexible ranking in Lucene 4</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://lucene.grantingersoll.com/2011/09/12/lucid-imagination-%c2%bb-flexible-ranking-in-lucene-4/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>R in Action</title>
		<link>http://lucene.grantingersoll.com/2011/09/02/r-in-action/</link>
		<comments>http://lucene.grantingersoll.com/2011/09/02/r-in-action/#comments</comments>
		<pubDate>Fri, 02 Sep 2011 12:12:24 +0000</pubDate>
		<dc:creator>grant_ingersoll</dc:creator>
				<category><![CDATA[Lucene]]></category>

		<guid isPermaLink="false">http://lucene.grantingersoll.com/?p=448</guid>
		<description><![CDATA[Just ordered &#8220;R in Action&#8221; from Manning.  Looking forward to learning more about R, as it comes up often when discussing problems smaller than what is appropriate for Apache Mahout.  Hopefully, I will have time to post a review in the coming weeks.]]></description>
			<content:encoded><![CDATA[<p>Just ordered &#8220;<a href="http://affiliate.manning.com/idevaffiliate.php?id=1069&amp;url=16">R in Action</a>&#8221; from Manning.  Looking forward to learning more about R, as it comes up often when discussing problems smaller than what is appropriate for <a href="http://mahout.apache.org">Apache Mahout</a>.  Hopefully, I will have time to post a review in the coming weeks.</p>
]]></content:encoded>
			<wfw:commentRss>http://lucene.grantingersoll.com/2011/09/02/r-in-action/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Mahout and Other News</title>
		<link>http://lucene.grantingersoll.com/2011/08/05/mahout-and-other-news/</link>
		<comments>http://lucene.grantingersoll.com/2011/08/05/mahout-and-other-news/#comments</comments>
		<pubDate>Fri, 05 Aug 2011 20:41:35 +0000</pubDate>
		<dc:creator>grant_ingersoll</dc:creator>
				<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Lucene]]></category>
		<category><![CDATA[Mahout]]></category>
		<category><![CDATA[Solr]]></category>

		<guid isPermaLink="false">http://lucene.grantingersoll.com/?p=438</guid>
		<description><![CDATA[After some time away, I&#8217;m happy to have had the chance recently to work on Mahout again.  There&#8217;s lots of goodness happening all over the place there that I&#8217;ll leave to others to explain, while I focus on a few recent things I&#8217;ve been doing. First off, I was doing a fair amount of work [...]]]></description>
			<content:encoded><![CDATA[<p>After some time away, I&#8217;m happy to have had the chance recently to work on Mahout again.  There&#8217;s lots of goodness happening all over the place there that I&#8217;ll leave to others to explain, while I focus on a few recent things I&#8217;ve been doing.</p>
<p>First off, I was doing a fair amount of work calculating document similarities across whole collections using, at first, the RowSimilarityJob and later the VectorDistanceSimilarityJob, a map-side simplification I wrote that uses the distributed cache.  Both of these come in handy when one wants to calculate pairwise similarity between all (or most) items in a collection.  The original Mahout implementation was focused on providing recommendations, but as outlined in the <a href="http://www.umiacs.umd.edu/~jimmylin/publications/Elsayed_etal_ACL2008_short.pdf">Elsayed, Lin and Oard paper</a>, it is quite useful for text as well, in cases where one wants to precompute &#8220;more like this&#8221; for all documents.  As for the need for two similar approaches, see the discussion at <a href="http://www.lucidimagination.com/search/document/40c4f124795c6b5/rowsimilarity_s#42ab816c27c6a9e7">http://www.lucidimagination.com/search/document/40c4f124795c6b5/rowsimilarity_s#42ab816c27c6a9e7</a>.  In essence, it boils down to this: I didn&#8217;t need a fully generic implementation that was a bit slower on larger matrices, since I mainly wanted to compare all my vectors in HDFS against a subset of &#8220;core&#8221; vectors that fit into memory.  That being said, <a href="http://ssc.io/rowsimilarityjob-on-steroids/">Sebastian</a> is already hard at work on making the more generic version perform better when certain distance measures are used, while still offering the full suite of capabilities of the existing RowSimilarityJob.  See <a href="https://issues.apache.org/jira/browse/MAHOUT-767">MAHOUT-767</a> for more info on that work.</p>
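<p>As a rough illustration of the map-side idea (this is not Mahout&#8217;s actual API), here is a plain-Java sketch.  Each mapper would hold the small set of &#8220;core&#8221; vectors in memory (in the real job, loaded from the distributed cache) and score every incoming vector against all of them.  The class and method names are hypothetical, cosine similarity is just one assumed choice of measure, and all of the Hadoop plumbing is left out.</p>

```java
import java.util.Arrays;
import java.util.stream.IntStream;

// Hypothetical sketch: compare every document vector against a small,
// in-memory set of "core" vectors, the way a map-only job could after
// loading the core set from the distributed cache. Not Mahout code.
public class CoreVectorSimilarity {

    // Cosine similarity of two dense vectors of equal length
    // (an assumed choice; any similarity or distance measure could be used).
    public static double cosine(double[] a, double[] b) {
        double dot = IntStream.range(0, a.length).mapToDouble(i -> a[i] * b[i]).sum();
        double normA = Math.sqrt(Arrays.stream(a).map(x -> x * x).sum());
        double normB = Math.sqrt(Arrays.stream(b).map(x -> x * x).sum());
        // Treat an all-zero vector as dissimilar to everything.
        return (normA == 0 || normB == 0) ? 0.0 : dot / (normA * normB);
    }

    // Stands in for a single map() call: score one incoming vector
    // against every core vector already held in memory.
    public static double[] scoreAgainstCore(double[] doc, double[][] core) {
        return Arrays.stream(core).mapToDouble(c -> cosine(doc, c)).toArray();
    }

    public static void main(String[] args) {
        // Toy core set; in a real job this would be read once per mapper.
        double[][] core = { {1, 0, 0}, {0, 1, 1} };
        // Toy document vectors; in a real job these would stream in from HDFS.
        double[][] docs = { {1, 0, 0}, {0, 2, 2}, {1, 1, 0} };
        for (double[] doc : docs) {
            System.out.println(Arrays.toString(scoreAgainstCore(doc, core)));
        }
    }
}
```

<p>In this sketch the core set is read once and every document vector is streamed past it, so the comparison itself needs no shuffle or reduce step.  The trade-off is that the core set has to fit in each mapper&#8217;s memory.</p>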
<p>Now, I&#8217;m looking into some more pruning techniques via <a href="https://issues.apache.org/jira/browse/MAHOUT-688">MAHOUT-688</a>.  After that quick patch, I think I&#8217;m going to dig a bit more into recommendations, as well as run some tests on the ASF mail archives I posted a while back (see below for an update).</p>
<p>Also, I&#8217;ve switched to using Git and GitHub for managing my Mahout changes (as well as other work), so if you want to see what I&#8217;m up to, <a href="https://github.com/gsingers/">check out my GitHub account</a>.</p>
<p>It&#8217;s not complete yet, but the ASF Public Mail archive I put up <a href="https://s3.amazonaws.com/asf-mail-archives/index.html">last September</a> on Amazon AWS is getting a fresh new version.  The interim solution is available at <a href="https://s3.amazonaws.com/asf-mail-archives-7-18-2011/index.html">https://s3.amazonaws.com/asf-mail-archives-7-18-2011/index.html</a>, but look for it to be a <a href="http://aws.amazon.com/datasets">Public Data Set</a> hosted by Amazon soon.  The September version of this data contained roughly 6.7M emails sent to the public mailing lists at the Apache Software Foundation, so I suspect this version has somewhere in the 7M+ item range, but I haven&#8217;t counted them.  At any rate, I hope it is useful to people.</p>
<p>Finally, on a personal note, I&#8217;m back at <a href="http://www.lucidimagination.com">Lucid Imagination</a> after a brief move elsewhere, this time in a new role as Chief Scientist.  Lucid is a company I co-founded and helped build over the past 4 years.  I&#8217;m looking forward to being back, working closely with Lucene and Solr again and with a <a href="http://www.lucidimagination.com/why-lucid/leadership">top-notch technical team</a>.  I&#8217;m also looking forward to working on Mahout more, as well as other technologies like Hadoop, Pig, HBase and the like, especially as they relate to search and recommendations.</p>
]]></content:encoded>
			<wfw:commentRss>http://lucene.grantingersoll.com/2011/08/05/mahout-and-other-news/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Slides from DC Hadoop meetup</title>
		<link>http://lucene.grantingersoll.com/2011/05/03/slides-from-dc-hadoop-meetup/</link>
		<comments>http://lucene.grantingersoll.com/2011/05/03/slides-from-dc-hadoop-meetup/#comments</comments>
		<pubDate>Tue, 03 May 2011 14:11:59 +0000</pubDate>
		<dc:creator>grant_ingersoll</dc:creator>
				<category><![CDATA[Lucene]]></category>

		<guid isPermaLink="false">http://lucene.grantingersoll.com/?p=430</guid>
		<description><![CDATA[The slides from my DC Hadoop meetup presentation are on SlideShare at: Intro to Mahout &#8212; DC Hadoop. I really enjoyed the meetup.  Lots of good questions and insights into machine learning.  For those at the meeting who were asking about references, check out Mahout&#8217;s references page, especially the Background Material section.]]></description>
			<content:encoded><![CDATA[<p>The slides from my DC Hadoop meetup presentation are on SlideShare at: <a href="http://www.slideshare.net/gsingers/intro-to-mahout-dc-hadoop">Intro to Mahout &#8212; DC Hadoop</a>.</p>
<p>I really enjoyed the meetup.  Lots of good questions and insights into machine learning.  For those at the meeting who were asking about references, check out Mahout&#8217;s <a href="https://cwiki.apache.org/confluence/display/MAHOUT/Books+Tutorials+and+Talks">references</a> page, especially the Background Material section.</p>
]]></content:encoded>
			<wfw:commentRss>http://lucene.grantingersoll.com/2011/05/03/slides-from-dc-hadoop-meetup/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Stump the Chump</title>
		<link>http://lucene.grantingersoll.com/2011/04/24/stump-the-chump/</link>
		<comments>http://lucene.grantingersoll.com/2011/04/24/stump-the-chump/#comments</comments>
		<pubDate>Sun, 24 Apr 2011 06:09:39 +0000</pubDate>
		<dc:creator>grant_ingersoll</dc:creator>
				<category><![CDATA[Lucene]]></category>
		<category><![CDATA[Solr]]></category>

		<guid isPermaLink="false">http://lucene.grantingersoll.com/?p=428</guid>
		<description><![CDATA[I&#8217;m on the hot seat for &#8220;Stump the Chump&#8221; this year at Lucene Revolution, so if you have questions you want me to tackle, please either show up at my talk or email them to info@lucenerevolution.org.  See Session Abstracts &#124; Day 1 &#124; www.lucenerevolution.org for more information.]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m on the hot seat for &#8220;Stump the Chump&#8221; this year at Lucene Revolution, so if you have questions you want me to tackle, please either show up at my talk or email them to info@lucenerevolution.org.  See <a href="http://lucenerevolution.org/2011/sessions-day-1#stump-ingersoll">Session Abstracts | Day 1 | www.lucenerevolution.org</a> for more information.</p>
]]></content:encoded>
			<wfw:commentRss>http://lucene.grantingersoll.com/2011/04/24/stump-the-chump/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Deploying a massively scalable recommender system with Apache Mahout &#124; “I for one welcome our new computer overlords”</title>
		<link>http://lucene.grantingersoll.com/2011/04/22/deploying-a-massively-scalable-recommender-system-with-apache-mahout%c2%a0%c2%a0%e2%80%9ci-for-one-welcome-our-new-computer-overlords%e2%80%9d/</link>
		<comments>http://lucene.grantingersoll.com/2011/04/22/deploying-a-massively-scalable-recommender-system-with-apache-mahout%c2%a0%c2%a0%e2%80%9ci-for-one-welcome-our-new-computer-overlords%e2%80%9d/#comments</comments>
		<pubDate>Fri, 22 Apr 2011 19:41:27 +0000</pubDate>
		<dc:creator>grant_ingersoll</dc:creator>
				<category><![CDATA[Lucene]]></category>

		<guid isPermaLink="false">http://lucene.grantingersoll.com/?p=426</guid>
		<description><![CDATA[Excellent post by fellow Mahout committer Sebastian Schelter on deploying a large-scale recommender with Mahout: Deploying a massively scalable recommender system with Apache Mahout &#124; “I for one welcome our new computer overlords”.]]></description>
			<content:encoded><![CDATA[<p>Excellent post by fellow Mahout committer Sebastian Schelter on deploying a large-scale recommender with Mahout:</p>
<p><a href="http://ssc.io/deploying-a-massively-scalable-recommender-system-with-apache-mahout/">Deploying a massively scalable recommender system with Apache Mahout | “I for one welcome our new computer overlords”</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://lucene.grantingersoll.com/2011/04/22/deploying-a-massively-scalable-recommender-system-with-apache-mahout%c2%a0%c2%a0%e2%80%9ci-for-one-welcome-our-new-computer-overlords%e2%80%9d/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Apache Lucene 3.1.0 and Apache Solr 3.1.0</title>
		<link>http://lucene.grantingersoll.com/2011/03/31/apache-lucene-3-1-0-and-apache-solr-3-1-0/</link>
		<comments>http://lucene.grantingersoll.com/2011/03/31/apache-lucene-3-1-0-and-apache-solr-3-1-0/#comments</comments>
		<pubDate>Thu, 31 Mar 2011 18:37:35 +0000</pubDate>
		<dc:creator>grant_ingersoll</dc:creator>
				<category><![CDATA[Apache]]></category>
		<category><![CDATA[Lucene]]></category>
		<category><![CDATA[Solr]]></category>

		<guid isPermaLink="false">http://lucene.grantingersoll.com/?p=424</guid>
		<description><![CDATA[I just sent the official release announcements for Lucene and Solr 3.1.0 to the mailing lists and also posted the release announcement, etc., over on the Lucid Imagination site. Lucid Imagination » Apache Lucene 3.1.0 and Apache Solr 3.1.0. Thanks to everyone for all their hard work in making these releases happen.]]></description>
			<content:encoded><![CDATA[<p>I just sent the official release announcements for Lucene and Solr 3.1.0 to the mailing lists and also posted the release announcement, etc., over on the Lucid Imagination site.</p>
<p><a href="http://www.lucidimagination.com/blog/2011/03/31/apache-lucene-3-1-0-and-apache-solr-3-1-0/">Lucid Imagination » Apache Lucene 3.1.0 and Apache Solr 3.1.0</a>.</p>
<p>Thanks to everyone for all their hard work in making these releases happen.</p>
]]></content:encoded>
			<wfw:commentRss>http://lucene.grantingersoll.com/2011/03/31/apache-lucene-3-1-0-and-apache-solr-3-1-0/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Lucene, Solr and SXSW</title>
		<link>http://lucene.grantingersoll.com/2011/03/17/lucene-solr-and-sxsw/</link>
		<comments>http://lucene.grantingersoll.com/2011/03/17/lucene-solr-and-sxsw/#comments</comments>
		<pubDate>Thu, 17 Mar 2011 12:16:42 +0000</pubDate>
		<dc:creator>grant_ingersoll</dc:creator>
				<category><![CDATA[Lucene]]></category>
		<category><![CDATA[Mahout]]></category>
		<category><![CDATA[Solr]]></category>
		<category><![CDATA[SXSW]]></category>

		<guid isPermaLink="false">http://lucene.grantingersoll.com/?p=418</guid>
		<description><![CDATA[Finally back in the saddle from SXSW, where a good time was had by all, AFAICT.  It was my first time, so it was a bit overwhelming.  So many people, so much hype, so much to do and see.  Suffice it to say, most of the hype was about social media.  Can you say Bubble?  [...]]]></description>
			<content:encoded><![CDATA[<p>Finally back in the saddle from <a href="http://www.sxsw.com">SXSW</a>, where a good time was had by all, AFAICT.  It was my first time, so it was a bit overwhelming.  So many people, so much hype, so much to do and see.  Suffice it to say, most of the hype was about social media.  Can you say Bubble?  Sure, there are some winners in the bunch, but there are a whole lot of people whose only idea is simply to copy some other venture-funded company that has yet to make a profit, or to create Yet Another Social Media Monitoring Solution (YAMMS).  From the engineering side of these companies, this stuff is incredibly cool and I totally get the excitement.  What&#8217;s not to like about working on massively scalable problems with incredible update rates using cool new technology like <a href="http://hadoop.apache.org">Hadoop</a>, <a href="http://cassandra.apache.org">Cassandra</a>, <a href="http://mahout.apache.org">Mahout</a>, etc?  From the business side, color me cynical.</p>
<p>Regardless of the hype, I had a good time.  Obviously I&#8217;m biased, but I think the talk RC Johnson and I gave on using Solr in NoSQL situations went well, was well attended and <a href="http://twitter.com/#search?q=%23solrnosql">well received</a>.  RC and I split duties during the talk.  I covered the big picture around Solr as a NoSQL solution and RC covered how <a href="http://www.bazaarvoice.com/">Bazaarvoice</a> uses Solr to serve billions of searches/lookups per month.  A good chunk of the audience was already familiar with NoSQL and Lucene/Solr, so it was a pretty technical audience, which is exactly what we hoped for.   We also had some good questions during the talk as well as afterward.  The <a href="http://portal.sliderocket.com/ANYSX/SXSW-2011-Solr-Nosql">slides are up on SlideRocket</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://lucene.grantingersoll.com/2011/03/17/lucene-solr-and-sxsw/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

