<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Grant's Grunts: Lucene Edition &#187; Mahout</title>
	<atom:link href="http://lucene.grantingersoll.com/category/mahout/feed/" rel="self" type="application/rss+xml" />
	<link>http://lucene.grantingersoll.com</link>
	<description>Thoughts on Apache Lucene, Mahout, Solr, Tika and Nutch</description>
	<lastBuildDate>Mon, 06 Feb 2012 12:07:52 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Berlin Buzzwords 2012</title>
		<link>http://lucene.grantingersoll.com/2012/01/18/berlin-buzzwords-2012/</link>
		<comments>http://lucene.grantingersoll.com/2012/01/18/berlin-buzzwords-2012/#comments</comments>
		<pubDate>Wed, 18 Jan 2012 13:33:40 +0000</pubDate>
		<dc:creator>grant_ingersoll</dc:creator>
				<category><![CDATA[Lucene]]></category>
		<category><![CDATA[Mahout]]></category>
		<category><![CDATA[Solr]]></category>

		<guid isPermaLink="false">http://lucene.grantingersoll.com/?p=467</guid>
		<description><![CDATA[In case you haven&#8217;t heard, and are in Europe this June (or want to be), you should check out the Berlin Buzzwords conference.  It&#8217;s a great conference for all things related to Lucene, Solr, Hadoop, Mahout, NoSQL and scaling in general.  The CFP is open now through March 11.]]></description>
			<content:encoded><![CDATA[<p>In case you haven&#8217;t heard, and are in Europe this June (or want to be), you should check out the <a href="http://www.berlinbuzzwords.de">Berlin Buzzwords</a> conference.  It&#8217;s a great conference for all things related to Lucene, Solr, Hadoop, Mahout, NoSQL and scaling in general.  The CFP is open now through March 11.</p>
]]></content:encoded>
			<wfw:commentRss>http://lucene.grantingersoll.com/2012/01/18/berlin-buzzwords-2012/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Mahout in Action Review</title>
		<link>http://lucene.grantingersoll.com/2011/10/15/mahout-in-action-review/</link>
		<comments>http://lucene.grantingersoll.com/2011/10/15/mahout-in-action-review/#comments</comments>
		<pubDate>Sat, 15 Oct 2011 16:58:47 +0000</pubDate>
		<dc:creator>grant_ingersoll</dc:creator>
				<category><![CDATA[Mahout]]></category>

		<guid isPermaLink="false">http://lucene.grantingersoll.com/?p=458</guid>
		<description><![CDATA[I&#8217;ve posted my review of &#8220;Mahout in Action&#8221; on Lucid&#8217;s website: Mahout in Action Review.]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve posted my review of &#8220;<a href="http://www.manning.com/affiliate/idevaffiliate.php?id=1069_219">Mahout in Action</a>&#8221; on Lucid&#8217;s website: <a href="http://www.lucidimagination.com/blog/2011/10/15/mahout-in-action-review/">Mahout in Action Review</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://lucene.grantingersoll.com/2011/10/15/mahout-in-action-review/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>SXSW 2012 &#8211; Apache Mahout: Bringing Intelligence to Your App</title>
		<link>http://lucene.grantingersoll.com/2011/08/15/sxsw-2012-apache-mahout-bringing-intelligence-to-your-app/</link>
		<comments>http://lucene.grantingersoll.com/2011/08/15/sxsw-2012-apache-mahout-bringing-intelligence-to-your-app/#comments</comments>
		<pubDate>Mon, 15 Aug 2011 19:43:38 +0000</pubDate>
		<dc:creator>grant_ingersoll</dc:creator>
				<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[machine learning]]></category>
		<category><![CDATA[Mahout]]></category>
		<category><![CDATA[Map Reduce]]></category>

		<guid isPermaLink="false">http://lucene.grantingersoll.com/?p=441</guid>
		<description><![CDATA[It&#8217;s that time of year again: time to vote for SXSW talks.  Last year I did a talk with RC Johnson of Bazaarvoice on Solr as NoSQL; this year I thought I would try to fly solo and submitted a talk on Apache Mahout. So, if you are so inclined to do the whole crowdsourcing [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignright" title="SXSW Panel Picker" src="http://panelpicker.sxsw.com/img/sxsw/my_SXSW_idea_2012.png" alt="" width="200" height="120" />It&#8217;s that time of year again: time to vote for <a href="http://www.sxsw.com">SXSW</a> talks.  Last year I did a talk with RC Johnson of <a href="http://www.bazaarvoice.com">Bazaarvoice</a> on Solr as NoSQL; this year I thought I would try to fly solo and submitted a talk on <a href="http://mahout.apache.org">Apache Mahout</a>.</p>
<p>So, if you are so inclined to do the whole crowdsourcing thing, please go vote for my talk at <a href="http://panelpicker.sxsw.com/ideas/view/9001">SXSW 2012 &#8211; Apache Mahout: Bringing Intelligence to Your App</a> and then maybe I will see you at SXSW in 2012.</p>
]]></content:encoded>
			<wfw:commentRss>http://lucene.grantingersoll.com/2011/08/15/sxsw-2012-apache-mahout-bringing-intelligence-to-your-app/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Mahout and Other News</title>
		<link>http://lucene.grantingersoll.com/2011/08/05/mahout-and-other-news/</link>
		<comments>http://lucene.grantingersoll.com/2011/08/05/mahout-and-other-news/#comments</comments>
		<pubDate>Fri, 05 Aug 2011 20:41:35 +0000</pubDate>
		<dc:creator>grant_ingersoll</dc:creator>
				<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Lucene]]></category>
		<category><![CDATA[Mahout]]></category>
		<category><![CDATA[Solr]]></category>

		<guid isPermaLink="false">http://lucene.grantingersoll.com/?p=438</guid>
		<description><![CDATA[After some time away, I&#8217;m happy to have had some time recently to work on Mahout again.  Lots of goodness is happening all over the place there, which I&#8217;ll leave to others to explain while I focus on a few recent things I&#8217;ve been doing. First off, I was doing a fair amount of work [...]]]></description>
			<content:encoded><![CDATA[<p>After some time away, I&#8217;m happy to have had some time recently to work on Mahout again.  Lots of goodness is happening all over the place there, which I&#8217;ll leave to others to explain while I focus on a few recent things I&#8217;ve been doing.</p>
<p>First off, I was doing a fair amount of work calculating document similarities across whole collections using, at first, the RowSimilarityJob and, later, the VectorDistanceSimilarityJob, a map-side simplification I wrote that uses the distributed cache.  Both of these come in handy when one wants to calculate pairwise similarity between all (or most) items in a collection.  The original Mahout implementation was focused on providing recommendations, but as outlined in the <a href="http://www.umiacs.umd.edu/~jimmylin/publications/Elsayed_etal_ACL2008_short.pdf">Elsayed, Lin and Oard paper</a>, it is quite useful for text as well in cases where one wants to precompute &#8220;more like this&#8221; for all documents.  As for the need for two similar approaches, see the discussion at <a href="http://www.lucidimagination.com/search/document/40c4f124795c6b5/rowsimilarity_s#42ab816c27c6a9e7">http://www.lucidimagination.com/search/document/40c4f124795c6b5/rowsimilarity_s#42ab816c27c6a9e7</a>.  In essence, it boils down to this: I didn&#8217;t need a fully generic implementation, which was a bit slower on larger matrices, since I mainly wanted to compare all my vectors in HDFS against a subset of &#8220;core&#8221; vectors that fit into memory.  That being said, <a href="http://ssc.io/rowsimilarityjob-on-steroids/">Sebastian</a> is already hard at work on making the more generic version perform better when certain distance measures are used while still offering the full suite of capabilities of the existing RowSimilarityJob.  See <a href="https://issues.apache.org/jira/browse/MAHOUT-767">MAHOUT-767</a> for more info on that work.</p>
<p>Now, I&#8217;m looking into some more pruning techniques via <a href="https://issues.apache.org/jira/browse/MAHOUT-688">MAHOUT-688</a>.  After that quick patch, I think I&#8217;m going to dig a bit more into recommendations as well as run some tests on the ASF mail archives I posted a while back (see below for an update).</p>
<p>Also, I&#8217;ve switched to using Git and GitHub for managing my Mahout changes (as well as other work), so if you want to see what I&#8217;m up to, <a href="https://github.com/gsingers/">check out my GitHub</a> account.</p>
<p>It&#8217;s not complete yet, but the ASF Public Mail archive I put up <a href="https://s3.amazonaws.com/asf-mail-archives/index.html">last September</a> on Amazon AWS is getting a fresh new version.  The interim solution is available at <a href="https://s3.amazonaws.com/asf-mail-archives-7-18-2011/index.html">https://s3.amazonaws.com/asf-mail-archives-7-18-2011/index.html</a>, but look for it to be a <a href="http://aws.amazon.com/datasets">Public Data Set</a> hosted by Amazon soon.  The September version of this data contained roughly 6.7M emails sent to the public mailing lists at the Apache Software Foundation, so I suspect this version is somewhere in the 7M+ item range, but I haven&#8217;t counted them.  At any rate, I hope it is useful to people.</p>
<p>Finally, on a personal note, I&#8217;m back at <a href="http://www.lucidimagination.com">Lucid Imagination</a> after a brief move elsewhere, this time in a new role as Chief Scientist.  Lucid is a company I co-founded and have helped build up over the past 4 years.  I&#8217;m looking forward to being back, working closely with Lucene and Solr again and with a <a href="http://www.lucidimagination.com/why-lucid/leadership">top notch technical team</a>.  I&#8217;m also looking forward to working on Mahout more, as well as other technologies like Hadoop, Pig, HBase and the like, especially as they relate to search and recommendations.</p>
]]></content:encoded>
			<wfw:commentRss>http://lucene.grantingersoll.com/2011/08/05/mahout-and-other-news/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Mahout Job Trends</title>
		<link>http://lucene.grantingersoll.com/2011/05/18/mahout-job-trends/</link>
		<comments>http://lucene.grantingersoll.com/2011/05/18/mahout-job-trends/#comments</comments>
		<pubDate>Wed, 18 May 2011 15:05:47 +0000</pubDate>
		<dc:creator>grant_ingersoll</dc:creator>
				<category><![CDATA[Mahout]]></category>

		<guid isPermaLink="false">http://lucene.grantingersoll.com/?p=422</guid>
		<description><![CDATA[Yonik pointed me at: &#160; mahout Job Trends Mahout jobs While the scale is still pretty small (compare w/ iPhone at 0.1 on the vertical axis), at least it&#8217;s going upwards!  Of course, the keyword match for the search also assumes that we are talking Apache Mahout here and that no one is advertising for [...]]]></description>
			<content:encoded><![CDATA[<p>Yonik pointed me at:</p>
<div style="width: 540px;"><a title="mahout Job Trends" href="http://www.indeed.com/jobtrends?q=mahout"><br />
<img src="http://www.indeed.com/trendgraph/jobgraph.png?q=mahout" border="0" alt="mahout Job Trends graph" width="540" height="300" /><br />
</a>&nbsp;</p>
<table style="font-size: 80%;" border="0" cellspacing="0" cellpadding="6" width="100%">
<tbody>
<tr>
<td><a href="http://www.indeed.com/jobtrends?q=mahout">mahout Job Trends</a></td>
<td align="right"><a href="http://www.indeed.com/q-Mahout-jobs.html">Mahout jobs</a></td>
</tr>
</tbody>
</table>
</div>
<p>While the scale is still pretty small (compare w/ iPhone at 0.1 on the vertical axis), at least it&#8217;s going upwards!  Of course, the keyword match for the search also assumes that we are talking Apache Mahout here and that no one is advertising for people to take care of elephants!</p>
]]></content:encoded>
			<wfw:commentRss>http://lucene.grantingersoll.com/2011/05/18/mahout-job-trends/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Building a recommendation engine, foursquare style &#124; Foursquare Engineering Blog</title>
		<link>http://lucene.grantingersoll.com/2011/03/23/building-a-recommendation-engine-foursquare-style-foursquare-engineering-blog/</link>
		<comments>http://lucene.grantingersoll.com/2011/03/23/building-a-recommendation-engine-foursquare-style-foursquare-engineering-blog/#comments</comments>
		<pubDate>Wed, 23 Mar 2011 12:37:48 +0000</pubDate>
		<dc:creator>grant_ingersoll</dc:creator>
				<category><![CDATA[Mahout]]></category>

		<guid isPermaLink="false">http://lucene.grantingersoll.com/?p=420</guid>
		<description><![CDATA[Building a recommendation engine, foursquare style &#124; Foursquare Engineering Blog. Nice write-up on how Foursquare built a recommendation engine and a little bit on how they are using Mahout!]]></description>
			<content:encoded><![CDATA[<p><a href="http://engineering.foursquare.com/2011/03/22/building-a-recommendation-engine-foursquare-style/">Building a recommendation engine, foursquare style | Foursquare Engineering Blog</a>.</p>
<p>Nice write-up on how Foursquare built a recommendation engine and a little bit on how they are using <a href="http://mahout.apache.org">Mahout</a>!</p>
]]></content:encoded>
			<wfw:commentRss>http://lucene.grantingersoll.com/2011/03/23/building-a-recommendation-engine-foursquare-style-foursquare-engineering-blog/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Lucene, Solr and SXSW</title>
		<link>http://lucene.grantingersoll.com/2011/03/17/lucene-solr-and-sxsw/</link>
		<comments>http://lucene.grantingersoll.com/2011/03/17/lucene-solr-and-sxsw/#comments</comments>
		<pubDate>Thu, 17 Mar 2011 12:16:42 +0000</pubDate>
		<dc:creator>grant_ingersoll</dc:creator>
				<category><![CDATA[Lucene]]></category>
		<category><![CDATA[Mahout]]></category>
		<category><![CDATA[Solr]]></category>
		<category><![CDATA[SXSW]]></category>

		<guid isPermaLink="false">http://lucene.grantingersoll.com/?p=418</guid>
		<description><![CDATA[Finally back in the saddle from SXSW, where a good time was had by all, AFAICT.  It was my first time, so it was a bit overwhelming.  So many people, so much hype, so much to do and see.  Suffice it to say, most of the hype was about social media.  Can you say Bubble?  [...]]]></description>
			<content:encoded><![CDATA[<p>Finally back in the saddle from <a href="http://www.sxsw.com">SXSW</a>, where a good time was had by all, AFAICT.  It was my first time, so it was a bit overwhelming.  So many people, so much hype, so much to do and see.  Suffice it to say, most of the hype was about social media.  Can you say Bubble?  Sure, there are some winners in the bunch, but there are a whole lot of people whose only idea is simply to copy some other venture-funded company that has yet to make a profit or to create Yet Another Social Media Monitoring Solution (YAMMS).  From the engineering side of these companies, this stuff is incredibly cool and I totally get the excitement.  What&#8217;s not to like about working on massively scalable problems with incredible update rates using cool new technology like <a href="http://hadoop.apache.org">Hadoop</a>, <a href="http://cassandra.apache.org">Cassandra</a>, <a href="http://mahout.apache.org">Mahout</a>, etc.?  From the business side, color me cynical.</p>
<p>Regardless of the hype, I had a good time.  Obviously I&#8217;m biased, but I think the talk RC Johnson and I gave on using Solr in NoSQL situations went well, was well attended and <a href="http://twitter.com/#search?q=%23solrnosql">well received</a>.  RC and I split duties during the talk.  I covered the big picture around Solr as a NoSQL solution and RC covered how <a href="http://www.bazaarvoice.com/">Bazaarvoice</a> uses Solr to serve billions of searches/lookups per month.  A good chunk of the audience was already familiar with NoSQL and Lucene/Solr, so it was a pretty technical audience, which is exactly what we hoped for.   We also had some good questions during the talk as well as afterward.  The <a href="http://portal.sliderocket.com/ANYSX/SXSW-2011-Solr-Nosql">slides are up on SlideRocket</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://lucene.grantingersoll.com/2011/03/17/lucene-solr-and-sxsw/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>February 2011 TriHUG slides posted</title>
		<link>http://lucene.grantingersoll.com/2011/02/02/february-2011-trihug-slides-posted/</link>
		<comments>http://lucene.grantingersoll.com/2011/02/02/february-2011-trihug-slides-posted/#comments</comments>
		<pubDate>Wed, 02 Feb 2011 23:44:11 +0000</pubDate>
		<dc:creator>grant_ingersoll</dc:creator>
				<category><![CDATA[Apache]]></category>
		<category><![CDATA[Mahout]]></category>

		<guid isPermaLink="false">http://lucene.grantingersoll.com/?p=413</guid>
		<description><![CDATA[I put the slides from last night&#8217;s TriHUG talk on Apache Mahout up on my SlideShare account.]]></description>
			<content:encoded><![CDATA[<p>I put the slides from last night&#8217;s <a href="http://www.trihug.org">TriHUG</a> talk on Apache Mahout up on my <a href="http://www.slideshare.net/gsingers/apache-mahout-driving-the-yellow-elephant">SlideShare</a> account.</p>
]]></content:encoded>
			<wfw:commentRss>http://lucene.grantingersoll.com/2011/02/02/february-2011-trihug-slides-posted/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>TriHUG: Apache Mahout</title>
		<link>http://lucene.grantingersoll.com/2011/01/31/trihug-apache-mahout/</link>
		<comments>http://lucene.grantingersoll.com/2011/01/31/trihug-apache-mahout/#comments</comments>
		<pubDate>Mon, 31 Jan 2011 13:04:26 +0000</pubDate>
		<dc:creator>grant_ingersoll</dc:creator>
				<category><![CDATA[Apache]]></category>
		<category><![CDATA[Mahout]]></category>
		<category><![CDATA[Raleigh]]></category>
		<category><![CDATA[Triangle]]></category>
		<category><![CDATA[TriHUG]]></category>

		<guid isPermaLink="false">http://lucene.grantingersoll.com/?p=411</guid>
		<description><![CDATA[I will be speaking tomorrow night at the Triangle Hadoop Users Group on Apache Mahout, hosted by Bronto Software.  This will be an introduction to both Mahout and machine learning, but we will also look at how Mahout uses Hadoop in a particular algorithm.  To learn more, see Triangle Hadoop Users Group.]]></description>
			<content:encoded><![CDATA[<p>I will be speaking tomorrow night at the Triangle Hadoop Users Group on Apache Mahout, hosted by Bronto Software.  This will be an introduction to both Mahout and machine learning, but we will also look at how Mahout uses Hadoop in a particular algorithm.  To learn more, see <a href="http://www.trihug.org/">Triangle Hadoop Users Group</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://lucene.grantingersoll.com/2011/01/31/trihug-apache-mahout/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Lucid Imagination » Apache Mahout 0.4 Released</title>
		<link>http://lucene.grantingersoll.com/2010/10/31/lucid-imagination-%c2%bb-apache-mahout-0-4-released/</link>
		<comments>http://lucene.grantingersoll.com/2010/10/31/lucid-imagination-%c2%bb-apache-mahout-0-4-released/#comments</comments>
		<pubDate>Sun, 31 Oct 2010 21:32:06 +0000</pubDate>
		<dc:creator>grant_ingersoll</dc:creator>
				<category><![CDATA[Mahout]]></category>

		<guid isPermaLink="false">http://lucene.grantingersoll.com/?p=401</guid>
		<description><![CDATA[Lucid Imagination » Apache Mahout 0.4 Released. Lots of exciting new things happening in Mahout!]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.lucidimagination.com/blog/2010/10/31/apache-mahout-0-4-released/">Lucid Imagination » Apache Mahout 0.4 Released</a>.</p>
<p>Lots of exciting new things happening in Mahout!</p>
]]></content:encoded>
			<wfw:commentRss>http://lucene.grantingersoll.com/2010/10/31/lucid-imagination-%c2%bb-apache-mahout-0-4-released/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

