<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Apache Mahout Status</title>
	<atom:link href="http://lucene.grantingersoll.com/2009/06/16/apache-mahout-status/feed/" rel="self" type="application/rss+xml" />
	<link>http://lucene.grantingersoll.com/2009/06/16/apache-mahout-status/</link>
	<description>Thoughts on Apache Lucene, Mahout, Solr, Tika and Nutch</description>
	<lastBuildDate>Sat, 10 Sep 2011 20:15:34 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
	<item>
		<title>By: Inductive Bias &#187; GSoC at Mahout</title>
		<link>http://lucene.grantingersoll.com/2009/06/16/apache-mahout-status/comment-page-1/#comment-6991</link>
		<dc:creator>Inductive Bias &#187; GSoC at Mahout</dc:creator>
		<pubDate>Wed, 09 Sep 2009 22:22:18 +0000</pubDate>
		<guid isPermaLink="false">http://lucene.grantingersoll.com/?p=202#comment-6991</guid>
		<description>[...] Apart from three new additions to the code base, summer also brought quite some traffic to the user list - not only in terms of subscriptions but also in terms of developers contributing to the discussions online. Currently, it looks like the project is really gaining momentum, as also noted in Grant Ingersoll&#8217;s post. [...]</description>
		<content:encoded><![CDATA[<p>[...] Apart from three new additions to the code base, summer also brought quite some traffic to the user list &#8211; not only in terms of subscriptions but also in terms of developers contributing to the discussions online. Currently, it looks like the project is really gaining momentum, as also noted in Grant Ingersoll&#8217;s post. [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Apache Mahout nimmt Fahrt auf &#8211; Meyer Information Management Blog</title>
		<link>http://lucene.grantingersoll.com/2009/06/16/apache-mahout-status/comment-page-1/#comment-6988</link>
		<dc:creator>Apache Mahout nimmt Fahrt auf &#8211; Meyer Information Management Blog</dc:creator>
		<pubDate>Mon, 07 Sep 2009 10:15:45 +0000</pubDate>
		<guid isPermaLink="false">http://lucene.grantingersoll.com/?p=202#comment-6988</guid>
		<description>[...] only on the project mailing list (Apache Mahout Status by Grant Ingersoll) is traffic increasing; the first projects that use Mahout successfully are also [...]</description>
		<content:encoded><![CDATA[<p>[...] only on the project mailing list (Apache Mahout Status by Grant Ingersoll) is traffic increasing; the first projects that use Mahout successfully are also [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: grant_ingersoll</title>
		<link>http://lucene.grantingersoll.com/2009/06/16/apache-mahout-status/comment-page-1/#comment-6747</link>
		<dc:creator>grant_ingersoll</dc:creator>
		<pubDate>Wed, 08 Jul 2009 02:52:21 +0000</pubDate>
		<guid isPermaLink="false">http://lucene.grantingersoll.com/?p=202#comment-6747</guid>
		<description>Yeah, I have already learned a lot from David!  I&#039;m often in the other boat: I&#039;ve implemented a lot of the ideas, but don&#039;t always know the theory.  Mahout has been a great learning experience for me already.  Hopefully, I can pass on what I know about communities and open source.  In fact, learning more about ML was precisely one of my goals in starting the project.  My main NLP background was in rule-based systems, but it was pretty obvious to me that ML approaches were the growing trend, so Mahout helps me learn and apply.</description>
		<content:encoded><![CDATA[<p>Yeah, I have already learned a lot from David!  I&#8217;m often in the other boat: I&#8217;ve implemented a lot of the ideas, but don&#8217;t always know the theory.  Mahout has been a great learning experience for me already.  Hopefully, I can pass on what I know about communities and open source.  In fact, learning more about ML was precisely one of my goals in starting the project.  My main NLP background was in rule-based systems, but it was pretty obvious to me that ML approaches were the growing trend, so Mahout helps me learn and apply.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Bob Carpenter</title>
		<link>http://lucene.grantingersoll.com/2009/06/16/apache-mahout-status/comment-page-1/#comment-6741</link>
		<dc:creator>Bob Carpenter</dc:creator>
		<pubDate>Tue, 07 Jul 2009 19:47:55 +0000</pubDate>
		<guid isPermaLink="false">http://lucene.grantingersoll.com/?p=202#comment-6741</guid>
		<description>Who&#039;s mentoring whom?  

Picking up a volunteer student like David Hall is awesome -- I loved the paper that came out of his undergrad thesis on applying LDA to the ACL Anthology corpus.

There are two aspects to scaling LDA -- number of documents, and number of topics.  The documents can be parallelized to some extent depending on the algorithm.  For scaling number of topics in a sampling context, check out: 

I. Porteous, D. Newman, A. Ihler, A. Asuncion, P. Smyth, M. Welling. &quot;Fast Collapsed Gibbs Sampling for Latent Dirichlet Allocation.&quot; ACM Knowledge Discovery and Data Mining (KDD), 2008. 
http://www.ics.uci.edu/~asuncion/pubs/KDD_08.pdf

I had the kind of experience David&#039;s likely to get when I went to SpeechWorks -- I&#039;d been a professor who released software and I knew algorithms very well; I just hadn&#039;t had practical professional programming experience.  I learned an incredible amount from coworkers who would&#039;ve been my students had they bothered to go to grad school.</description>
		<content:encoded><![CDATA[<p>Who&#8217;s mentoring whom?  </p>
<p>Picking up a volunteer student like David Hall is awesome &#8212; I loved the paper that came out of his undergrad thesis on applying LDA to the ACL Anthology corpus.</p>
<p>There are two aspects to scaling LDA &#8212; number of documents, and number of topics.  The documents can be parallelized to some extent depending on the algorithm.  For scaling number of topics in a sampling context, check out: </p>
<p>I. Porteous, D. Newman, A. Ihler, A. Asuncion, P. Smyth, M. Welling. &#8220;Fast Collapsed Gibbs Sampling for Latent Dirichlet Allocation.&#8221; ACM Knowledge Discovery and Data Mining (KDD), 2008.<br />
<a href="http://www.ics.uci.edu/~asuncion/pubs/KDD_08.pdf" rel="nofollow">http://www.ics.uci.edu/~asuncion/pubs/KDD_08.pdf</a></p>
<p>I had the kind of experience David&#8217;s likely to get when I went to SpeechWorks &#8212; I&#8217;d been a professor who released software and I knew algorithms very well; I just hadn&#8217;t had practical professional programming experience.  I learned an incredible amount from coworkers who would&#8217;ve been my students had they bothered to go to grad school.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

