<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Grant's Grunts: Lucene Edition &#187; Open Relevance</title>
	<atom:link href="http://lucene.grantingersoll.com/category/open-relevance/feed/" rel="self" type="application/rss+xml" />
	<link>http://lucene.grantingersoll.com</link>
	<description>Thoughts on Apache Lucene, Mahout, Solr, Tika and Nutch</description>
	<lastBuildDate>Mon, 06 Feb 2012 12:07:52 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Join the Lucene Revolution in Boston October 2010 &#124; www.lucenerevolution.org</title>
		<link>http://lucene.grantingersoll.com/2010/05/17/join-the-lucene-revolution-in-boston-october-2010-www-lucenerevolution-org/</link>
		<comments>http://lucene.grantingersoll.com/2010/05/17/join-the-lucene-revolution-in-boston-october-2010-www-lucenerevolution-org/#comments</comments>
		<pubDate>Mon, 17 May 2010 12:45:38 +0000</pubDate>
		<dc:creator>grant_ingersoll</dc:creator>
				<category><![CDATA[Apache]]></category>
		<category><![CDATA[Java]]></category>
		<category><![CDATA[Lucene]]></category>
		<category><![CDATA[Lucene Connector Framework]]></category>
		<category><![CDATA[Lucid Imagination]]></category>
		<category><![CDATA[Mahout]]></category>
		<category><![CDATA[Nutch]]></category>
		<category><![CDATA[Open Relevance]]></category>
		<category><![CDATA[Performance]]></category>
		<category><![CDATA[Real Time Search]]></category>
		<category><![CDATA[Solr]]></category>
		<category><![CDATA[spatial]]></category>
		<category><![CDATA[Tika]]></category>

		<guid isPermaLink="false">http://lucene.grantingersoll.com/?p=368</guid>
		<description><![CDATA[Join the Lucene Revolution in Boston October 2010 &#124; www.lucenerevolution.org. Hope to see you in Boston!]]></description>
			<content:encoded><![CDATA[<p><a href="http://lucenerevolution.com/">Join the Lucene Revolution in Boston October 2010 | www.lucenerevolution.org</a>.</p>
<p>Hope to see you in Boston!</p>
]]></content:encoded>
			<wfw:commentRss>http://lucene.grantingersoll.com/2010/05/17/join-the-lucene-revolution-in-boston-october-2010-www-lucenerevolution-org/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>SFBay Apache Lucene/Solr Meetup Jan 21st.</title>
		<link>http://lucene.grantingersoll.com/2010/01/12/sfbay-apache-lucenesolr-meetup-jan-21st/</link>
		<comments>http://lucene.grantingersoll.com/2010/01/12/sfbay-apache-lucenesolr-meetup-jan-21st/#comments</comments>
		<pubDate>Tue, 12 Jan 2010 18:44:02 +0000</pubDate>
		<dc:creator>grant_ingersoll</dc:creator>
				<category><![CDATA[Apache]]></category>
		<category><![CDATA[Lucene]]></category>
		<category><![CDATA[Mahout]]></category>
		<category><![CDATA[Nutch]]></category>
		<category><![CDATA[Open Relevance]]></category>
		<category><![CDATA[Solr]]></category>
		<category><![CDATA[Tika]]></category>

		<guid isPermaLink="false">http://lucene.grantingersoll.com/?p=317</guid>
		<description><![CDATA[Details and RSVP at: SFBay Apache Lucene/Solr Meetup San Mateo, CA &#8211; Meetup.com.]]></description>
			<content:encoded><![CDATA[<p>Details and RSVP at: <a href="http://www.meetup.com/SFBay-Lucene-Solr-Meetup/">SFBay Apache Lucene/Solr Meetup San Mateo, CA &#8211; Meetup.com</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://lucene.grantingersoll.com/2010/01/12/sfbay-apache-lucenesolr-meetup-jan-21st/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Open Relevance Project website is now up</title>
		<link>http://lucene.grantingersoll.com/2009/06/25/the-open-relevance-project-website-is-now-up/</link>
		<comments>http://lucene.grantingersoll.com/2009/06/25/the-open-relevance-project-website-is-now-up/#comments</comments>
		<pubDate>Thu, 25 Jun 2009 21:51:57 +0000</pubDate>
		<dc:creator>grant_ingersoll</dc:creator>
				<category><![CDATA[Lucene]]></category>
		<category><![CDATA[Open Relevance]]></category>

		<guid isPermaLink="false">http://lucene.grantingersoll.com/?p=204</guid>
		<description><![CDATA[Welcome to the Open Relevance Project! I finally got around to putting the Open Relevance Project website up.  Click the link above to check it out.  Be forewarned, it is barebones.  Patches welcome, although I suspect most of the work will be on the Wiki once the ASF infrastructure gets that set up.]]></description>
			<content:encoded><![CDATA[<p><a href="http://lucene.apache.org/openrelevance/">Welcome to the Open Relevance Project!</a></p>
<p>I finally got around to putting the Open Relevance Project website up.  Click the link above to check it out.  Be forewarned, it is barebones.  Patches welcome, although I suspect most of the work will be on the Wiki once the ASF infrastructure gets that set up.</p>
]]></content:encoded>
			<wfw:commentRss>http://lucene.grantingersoll.com/2009/06/25/the-open-relevance-project-website-is-now-up/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>SF Bay Area Lucene/Solr Meetup</title>
		<link>http://lucene.grantingersoll.com/2009/06/04/sf-bay-area-lucenesolr-meetup/</link>
		<comments>http://lucene.grantingersoll.com/2009/06/04/sf-bay-area-lucenesolr-meetup/#comments</comments>
		<pubDate>Thu, 04 Jun 2009 17:49:35 +0000</pubDate>
		<dc:creator>grant_ingersoll</dc:creator>
				<category><![CDATA[canopy clustering]]></category>
		<category><![CDATA[Droids]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Java]]></category>
		<category><![CDATA[Latent Dirichlet Allocation]]></category>
		<category><![CDATA[Lucene]]></category>
		<category><![CDATA[Lucid Imagination]]></category>
		<category><![CDATA[machine learning]]></category>
		<category><![CDATA[Mahout]]></category>
		<category><![CDATA[Open Relevance]]></category>
		<category><![CDATA[Real Time Search]]></category>
		<category><![CDATA[relevance]]></category>
		<category><![CDATA[Solr]]></category>
		<category><![CDATA[Tika]]></category>
		<category><![CDATA[Meetup]]></category>

		<guid isPermaLink="false">http://lucene.grantingersoll.com/?p=197</guid>
		<description><![CDATA[Just wanted to follow up on last night&#8217;s Lucene/Solr Meetup in San Francisco. First off, special thanks to all the speakers (Jason Rutherglen, Michael Busch, Erik Hatcher and all the lightning talk presenters).  We had a lot of excellent talks ranging from low-level Lucene details on payloads and real-time search to high-level discussions [...]]]></description>
			<content:encoded><![CDATA[<p>Just wanted to follow up on last night&#8217;s Lucene/Solr <a href="http://www.meetup.com/SFBay-Lucene-Solr-Meetup/">Meetup</a> in San Francisco.</p>
<p>First off, special thanks to all the speakers (Jason Rutherglen, Michael Busch, Erik Hatcher and all the lightning talk presenters).  We had a lot of excellent talks ranging from low-level Lucene details on payloads and real-time search to high-level discussions of new features in Solr and best practices for working with stopwords and relevance.  We also had intros to <a href="http://lucene.apache.org/mahout">Mahout</a>, <a href="http://lucene.apache.org/tika">Tika</a> and the new <a href="http://www.lucidimagination.com/search/document/84205d273f3753c2/open_relevance_project_kickoff">Open Relevance</a> project at Lucene.  I&#8217;ll post the slides on the Meetup site when they are available (I am still waiting to get them from the speakers).</p>
<p>Second, I really enjoyed engaging with so many people about what they are working on in Lucene/Solr.  It is always fun to hear all the different ways people are (ab)using Lucene/Solr to do cool things.  It was especially good to meet some fellow Mahout committers (Ted Dunning and Jeff Eastman) for the first time, as well as one of Mahout&#8217;s Google Summer of Code students, David Hall, who is working on adding <a href="http://www.lucidimagination.com/search/?q=Latent+Dirichlet">Latent Dirichlet Allocation</a>.</p>
<p>Finally, I look forward to doing more of these.  Right now, I&#8217;m looking for interest in Raleigh, NC, but I know we&#8217;ll likely have another one in the Bay Area again soon.</p>
]]></content:encoded>
			<wfw:commentRss>http://lucene.grantingersoll.com/2009/06/04/sf-bay-area-lucenesolr-meetup/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Copying TREC is the Wrong Track for the Enterprise &#124; The Noisy Channel</title>
		<link>http://lucene.grantingersoll.com/2009/05/18/copying-trec-is-the-wrong-track-for-the-enterprise-the-noisy-channel/</link>
		<comments>http://lucene.grantingersoll.com/2009/05/18/copying-trec-is-the-wrong-track-for-the-enterprise-the-noisy-channel/#comments</comments>
		<pubDate>Tue, 19 May 2009 03:22:50 +0000</pubDate>
		<dc:creator>grant_ingersoll</dc:creator>
				<category><![CDATA[Apache]]></category>
		<category><![CDATA[Lucene]]></category>
		<category><![CDATA[machine learning]]></category>
		<category><![CDATA[Mahout]]></category>
		<category><![CDATA[Open Relevance]]></category>
		<category><![CDATA[Performance]]></category>
		<category><![CDATA[relevance]]></category>
		<category><![CDATA[Solr]]></category>
		<category><![CDATA[Search]]></category>

		<guid isPermaLink="false">http://lucene.grantingersoll.com/?p=191</guid>
		<description><![CDATA[Copying TREC is the Wrong Track for the Enterprise &#124; The Noisy Channel. Daniel Tunkelang has written up an interesting post on the new Open Relevance Project that a few other Lucene people and I are starting up, and I thought I would respond here: Little late to the conversation, but I think maybe we [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://thenoisychannel.com/2009/05/18/copying-trec-is-the-wrong-track-for-the-enterprise/">Copying TREC is the Wrong Track for the Enterprise | The Noisy Channel</a>.</p>
<p>Daniel Tunkelang has written up an interesting post on the new <a href="http://wiki.apache.org/lucene-java/OpenRelevance">Open Relevance Project</a> that a few other Lucene people and I are starting up, and I thought I would respond here:</p>
<blockquote><p>Little late to the conversation, but I think maybe we should back up a little bit.   I like a lot of the comments and wish they had actually been made on general@lucene.apache.org, where we are discussing the merits of the undertaking (see <a href="http://www.lucidimagination.com/search/document/76d7cdeed4882397">http://www.lucidimagination.com/search/document/76d7cdeed4882397</a>), not that I expect that to happen given the way blogs work. At any rate, I&#8217;d like to add my two cents as the one who started the thread on general@lucene.apache.org.</p>
<p>First off, the ORP is <span style="text-decoration: underline;"><strong>VERY</strong></span> early-stage brainstorming.  ORP really doesn&#8217;t warrant much attention at this point, and it is premature to even speculate about how it relates to TREC, Google, Yahoo!, Microsoft or anything else.   I&#8217;m not even sure it has enough support to be a viable Lucene subproject! For now, I think most of us who are actually working on the genesis of the project are merely looking for a means to improve Lucene (and also Solr, Nutch and Mahout), despite what Otis says in his blog post about having grander notions for comparing across engines.</p>
<p>So, to the background&#8230;</p>
<p>This (ORP) is something I&#8217;ve been thinking about for a long time now and have discussed with a number of people in the past.    The motivation comes from my frustration over the years at not being able to obtain data that everyone on Lucene can use without limitations, since I&#8217;ve almost always worked in places that had little money to spend on this kind of thing.<br />
See <a href="http://www.lucidimagination.com/search/document/656d5ca50c8c9242">http://www.lucidimagination.com/search/document/656d5ca50c8c9242</a>, <a href="http://lucene.grantingersoll.com/category/trec/">http://lucene.grantingersoll.com/category/trec/</a> and <a href="http://lucene.grantingersoll.com/2008/09/18/opening-up-academic-research-on-ir-and-machine-learning/">http://lucene.grantingersoll.com/2008/09/18/opening-up-academic-research-on-ir-and-machine-learning/</a> for background.  The second motivation is simply to have practical, real world data driven by actual users.</p>
<p>In the past, I have talked with both NIST and Sheffield to try to work out terms by which the Lucene community could obtain TREC resources, but the licensing terms simply prevent a totally free redistribution.  (BTW, this is not NIST/Sheffield&#8217;s fault, but that of the company that allows them to use the data.  NIST/Sheffield are doing the best they can given their constraints.)  I have also talked with a few commercial companies that redistribute data (blogs, etc.), all to no avail (it&#8217;s usually the copyright that kills it).  If the ASF were to buy the dataset, we could distribute it to Lucene committers under the licensing terms, but not to the broader community, and we&#8217;d have to maintain a list of who has it, etc.  See the terms <a href="http://ir.dcs.gla.ac.uk/test_collections/">here</a>, for instance.  Since many of the best ideas in Open Source come from the community, and you never know when and where they will come from, I deemed this unacceptable and decided not to pursue it even though the ASF authorized me to go forward with it (i.e. spend the money) if I wanted to.  After all, it is only a few hundred bucks.</p>
<p>To me, it is vital that there be an open and <span style="text-decoration: underline;"><strong>FREE</strong></span> means for doing relevance tests that the Lucene community can use to improve itself.  If others can benefit, so be it.  Much like Lucene developed a benchmarking tool for people to share performance tests (both speed and relevance) in a straightforward way (see the contrib/benchmark section of the Lucene distribution), so too is there a need for us (speaking, unofficially, for Lucene) to talk about relevance in a public way so we can compare notes just as any two researchers buried in the bowels of a commercial company might compare notes.   Many, many people have used Lucene to do TREC (in fact, I have), but it is a showstopper when the other person you are discussing relevance with can&#8217;t just pick up the exact same bits (corpus, queries, judgments) and run the exact same tests.  In other words, IMO, the goal is not to compare competing offerings (although that will likely happen, because it is human nature); it is to give Lucene users a common way of evaluating and talking about relevance.</p>
<p>As anyone familiar with Lucene knows, ORP, like all Apache projects, will be driven by the people who show up and volunteer to contribute to it.  Thus, the slate really is clean.  If anyone (and I truly mean anyone, not just Lucene users, even though that is the preliminary focus) is interested, please show up and discuss over at general@lucene.apache.org.   We&#8217;d welcome the ideas and, moreover, any efforts.</p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://lucene.grantingersoll.com/2009/05/18/copying-trec-is-the-wrong-track-for-the-enterprise-the-noisy-channel/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

