<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Grant's Grunts: Lucene Edition &#187; spell checking</title>
	<atom:link href="http://lucene.grantingersoll.com/category/spell-checking/feed/" rel="self" type="application/rss+xml" />
	<link>http://lucene.grantingersoll.com</link>
	<description>Thoughts on Apache Lucene, Mahout, Solr, Tika and Nutch</description>
	<lastBuildDate>Mon, 06 Feb 2012 12:07:52 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>&#8220;What&#8217;s new with Apache Solr&#8221; now available at IBM developerWorks</title>
		<link>http://lucene.grantingersoll.com/2008/11/05/whats-new-with-apache-solr-now-available-at-ibm-developerworks/</link>
		<comments>http://lucene.grantingersoll.com/2008/11/05/whats-new-with-apache-solr-now-available-at-ibm-developerworks/#comments</comments>
		<pubDate>Wed, 05 Nov 2008 16:28:26 +0000</pubDate>
		<dc:creator>grant_ingersoll</dc:creator>
				<category><![CDATA[Indexing]]></category>
		<category><![CDATA[Java]]></category>
		<category><![CDATA[Lucene]]></category>
		<category><![CDATA[Search]]></category>
		<category><![CDATA[Solr]]></category>
		<category><![CDATA[spell checking]]></category>

		<guid isPermaLink="false">http://lucene.grantingersoll.com/2008/11/05/whats-new-with-apache-solr-now-available-at-ibm-developerworks/</guid>
		<description><![CDATA[What&#8217;s new with Apache Solr. My latest article on Apache Solr, titled &#8220;What&#8217;s New with Apache Solr,&#8221; is now available over at IBM developerWorks.  It covers some of the new features like spell checking, Data Import Handler, distributed search, editorial results placement (a.k.a. &#8220;paid placement&#8221;), SolrJ and a variety of other pieces. Hope it is [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.ibm.com/developerworks/java/library/j-solr-update/?S_TACT=105AGX01&amp;S_CMP=HP">What&#8217;s new with Apache Solr</a>.</p>
<p>My latest article on Apache Solr, titled &#8220;What&#8217;s New with Apache Solr,&#8221; is now available over at IBM developerWorks.  It covers some of the new features like spell checking, Data Import Handler, distributed search, editorial results placement (a.k.a. &#8220;paid placement&#8221;), SolrJ and a variety of other pieces.</p>
<p>Hope it is helpful&#8230;  Feel free to give me any feedback.</p>
]]></content:encoded>
			<wfw:commentRss>http://lucene.grantingersoll.com/2008/11/05/whats-new-with-apache-solr-now-available-at-ibm-developerworks/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Some New Features in Solr</title>
		<link>http://lucene.grantingersoll.com/2008/10/23/some-new-features-in-solr/</link>
		<comments>http://lucene.grantingersoll.com/2008/10/23/some-new-features-in-solr/#comments</comments>
		<pubDate>Thu, 23 Oct 2008 12:41:08 +0000</pubDate>
		<dc:creator>grant_ingersoll</dc:creator>
				<category><![CDATA[Apache]]></category>
		<category><![CDATA[clustering]]></category>
		<category><![CDATA[Java]]></category>
		<category><![CDATA[Lucene]]></category>
		<category><![CDATA[machine learning]]></category>
		<category><![CDATA[Mahout]]></category>
		<category><![CDATA[Manning]]></category>
		<category><![CDATA[Search]]></category>
		<category><![CDATA[Solr]]></category>
		<category><![CDATA[spell checking]]></category>
		<category><![CDATA[Taming Text]]></category>
		<category><![CDATA[term vectors]]></category>
		<category><![CDATA[tokenization]]></category>

		<guid isPermaLink="false">http://lucene.grantingersoll.com/?p=116</guid>
		<description><![CDATA[I&#8217;ve had a chance recently to work on some things in Solr that I think can, in the right circumstances, really enhance it. First up is SOLR-651, which implements what I am calling a Term Vector Component. The basic gist of it is that Solr can now serve up term vectors from Lucene.  For [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve had a chance recently to work on some things in Solr that I think can, in the right circumstances, really enhance it.</p>
<p>First up is <a href="https://issues.apache.org/jira/browse/SOLR-651">SOLR-651</a>, which implements what I am calling a <a href="http://wiki.apache.org/solr/TermVectorComponent">Term Vector Component</a>. The basic gist of it is that Solr can now serve up term vectors from Lucene.  For the uninitiated, term vectors store the term, term frequency and, optionally, position and offset information in a document-centric way in Lucene (as opposed to the inverted index storage used for searching).  Term vectors are often useful for things besides search, like highlighting, machine learning and document-to-document similarity.  This component can provide:</p>
<ol>
<li>Term</li>
<li>Term Frequency</li>
<li>Position (based on analysis)</li>
<li>Offset (character based)</li>
<li>IDF &#8211; Inverse Document Frequency</li>
</ol>
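<p>To make the per-document view concrete, here is a minimal sketch in plain Java that builds the term, term frequency, position and offset information for one piece of text.  It is not the component&#8217;s code and it does not use the Lucene or Solr APIs: the class <code>TermVectorSketch</code>, the <code>TermInfo</code> holder and the naive lowercase tokenization are all assumptions made up for illustration, and IDF is left out because it needs corpus-wide counts.</p>

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: a per-document "term vector" built by hand.
// This is not Solr's or Lucene's API; the class and field names are invented.
class TermVectorSketch {

    // Everything recorded for one term within a single document.
    static class TermInfo {
        int frequency;                                      // how often the term occurs
        final List<Integer> positions = new ArrayList<>();  // token positions, 0-based
        final List<int[]> offsets = new ArrayList<>();      // {startChar, endChar} pairs
    }

    // Naive analysis, assumed for illustration: lowercase runs of letters and digits.
    static Map<String, TermInfo> build(String text) {
        Map<String, TermInfo> vector = new LinkedHashMap<>();
        Matcher m = Pattern.compile("[\\p{L}\\p{N}]+").matcher(text);
        int position = 0;
        while (m.find()) {
            String term = m.group().toLowerCase(Locale.ROOT);
            TermInfo info = vector.computeIfAbsent(term, t -> new TermInfo());
            info.frequency++;
            info.positions.add(position++);
            info.offsets.add(new int[] {m.start(), m.end()});
        }
        return vector;
    }

    public static void main(String[] args) {
        Map<String, TermInfo> vector = build("Solr serves term vectors; term vectors help.");
        for (Map.Entry<String, TermInfo> e : vector.entrySet()) {
            TermInfo info = e.getValue();
            StringBuilder offsets = new StringBuilder();
            for (int[] o : info.offsets) {
                offsets.append('[').append(o[0]).append(',').append(o[1]).append(')');
            }
            System.out.println(e.getKey() + " tf=" + info.frequency
                    + " positions=" + info.positions + " offsets=" + offsets);
        }
    }
}
```

<p>Note that the map is keyed by term within one document, which is the opposite direction from an inverted index, where each term points at the documents that contain it.</p>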
<p>Combining all of these things with a couple of other features can, I think, really enable Solr to act as a more general text server (which is what <a href="http://www.manning.com/ingersoll">Taming Text</a> is going to show).  For instance, the Analysis Request Handler can act as a document analyzer server, and the Luke Request Handler can provide all kinds of corpus statistics.  And I haven&#8217;t even mentioned search, faceting and spell checking yet.  Nor have I mentioned the other thing I am working on: adding search-result and document clustering to Solr.  This is taking place on <a href="https://issues.apache.org/jira/browse/SOLR-769">SOLR-769</a>.  The basic implementation I have now does search result clustering using the <a href="http://project.carrot2.org/">Carrot2</a> open source project.  After that, I plan on adding in Mahout for document-based clustering.  I also know that Tom Morton, for Taming Text, has added <a href="http://opennlp.sourceforge.net/">OpenNLP</a>&#8217;s Named Entity Recognition to Solr.  At some point in the near future, I&#8217;ll put up a link to that code.</p>
<p>Bottom line: Solr ain&#8217;t just for search anymore!</p>
]]></content:encoded>
			<wfw:commentRss>http://lucene.grantingersoll.com/2008/10/23/some-new-features-in-solr/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>wpSearch &#8211; Lucene search for WordPress</title>
		<link>http://lucene.grantingersoll.com/2008/08/07/wpsearch-lucene-search-for-wordpress/</link>
		<comments>http://lucene.grantingersoll.com/2008/08/07/wpsearch-lucene-search-for-wordpress/#comments</comments>
		<pubDate>Thu, 07 Aug 2008 12:36:46 +0000</pubDate>
		<dc:creator>grant_ingersoll</dc:creator>
				<category><![CDATA[Indexing]]></category>
		<category><![CDATA[Lucene]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Search]]></category>
		<category><![CDATA[spell checking]]></category>
		<category><![CDATA[wpSearch]]></category>

		<guid isPermaLink="false">http://lucene.grantingersoll.com/?p=93</guid>
		<description><![CDATA[Code Fury The author of this nice plugin for WordPress contacted me today about his Lucene-based WordPress plugin, so I thought I would give it a try, as I&#8217;m obviously a big fan of Lucene and also never much cared for MySQL&#8217;s search (in)capabilities. The plugin is easy enough to install, only thing that [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://codefury.net/">Code Fury</a></p>
<p>The author of this nice Lucene-based WordPress plugin contacted me today, so I thought I would give it a try, as I&#8217;m obviously a big fan of Lucene and never much cared for MySQL&#8217;s search (in)capabilities.</p>
<p>The plugin is easy enough to install.  The only thing that struck me as a little odd was the need to set the permissions to 777.  Presumably, this is so it can write the index, but perhaps it would be better to store the index outside of the plugin infrastructure.  Of course, I don&#8217;t know what&#8217;s involved with writing plugins, so I&#8217;m not sure whether that makes sense.</p>
<p>Indexing and enabling search was a snap, and I do think the results are good, even if my sites don&#8217;t have a ton of posts.  Indexing on my &#8220;main&#8221; site (<a href="http://www.grantingersoll.com">http://www.grantingersoll.com</a>) took slightly longer than here, but I do have more posts and comments there.</p>
<p>The only minor suggestion I have is that the default boosts for title and content aren&#8217;t all that great.  I think the title boost was 1.8 and the content boost was 1.3.  I changed mine to title: 5 and content: 2.  Because index-time boosts are stored with only 8 bits of granularity, there isn&#8217;t much difference between 1.8 and 1.3, and I tend to think title matches are much more important.  Thus, I made the gap bigger.  Still, very cool that the author has hooked field boosting in to begin with.</p>
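<p>To get a feel for what 8 bits of granularity does to a boost, here is a rough sketch of squeezing a float into a single byte and reading it back.  It is written in the spirit of a one-byte encoding, but the exact bit layout, the class <code>BoostQuantizationSketch</code> and its <code>encode</code>/<code>decode</code> methods are assumptions for illustration rather than Lucene&#8217;s own code.  It also ignores the fact that Lucene combines the boost with other factors, such as length normalization, before storing it.</p>

```java
// Hypothetical sketch of storing a float in one byte.
// Assumption: the bit layout is illustrative and hand-written, not Lucene's class.
class BoostQuantizationSketch {

    private static final int SHIFT = 21;     // drop the low 21 bits of the float
    private static final int BASE = 48 << 3; // rebase the exponent so the result fits in 8 bits

    // Keeps the sign, the exponent and the top two stored mantissa bits.
    static byte encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> SHIFT;
        if (small <= BASE) {
            // Zero and negatives become 0; tiny positive values become the smallest code.
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (small >= BASE + 0x100) {
            return (byte) -1;                // too large: clamp to the biggest code (255)
        }
        return (byte) (small - BASE);
    }

    // Rebuilds a float from the byte; the dropped mantissa bits come back as zeros.
    static float decode(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        return Float.intBitsToFloat(((b & 0xff) + BASE) << SHIFT);
    }

    public static void main(String[] args) {
        float[] boosts = {1.3f, 1.8f, 2.0f, 5.0f};
        for (float boost : boosts) {
            System.out.println(boost + " is stored as " + decode(encode(boost)));
        }
    }
}
```

<p>In this sketch 1.3 and 1.8 round down to 1.25 and 1.75, only two steps apart on the grid, while 2 and 5 survive unchanged.  Coarse rounding like this is why I prefer boosts that are clearly separated.</p>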
<p>Things I would love to see:</p>
<ol>
<li>Highlighting</li>
<li>Spell checking</li>
</ol>
<p>All in all, seems to be a great little plugin.  And now, I can &#8220;eat my own dogfood&#8221; too!</p>
]]></content:encoded>
			<wfw:commentRss>http://lucene.grantingersoll.com/2008/08/07/wpsearch-lucene-search-for-wordpress/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Solr Spell Checking Addition</title>
		<link>http://lucene.grantingersoll.com/2008/06/21/solr-spell-checking-addition/</link>
		<comments>http://lucene.grantingersoll.com/2008/06/21/solr-spell-checking-addition/#comments</comments>
		<pubDate>Sat, 21 Jun 2008 16:16:07 +0000</pubDate>
		<dc:creator>grant_ingersoll</dc:creator>
				<category><![CDATA[Lucene]]></category>
		<category><![CDATA[Solr]]></category>
		<category><![CDATA[spell checking]]></category>

		<guid isPermaLink="false">http://lucene.grantingersoll.com/?p=84</guid>
		<description><![CDATA[Just committed SOLR-572 yesterday, which adds a spell checking component to Solr.  Now, Solr had a spell checking request handler before, but a component is slightly different.  Request Handlers require separate calls, whereas a component can be inlined in a request.  Essentially, a Request Handler can be made up of one or more SearchComponents. What [...]]]></description>
			<content:encoded><![CDATA[<p>Just committed <a href="https://issues.apache.org/jira/browse/SOLR-572">SOLR-572</a> yesterday, which adds a spell checking component to Solr.  Now, Solr had a spell checking <a href="http://wiki.apache.org/solr/SolrRequestHandler">request handler</a> before, but a component is slightly different.  Request Handlers require separate calls, whereas a component can be inlined in a request.  Essentially, a Request Handler can be made up of one or more <a href="http://wiki.apache.org/solr/SearchComponent">SearchComponents</a>.</p>
<p>What this means is that one can now get back search results for the given query and get spelling suggestions at the same time, pretty much like Google&#8217;s &#8220;Did You Mean&#8221; functionality (though probably not of the same quality, as they have a much bigger corpus and probably use user feedback as well).</p>
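<p>The idea of a handler made up of components can be sketched in a few lines of plain Java.  Everything here is hypothetical: the <code>Component</code> interface, the two component classes, the tiny document list and the dictionary are invented for illustration and are not Solr&#8217;s API.  The suggestion logic uses plain Levenshtein edit distance as a stand-in, which is not necessarily how the Lucene spell checker ranks candidates.</p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of a handler built from components.
// None of these types are Solr's; the names and the tiny "index" are invented.
class ComponentChainSketch {

    interface Component {
        void process(String query, Map<String, Object> response);
    }

    static final List<String> DOCS = Arrays.asList("lucene in action", "solr search server");
    static final List<String> DICTIONARY = Arrays.asList("lucene", "solr", "search", "server");

    // Adds the documents that contain the query as a whole word.
    static class ResultsComponent implements Component {
        public void process(String query, Map<String, Object> response) {
            String q = query.toLowerCase(Locale.ROOT);
            List<String> hits = new ArrayList<>();
            for (String doc : DOCS) {
                if (Arrays.asList(doc.split(" ")).contains(q)) {
                    hits.add(doc);
                }
            }
            response.put("results", hits);
        }
    }

    // Adds the closest dictionary word when the query is not itself a known word.
    static class SuggestionComponent implements Component {
        public void process(String query, Map<String, Object> response) {
            String q = query.toLowerCase(Locale.ROOT);
            if (DICTIONARY.contains(q)) {
                return;
            }
            String best = null;
            int bestDistance = Integer.MAX_VALUE;
            for (String word : DICTIONARY) {
                int d = editDistance(q, word);
                if (d < bestDistance) {
                    bestDistance = d;
                    best = word;
                }
            }
            response.put("suggestion", best);
        }
    }

    // Plain Levenshtein distance, used here as a stand-in scoring function.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            int[] cur = new int[b.length() + 1];
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            prev = cur;
        }
        return prev[b.length()];
    }

    // One request runs every component, so results and suggestions arrive together.
    static Map<String, Object> handle(String query, List<Component> components) {
        Map<String, Object> response = new LinkedHashMap<>();
        for (Component c : components) {
            c.process(query, response);
        }
        return response;
    }

    public static void main(String[] args) {
        List<Component> chain = Arrays.asList(new ResultsComponent(), new SuggestionComponent());
        System.out.println(handle("lucen", chain));
    }
}
```

<p>The point of the design is that the caller makes a single request and each component adds its own section to the same response, instead of the client calling a separate spelling handler after the search comes back empty.</p>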
<p>For details on how to use it, try out the Solr example in the source distribution and see the <a href="http://wiki.apache.org/solr/SpellCheckComponent">Wiki docs</a>.</p>
<p>Also note that it allows you to plug in your own spell checker (or a commercial one) or use the Lucene spell checker.</p>
]]></content:encoded>
			<wfw:commentRss>http://lucene.grantingersoll.com/2008/06/21/solr-spell-checking-addition/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

