<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>
<channel>
	<title>Comments on: Part 2 of IBM developerWorks article on Solr</title>
	<atom:link href="http://lucene.grantingersoll.com/2007/06/06/part-2-of-ibm-developerworks-article-on-solr/feed/" rel="self" type="application/rss+xml" />
	<link>http://lucene.grantingersoll.com/2007/06/06/part-2-of-ibm-developerworks-article-on-solr/</link>
	<description>Thoughts on Apache Lucene, Mahout, Solr, Tika and Nutch</description>
	<pubDate>Fri, 08 Aug 2008 00:52:38 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.6</generator>
		<item>
		<title>By: grant_ingersoll</title>
		<link>http://lucene.grantingersoll.com/2007/06/06/part-2-of-ibm-developerworks-article-on-solr/#comment-4637</link>
		<dc:creator>grant_ingersoll</dc:creator>
		<pubDate>Fri, 05 Oct 2007 13:13:15 +0000</pubDate>
		<guid isPermaLink="false">http://lucene.grantingersoll.com/2007/06/06/part-2-of-ibm-developerworks-article-on-solr/#comment-4637</guid>
		<description>Thanks.

I would check out NekoHTML or (J)Tidy for HTML.  Solr also comes with some HTML helpers.

As for XML, you can use either SAX or a pull parser.  Java 1.5 comes with SAX, but you can also use Xerces.  For a pull parser, have a look at XPP.

If you need extraction from many kinds of documents, take a look at Aperture (http://aperture.sourceforge.net).</description>
		<content:encoded><![CDATA[<p>Thanks.</p>
<p>I would check out NekoHTML or (J)Tidy for HTML.  Solr also comes with some HTML helpers.</p>
<p>As for XML, you can use either SAX or a pull parser.  Java 1.5 comes with SAX, but you can also use Xerces.  For a pull parser, have a look at XPP.</p>
<p>If you need extraction from many kinds of documents, take a look at Aperture (http://aperture.sourceforge.net).</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Leslie</title>
		<link>http://lucene.grantingersoll.com/2007/06/06/part-2-of-ibm-developerworks-article-on-solr/#comment-4635</link>
		<dc:creator>Leslie</dc:creator>
		<pubDate>Thu, 04 Oct 2007 10:24:55 +0000</pubDate>
		<guid isPermaLink="false">http://lucene.grantingersoll.com/2007/06/06/part-2-of-ibm-developerworks-article-on-solr/#comment-4635</guid>
		<description>It's quite good; I was looking for articles on Solr when I found it. 
It introduces Solr and shows how easy it is to use. It's very useful for me right now, and it should also serve others who are interested in Solr. 
For the case of Solr handling structured documents, it would be even better if you could recommend some libraries for XML and HTML that can convert various kinds of resources into the format that Solr's interfaces accept. If you have any recommendations, please let me know. 

Thanks.</description>
		<content:encoded><![CDATA[<p>It&#8217;s quite good; I was looking for articles on Solr when I found it.<br />
It introduces Solr and shows how easy it is to use. It&#8217;s very useful for me right now, and it should also serve others who are interested in Solr.<br />
For the case of Solr handling structured documents, it would be even better if you could recommend some libraries for XML and HTML that can convert various kinds of resources into the format that Solr&#8217;s interfaces accept. If you have any recommendations, please let me know. </p>
<p>Thanks.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

