FAST to Solr for *NIX

It’s hardly news anymore that MS is dropping support for *NIX platforms, but I especially like J Lawson’s nice little quote about their switch from FAST to Apache Solr/Lucene:

We very quickly switched to Apache Lucene, Solr and are very happy with the result. Performance is good, and we have cut hosting costs by a staggering 400%.

via J Lawson’s Blog Space » 2010 » February.

I’ve seen similar reductions (and sometimes more significant ones) in other replacements I’ve been involved with at Lucid Imagination. To me, the likely reason is the flexibility that Lucene/Solr provide for an application to model its domain as efficiently as needed: not too much, not too little, instead of the “kitchen sink” approach which turns on every feature by default. Too often, I think buyers are seduced by a really long feature list, even though their application may only need 80% of those features. After all, would you pay extra for a car with heated seats if you lived in the tropics? With Solr, you get rock-solid search capabilities (no one disputes that, because all the vendors pretty much use the same model) and can easily turn on/off most of the other features. Furthermore, if you need something not included (most of the time you don’t, b/c it’s already there), you can choose the best-of-breed implementation of that feature and integrate it.

At any rate, as Shalin said the other day, Apache L/S welcomes all FAST *NIX users.

Lucid Imagination » The Seven Deadly Sins of Solr

Props to Jay Hill on an excellent article on things to watch for when setting up Solr: Lucid Imagination » The Seven Deadly Sins of Solr.

Just posted on: Apache Lucene Connector Framework now in Incubation at the ASF

I just put up some initial info on the new Apache Lucene Connector Framework project that is now in ASF Incubation.  See Lucid Imagination » Apache Lucene Connector Framework now in Incubation at the ASF.

Measuring Measures: Learning About Statistical Learning

All you Mahouts out there might find some helpful background in Bradford Cross’ blog post: Measuring Measures: Learning About Statistical Learning.

Spatial Search Article is Live

My latest article is up at IBM’s developerWorks on spatial search with Lucene and Solr.  Have a look at: Location-aware search with Apache Lucene and Solr.

SFBay Apache Lucene/Solr Meetup Jan 21st.

Details and RSVP at: SFBay Apache Lucene/Solr Meetup San Mateo, CA – Meetup.com.

Lucid Imagination » Announcement: New LucidWorks Certified Distribution for Solr

Lucid Imagination » Announcement: New LucidWorks Certified Distribution for Solr.

Our new Solr distro is out, along w/ the reference guide.  Ref. guide can also be searched online via http://search.lucidimagination.com.

Manning: Mahout in Action

Very cool, Manning already has up the first 6 chapters of Mahout in Action.

Complex Fields (aka “poly” fields) in Apache Solr

I just committed SOLR-1131, which adds a new concept to the Solr FieldType called poly fields. Previously in Solr, there was pretty much a one-to-one relationship between a Field and a FieldType. With PolyFields, it is now possible to model more complex structures that require more than one field to properly represent the data, while still providing a single coherent name to refer to them.

For instance, in the Solr example, I modified the example docs to have a “store” location, as in:

<field name="id">6H500F0</field>
<field name="name">Maxtor DiamondMax 11 - hard drive - 500 GB - SATA-300</field>

<!-- Buffalo store -->
<field name="store">45.17614,-93.87341</field>

The store value represents the location where one might be able to buy the hard drive specified.  The value for the field is a latitude and longitude.  I declared the field for this as:

<field name="store" type="location" indexed="true" stored="true"/>

(Notice, it’s just one field.)  Here’s where it gets interesting.  The FieldType of “location” is a poly field (of PointType) declared as:

<fieldType name="location" class="solr.PointType" dimension="2" subFieldType="double"/>

This is a 2D point, meaning that underneath, if you were to look in the actual Lucene index, there will be three fields, with magic names derived from the original name of the field (i.e. store). Why three fields? There will be two fields that are indexed but not stored, using dynamic fields of FieldType “double” (the subFieldType) with names like store_0___double and store_1___double, and one field called “store” which is stored but not indexed. If, in the field declaration, stored were false, then only the two indexed fields would be created.
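
To make the field layout concrete, here is a rough Python sketch of the expansion described above. It is only an illustration, not Solr’s actual code: the function name expand_point and the tuple layout are made up for this example, and the only thing taken from Solr is the store_0___double style of naming.

```python
def expand_point(name, value, sub_field_type="double", stored=True):
    """Illustrative only: list the Lucene fields a point value would map to.

    Each entry is (field name, value, indexed, stored).
    """
    coords = [part.strip() for part in value.split(",")]
    # One indexed-but-not-stored sub field per dimension.
    fields = [
        (f"{name}_{i}___{sub_field_type}", float(coord), True, False)
        for i, coord in enumerate(coords)
    ]
    # The original field keeps the raw value, stored but not indexed.
    if stored:
        fields.append((name, value, False, True))
    return fields


for field in expand_point("store", "45.17614,-93.87341"):
    print(field)
```

Passing stored=False drops the last entry, which matches the two-field case mentioned above.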

Most importantly, when it comes to searching, clients interact with the “store” field just as they always did, namely:

q=store:45.17614,-93.87341

Solr will take care of recognizing that store is a poly field and will create the query: store_0___double:45.17614 AND store_1___double:-93.87341 underneath the hood.  Even better, ranges still just work too, as in:
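
If it helps to see the rewrite spelled out, here is a small Python sketch that builds the same expanded query string from the single-field form. The helper name rewrite_point_query is hypothetical and this is string manipulation for illustration only; Solr does the equivalent work on query objects inside the FieldType.

```python
def rewrite_point_query(field, value, sub_field_type="double"):
    """Illustrative only: expand field:value into one clause per dimension."""
    parts = [part.strip() for part in value.split(",")]
    clauses = [
        f"{field}_{i}___{sub_field_type}:{part}"
        for i, part in enumerate(parts)
    ]
    # Every dimension has to match, so the clauses are ANDed together.
    return " AND ".join(clauses)


print(rewrite_point_query("store", "45.17614,-93.87341"))
# prints: store_0___double:45.17614 AND store_1___double:-93.87341
```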

q=store:[44,-94 TO 46,-90]
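
A range over a point can be pictured the same way: one range clause per dimension, all of which must match. The Python sketch below assumes that is how the range is expanded (by analogy with the term query above, not something spelled out here), and the helper name rewrite_point_range is made up. It uses a box around the Buffalo store with the lower corner 44,-94 and the upper corner 46,-90, so that each dimension’s lower bound is below its upper bound.

```python
def rewrite_point_range(field, lower, upper, sub_field_type="double"):
    """Illustrative only: expand a point range into per-dimension ranges."""
    lows = [part.strip() for part in lower.split(",")]
    highs = [part.strip() for part in upper.split(",")]
    if len(lows) != len(highs):
        raise ValueError("both corners need the same number of dimensions")
    clauses = [
        f"{field}_{i}___{sub_field_type}:[{low} TO {high}]"
        for i, (low, high) in enumerate(zip(lows, highs))
    ]
    return " AND ".join(clauses)


print(rewrite_point_range("store", "44,-94", "46,-90"))
# prints: store_0___double:[44 TO 46] AND store_1___double:[-94 TO -90]
```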

What’s next? SOLR-1586 will add a poly field type for Cartesian Tiers (and geohash), while SOLR-1568 will add a QParserPlugin that makes querying Cartesian tiers (and hence the underlying poly fields) completely seamless. With the completion of those items, support for full-fledged location-aware search will be nearly complete for Apache Solr.

Shalin Says…

I’m liking the looks of Apache Solr committer Shalin’s new website (right down to the cool domain name that lines up w/ his name!): check out Shalin Says….

Lots of good stuff on the Lucene ecosystem on his page. I especially like his post on Why You Should Contribute to Open Source.