Complex Fields (aka “poly” fields) in Apache Solr

I just committed SOLR-1131, which adds a new concept to the Solr FieldType called poly fields. Previously in Solr, there was pretty much a one-to-one relationship between a Field and a FieldType. With poly fields, it is now possible to model more complex structures that require more than one field to properly represent the data, while still providing a single coherent name to refer to them.

For instance, in the Solr example, I modified the example docs to have a “store” location, as in:

<field name="id">6H500F0</field>
<field name="name">Maxtor DiamondMax 11 - hard drive - 500 GB - SATA-300</field>

<!-- Buffalo store -->
<field name="store">45.17614,-93.87341</field>

The store value represents the location where one might be able to buy the hard drive specified.  The value for the field is a latitude and longitude.  I declared the field for this as:

<field name="store" type="location" indexed="true" stored="true"/>

(Notice, it’s just one field.)  Here’s where it gets interesting.  The FieldType of “location” is a poly field (of PointType) declared as:

<fieldType name="location" class="solr.PointType" dimension="2" subFieldType="double"/>

This is a 2D point, meaning that if you were to look in the actual Lucene index, there would be three fields, with magic names derived from the original name of the field (here, store). Why three fields? Two fields are indexed but not stored, using dynamic fields of FieldType “double” (the subFieldType) with names like store_0___double and store_1___double, and one field called “store” is stored but not indexed. If stored were false in the field declaration, then only the two indexed fields would be created.
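
To make the decomposition concrete, here is a minimal Python sketch of how one 2D value could fan out into Lucene fields. It is a model of the behavior described above, not Solr code: the function name expand_poly_field and the LuceneField tuple are hypothetical, and only the subfield naming pattern (store_0___double, store_1___double) is taken from this post.

```python
from typing import List, NamedTuple


class LuceneField(NamedTuple):
    """Hypothetical stand-in for one field as it would appear in the index."""
    name: str
    value: str
    indexed: bool
    stored: bool


def expand_poly_field(name: str, value: str, sub_type: str = "double",
                      stored: bool = True) -> List[LuceneField]:
    """Model how a point value fans out into per-dimension subfields."""
    parts = [p.strip() for p in value.split(",")]
    # One indexed, unstored subfield per dimension: store_0___double, ...
    fields = [
        LuceneField(f"{name}_{i}___{sub_type}", part, indexed=True, stored=False)
        for i, part in enumerate(parts)
    ]
    if stored:
        # The original value is kept once, stored but not indexed.
        fields.append(LuceneField(name, value, indexed=False, stored=True))
    return fields


fields = expand_poly_field("store", "45.17614,-93.87341")
for f in fields:
    print(f)
```

Calling expand_poly_field with stored=False yields only the two per-dimension subfields, mirroring the stored="false" case.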

Most importantly, when it comes to searching, clients interact with the “store” field just as they always did, namely:

q=store:45.17614,-93.87341

Solr will take care of recognizing that store is a poly field and will create the query store_0___double:45.17614 AND store_1___double:-93.87341 under the hood. Even better, ranges still just work, as in:

q=store:[44,-94 TO 46,-90]
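
The rewriting can be pictured with a small Python sketch. The point-query expansion follows the form given in this post; the per-dimension expansion of a range query is an assumption about how such a rewrite would work, and both function names are hypothetical. The sketch assumes each lower bound is no greater than its matching upper bound, so the bounds run from the lower-left corner to the upper-right corner.

```python
def rewrite_point_query(field: str, value: str, sub_type: str = "double") -> str:
    """Expand field:x,y into one term clause per dimension, joined by AND."""
    parts = [p.strip() for p in value.split(",")]
    return " AND ".join(
        f"{field}_{i}___{sub_type}:{part}" for i, part in enumerate(parts)
    )


def rewrite_range_query(field: str, lower: str, upper: str,
                        sub_type: str = "double") -> str:
    """Expand field:[a,b TO c,d] into one range clause per dimension (assumed)."""
    lows = [p.strip() for p in lower.split(",")]
    highs = [p.strip() for p in upper.split(",")]
    if len(lows) != len(highs):
        raise ValueError("lower and upper bounds must have the same dimension")
    return " AND ".join(
        f"{field}_{i}___{sub_type}:[{lo} TO {hi}]"
        for i, (lo, hi) in enumerate(zip(lows, highs))
    )


point_q = rewrite_point_query("store", "45.17614,-93.87341")
range_q = rewrite_range_query("store", "44,-94", "46,-90")
print(point_q)
print(range_q)
```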

What’s next? SOLR-1586 will add a poly field type for Cartesian tiers (and geohash), while SOLR-1568 will add a QParserPlugin that makes querying Cartesian tiers (and hence the underlying poly fields) completely seamless. With the completion of those items, support for full-fledged, location-aware search will be nearly complete for Apache Solr.

15 Responses to “Complex Fields (aka “poly” fields) in Apache Solr”

  1. Hello,
    Will we be able to do proximity search with this feature? I have done something similar where I used Hilbert curves to search for Euclidean proximity in a 3D Cartesian system, using the uzaygezen library.
    best.

  2. I’m not familiar with that library, but yes, in general the goal here is to enable apps to represent more complex things like a 3D point and do interesting things with them. If you have code that could be shared, see https://issues.apache.org/jira/browse/SOLR-773 and its related issues. Also see http://wiki.apache.org/solr/SpatialSearch

  3. Can these poly fields be made more generic so that they can store data-metadata pairs and not just point coordinates? For example, if I am extracting entities like persons, organizations, locations, etc. using NLP, it would be very beneficial to store name-type pairs like the ones below, so that people don’t have to incur additional query hits to discern the metadata related to the value:

    “Grant Ingersoll, person”
    “Iraq, location”
    “Apache Foundation, Organization”

  4. Definitely! The poly field is an attribute of the FieldType. The PointType is just a derived FieldType that is also a poly field.

  5. Thank you, Mr. Grant, for the prompt reply. From your reply, may I presume that PointType fields can be analyzed? And if so, is it possible to let users choose the separator between value and metadata, as that would make it easier for them to separate the two? For example, an organization name can contain commas, as in “Lichtman, Trister & Ross PLLC”, which can confuse parsing when there are too many metadata values to compare against and strip off.

    A follow-up question: how do I query for all ‘person’ entities in a document, given that a document can contain more than one value for location, organization, person, etc.?

  6. A PointType is just an example of a poly FieldType. For what you are doing, you will likely need to write your own FieldType.

  7. Makes sense, I shall create a new FieldType.

  8. Hi,

    I need to solve 2 problems:

    problem A: searching documents bounded by some area (area is a polygon, rectangle is not enough…) and then reduce results by applying full text search

    problem B: searching documents with areas (as above) overlapping a point, and then again applying full text search with specific keywords for each area.

    I have only a little experience with Solr/Lucene, so my question is: can I use Solr with this new feature (poly fields) to solve these problems?

    thx

  9. Hi Oleg,

    You should be able to solve those problems with Solr, but it will take some work on your part, as there is no polygon support in Solr just yet.

    -Grant

  10. Let’s say I create a car PolyField to support a document that looks like this:

    field name=id value=123
    field name=name value=car picture.jpg
    field name=car value=Honda,Red
    field name=car value=Toyota,Blue

    Now I want to find a picture of a blue Honda, so I run this query:
    q=car:Honda,Blue

    Behind the scenes Solr will execute a query like this:
    car__0:Honda AND car__1:Blue

    I believe that the Solr response will include the example doc – id:123 – but doc id:123 does not have a blue Honda, it contains a red Honda and blue Toyota.

    Thx.

  11. Let’s say I create a car PolyField to support a document that looks like this:
    field name=id value=123
    field name=name value=car picture.jpg
    field name=car value=Honda,Red
    field name=car value=Toyota,Blue

    Now I want to find a picture of a blue Honda, so I run this query:
    q=car:Honda,Blue
    Behind the scenes Solr will execute a query like this:
    car__0:Honda AND car__1:Blue
    I believe that the Solr response will include the example doc – id:123 – but doc id:123 does not have a blue Honda, it contains a red Honda and blue Toyota.

    How do you avoid “false hits” with PolyFields?
    Thx.

  12. Any activity on polygons, perhaps in Solr Trunk? Possible to get some idea on how to do a polygon search? I have a collection of points I need to find documents ‘inside’ of. Thanks for any information.

  13. @Greg that’s exactly the problem I have, too!

  14. @Greg (from 5/27), I’m having the same issue where a PolyField with multiValued=true will return false positives similar to your car example.

    A document will have two addresses:
    125 Stark Lane, Apt #24, Beverly Hills, CA 90210-1243
    123 Fake Road, Arlington, VA 22201

    These will be parsed into 5 subfields.

    If I do a search for
    125 Stark Lane, Apt #24, Arlington, VA 90210-1243, the document returns when I believe it shouldn’t.

    Any tricks to guarantee the search is ONLY executed across a single PolyField?

  15. Poly fields currently do not support multiple values per field. You might be able to hack something using SpanQueries, but that would require custom code.
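
The false hits raised in the car and address comments can be reproduced with a small Python model. This is only a sketch of the effect, assuming each component of every value is added to a shared per-position subfield; the function names are hypothetical, and the car__0 / car__1 subfield names are the ones used in the comments, not names Solr is known to generate.

```python
from typing import Dict, List, Set


def index_flattened(values: List[str]) -> Dict[str, Set[str]]:
    """Index each component into a per-position subfield, losing the pairing."""
    subfields: Dict[str, Set[str]] = {}
    for value in values:
        for i, part in enumerate(value.split(",")):
            subfields.setdefault(f"car__{i}", set()).add(part.strip())
    return subfields


def matches(subfields: Dict[str, Set[str]], query: str) -> bool:
    """AND together one term per subfield, as in car__0:Honda AND car__1:Blue."""
    return all(
        part.strip() in subfields.get(f"car__{i}", set())
        for i, part in enumerate(query.split(","))
    )


# Document 123 holds a red Honda and a blue Toyota.
doc = index_flattened(["Honda,Red", "Toyota,Blue"])
false_hit = matches(doc, "Honda,Blue")  # matches although no blue Honda exists
true_hit = matches(doc, "Honda,Red")
print(false_hit, true_hit)
```

Because the subfields only record which makes and which colors occur somewhere in the document, the link between a given make and its color is gone by query time, which is why a fix needs position-aware matching such as the SpanQueries mentioned above.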
