sorting - How to sort text/string in Solr using a natural sort order?
I want to sort a list of values along the lines of:
- 4
- 5xa
- 8kdjfew454
- 9
- 10
- 999cc
- b
- c9
- c10cc
- c11
In other words, what is usually referred to as "natural sorting": text is sorted alphabetically/lexicographically where there is text, and numerically where there are numbers, even if both are mixed in the same string.
I can't find any way to do this in Solr (4.0 at the moment). Is there a standard way, or at least a workable "recipe"?
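For reference, the ordering asked for above can be written down outside Solr as a sort key. The following is only a sketch in Python to pin down the expected result; `natural_key` is a hypothetical helper name, and ranking digit runs ahead of text runs is an assumption taken from the example list.

```python
import re

def natural_key(value):
    """Split a string into digit and non-digit runs and build a comparable key.

    Digit runs compare numerically, text runs compare lexically (case-insensitive).
    The leading 0/1 tag keeps numbers ahead of text and avoids int/str comparisons.
    """
    chunks = re.findall(r"[0-9]+|[^0-9]+", value.lower())
    return [(0, int(c), "") if c[0] in "0123456789" else (1, 0, c) for c in chunks]

# The values from the question, deliberately shuffled.
values = ["c11", "10", "b", "999cc", "4", "c10cc", "9", "5xa", "c9", "8kdjfew454"]
ordered = sorted(values, key=natural_key)
print(ordered)
```

This is the order a Solr-side solution would need to reproduce at index time, since Solr sorts on the indexed term rather than on a custom comparator.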
The closest thing you can achieve is described in this article.
From the article:
To force numbers to sort numerically, you need to left-pad any numbers with zeroes: 2 becomes 0002, 10 becomes 0010, 100 becomes 0100, et cetera. Then even a lexical sort will arrange the values like this:
title no. 1
title no. 2
title no. 10
title no. 100
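The effect of the padding can be checked with a few lines of Python. This is an illustrative sketch, not part of the article; `pad_numbers` is a hypothetical helper, and the width of 4 simply mirrors the 0002/0010/0100 example.

```python
import re

def pad_numbers(text, width=4):
    """Left-pad every run of ASCII digits with zeroes to the given width."""
    return re.sub(r"[0-9]+", lambda m: m.group().zfill(width), text)

titles = ["title no. 10", "title no. 2", "title no. 100", "title no. 1"]

# A plain lexical sort compares character by character, so "10" lands before "2".
lexical = sorted(titles)

# Sorting on the padded form gives the numeric order with the same lexical comparison.
padded = sorted(titles, key=pad_numbers)

print(lexical)
print(padded)
```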
The field type
This alphanumeric sort field type converts any numbers it finds to 6 digits, padded with zeroes. (If you expect numbers longer than 6 digits in your field values, you will need to increase the number of zeroes used when padding.)
The field type also removes English and French leading articles, lowercases the value, and purges any character that isn't alphanumeric. It is English-centric, and it assumes that diacritics have already been folded into ASCII characters.
<fieldType name="alphanumericsort" class="solr.TextField" sortMissingLast="false" omitNorms="true">
  <analyzer>
    <!-- KeywordTokenizer does no actual tokenizing, so the entire input string is preserved as a single token -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- The LowerCase TokenFilter does what you expect, which is useful when you want sorting to be case insensitive -->
    <filter class="solr.LowerCaseFilterFactory" />
    <!-- The TrimFilter removes any leading or trailing whitespace -->
    <filter class="solr.TrimFilterFactory" />
    <!-- Remove leading articles -->
    <filter class="solr.PatternReplaceFilterFactory" pattern="^(a |the |les |la |le |l'|de la |du |des )" replacement="" replace="all" />
    <!-- Left-pad numbers with zeroes -->
    <filter class="solr.PatternReplaceFilterFactory" pattern="(\d+)" replacement="00000$1" replace="all" />
    <!-- Left-trim zeroes to produce 6 digit numbers -->
    <filter class="solr.PatternReplaceFilterFactory" pattern="0*([0-9]{6,})" replacement="$1" replace="all" />
    <!-- Remove all but alphanumeric characters -->
    <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z0-9])" replacement="" replace="all" />
  </analyzer>
</fieldType>
Sample output
title no. 1 => titleno000001
title no. 2 => titleno000002
title no. 10 => titleno000010
title no. 100 => titleno000100
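To sanity-check the filter chain without reindexing, the same steps can be emulated in Python. This is a rough sketch rather than Solr code: `alphanumeric_sort_key` is a hypothetical name, and it assumes Python's `re` module behaves like Java's regex engine for these particular patterns (with `re.ASCII` so that `\d` only matches 0-9).

```python
import re

def alphanumeric_sort_key(value):
    """Approximate the analyzer chain of the alphanumeric sort field type."""
    s = value.lower()                                    # LowerCaseFilter
    s = s.strip()                                        # TrimFilter
    # Remove leading English and French articles.
    s = re.sub(r"^(a |the |les |la |le |l'|de la |du |des )", "", s)
    # Left-pad every number with five zeroes.
    s = re.sub(r"(\d+)", r"00000\g<1>", s, flags=re.ASCII)
    # Trim surplus leading zeroes so that each number keeps at least 6 digits.
    s = re.sub(r"0*([0-9]{6,})", r"\g<1>", s)
    # Remove everything that is not a lowercase letter or a digit.
    s = re.sub(r"([^a-z0-9])", "", s)
    return s

# The sample titles from the article.
for title in ["title no. 1", "title no. 2", "title no. 10", "title no. 100"]:
    print(title, "=>", alphanumeric_sort_key(title))

# The values from the question, shuffled, then sorted on the emulated key.
values = ["c11", "10", "b", "999cc", "4", "c10cc", "9", "5xa", "c9", "8kdjfew454"]
ordered = sorted(values, key=alphanumeric_sort_key)
print(ordered)
```

Under these assumptions the values from the question come out in the requested natural order, because digits sort before letters in a plain lexical comparison. Numbers longer than 6 digits are left unpadded, which is the limitation the article mentions.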