How to sort text/string in Solr using a natural sort order?


I need to sort a list of values along the lines of:

  • 4
  • 5xa
  • 8kdjfew454
  • 9
  • 10
  • 999cc
  • b
  • c9
  • c10cc
  • c11

In other words, what is often referred to as "natural sorting": text is sorted alphabetically/lexicographically where there is text, and numerically where there are numbers, even if both are mixed in the same string.
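For reference, here is a minimal Java sketch of that ordering outside of Solr. It is not a Solr feature, and the class name NaturalOrder is made up for illustration. It splits each value into runs of digits and non-digits, compares digit runs by numeric value and text runs case-insensitively, and assumes numbers sort before text when the two meet at the same position:

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper illustrating "natural" ordering; not part of Solr.
public class NaturalOrder {
    // Matches alternating runs of ASCII digits and non-digits.
    private static final Pattern CHUNK = Pattern.compile("\\d+|\\D+");

    static List<String> chunks(String s) {
        List<String> out = new ArrayList<>();
        Matcher m = CHUNK.matcher(s);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    // Chunks are never empty, and \d only matches ASCII digits by default.
    static boolean isNumber(String chunk) {
        char c = chunk.charAt(0);
        return c >= '0' && c <= '9';
    }

    static int compare(String a, String b) {
        List<String> ca = chunks(a);
        List<String> cb = chunks(b);
        int n = Math.min(ca.size(), cb.size());
        for (int i = 0; i < n; i++) {
            String x = ca.get(i);
            String y = cb.get(i);
            boolean xNum = isNumber(x);
            boolean yNum = isNumber(y);
            int c;
            if (xNum && yNum) {
                // BigInteger avoids overflow on very long digit runs.
                c = new BigInteger(x).compareTo(new BigInteger(y));
            } else if (xNum != yNum) {
                c = xNum ? -1 : 1; // assumption: numbers sort before text
            } else {
                c = x.compareToIgnoreCase(y);
            }
            if (c != 0) {
                return c;
            }
        }
        // All shared chunks equal: the value with fewer chunks comes first.
        return Integer.compare(ca.size(), cb.size());
    }

    public static void main(String[] args) {
        String[] values = {"c11", "b", "10", "999cc", "c9", "5xa", "9", "c10cc", "4", "8kdjfew454"};
        Arrays.sort(values, NaturalOrder::compare);
        System.out.println(String.join(", ", values));
    }
}
```

Sorting the shuffled values this way puts them back in the order listed in the question. Doing this client-side only works for the results you have already fetched, which is why a sortable key inside the index is still needed.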

I can't find any way to do this in Solr (4.0 at the moment). Is there a standard way, or at least a workable "recipe"?

The closest thing you can achieve is described in this article.

From the article:

To force numbers to sort numerically, we need to left-pad them with zeroes: 2 becomes 0002, 10 becomes 0010, 100 becomes 0100, et cetera. A lexical sort will then arrange the values like this:

  • Title No. 1
  • Title No. 2
  • Title No. 10
  • Title No. 100
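A quick way to check the padding trick is a small Java sketch. The padNumbers helper and the width of 4 are illustrative choices, not from the article. Sorting the raw titles lexically puts "Title No. 100" before "Title No. 2", while sorting the padded versions gives numeric order:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ZeroPadDemo {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Hypothetical helper: left-pads every run of digits in s with zeroes
    // up to `width` characters. Runs already longer than `width` are left unchanged.
    static String padNumbers(String s, int width) {
        Matcher m = DIGITS.matcher(s);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            StringBuilder padded = new StringBuilder();
            for (int i = m.group().length(); i < width; i++) {
                padded.append('0');
            }
            padded.append(m.group());
            // The replacement contains only digits, so no $ or \ escaping is needed.
            m.appendReplacement(sb, padded.toString());
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] titles = {"Title No. 100", "Title No. 2", "Title No. 10", "Title No. 1"};

        String[] raw = titles.clone();
        Arrays.sort(raw); // plain lexical sort of the original strings
        System.out.println("raw:    " + Arrays.toString(raw));

        List<String> padded = new ArrayList<>();
        for (String t : titles) {
            padded.add(padNumbers(t, 4));
        }
        Collections.sort(padded); // lexical sort of the padded strings
        System.out.println("padded: " + padded);
    }
}
```

The first line prints [Title No. 1, Title No. 10, Title No. 100, Title No. 2], the second [Title No. 0001, Title No. 0002, Title No. 0010, Title No. 0100].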

The field type

This alphanumeric sort field type converts any numbers it finds to 6 digits, padded with zeroes. (If you expect numbers larger than 6 digits in your field values, you will need to increase the number of zeroes used when padding, and raise the {6,} minimum in the trimming pattern to match.)

The field type removes English and French leading articles, lowercases, and purges any character that isn't alphanumeric. It is English-centric, and assumes diacritics have already been folded to ASCII characters.

<fieldType name="alphanumericsort" class="solr.TextField" sortMissingLast="false" omitNorms="true">
  <analyzer>
    <!-- KeywordTokenizer does no actual tokenizing, so the entire
         input string is preserved as a single token
      -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- The LowerCase TokenFilter does what you expect, which can be
         useful when you want your sorting to be case insensitive
      -->
    <filter class="solr.LowerCaseFilterFactory" />
    <!-- The TrimFilter removes any leading or trailing whitespace -->
    <filter class="solr.TrimFilterFactory" />
    <!-- Remove leading articles -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="^(a |the |les |la |le |l'|de la |du |des )" replacement="" replace="all"
    />
    <!-- Left-pad numbers with zeroes -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="(\d+)" replacement="00000$1" replace="all"
    />
    <!-- Left-trim zeroes to produce 6 digit numbers -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="0*([0-9]{6,})" replacement="$1" replace="all"
    />
    <!-- Remove non-alphanumeric characters -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="([^a-z0-9])" replacement="" replace="all"
    />
  </analyzer>
</fieldType>

Sample output:

Title No. 1 => titleno000001
Title No. 2 => titleno000002
Title No. 10 => titleno000010
Title No. 100 => titleno000100
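To preview the keys this field type should produce before reindexing, the filter chain can be approximated in plain Java, since the PatternReplaceFilterFactory patterns are Java regular expressions and replace="all" replaces every match. This is a sketch, not Solr code: AlphanumericSortKey and sortKey are hypothetical names, and toLowerCase(Locale.ROOT) and trim() only approximate LowerCaseFilterFactory and TrimFilterFactory:

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical approximation of the "alphanumericsort" analyzer chain.
public class AlphanumericSortKey {
    // Same patterns as the PatternReplaceFilterFactory steps in the field type.
    private static final Pattern ARTICLES = Pattern.compile("^(a |the |les |la |le |l'|de la |du |des )");
    private static final Pattern DIGITS = Pattern.compile("(\\d+)");
    private static final Pattern EXTRA_ZEROES = Pattern.compile("0*([0-9]{6,})");
    private static final Pattern NON_ALNUM = Pattern.compile("([^a-z0-9])");

    static String sortKey(String input) {
        String s = input.toLowerCase(Locale.ROOT);        // approximates LowerCaseFilterFactory
        s = s.trim();                                     // approximates TrimFilterFactory
        s = ARTICLES.matcher(s).replaceAll("");           // remove leading articles
        s = DIGITS.matcher(s).replaceAll("00000$1");      // left-pad numbers with zeroes
        s = EXTRA_ZEROES.matcher(s).replaceAll("$1");     // keep at least 6 digits
        s = NON_ALNUM.matcher(s).replaceAll("");          // drop non-alphanumeric characters
        return s;
    }

    public static void main(String[] args) {
        for (String t : new String[] {"Title No. 1", "Title No. 2", "Title No. 10", "Title No. 100"}) {
            System.out.println(t + " => " + sortKey(t));
        }
    }
}
```

Applied to the values from the question, the keys (000004, 000005xa, ..., c000009, c000010cc, c000011) sort lexically into the requested order. The sketch also shows the six-digit limit: sortKey("Item 1000000") gives item1000000, which sorts before item999999.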

