Collations
Collations are the mechanism used for comparing strings. By specifying a collation,
you can make string comparison language-specific (a.k.a. locale-specific). For
instance, in a default comparison of 'në' with 'ni', 'ni' comes first, because the
Unicode code point of ë (U+00EB) is greater than that of i (U+0069). However, if you
compare these words with a collation for a language that uses diacritics (like Dutch,
German, or French), the order is reversed because the ë is treated like an e.
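The difference between the two orderings can be sketched in plain Java. This is only an illustration, not eXist's own code: it assumes that the JDK's java.text.Collator behaves comparably to a language-specific collation, and the class and method names (CollationOrderDemo, compareByCodepoint, compareByLanguage) are made up for this example.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical demo class; not part of eXist.
final class CollationOrderDemo {

    // Code point style comparison: String.compareTo compares UTF-16 code units,
    // which coincides with code point order for the characters used here.
    static int compareByCodepoint(String a, String b) {
        return a.compareTo(b);
    }

    // Language-specific comparison using the JDK collator for a language tag
    // such as "nl" (assumption: this approximates a ?lang=nl collation).
    static int compareByLanguage(String languageTag, String a, String b) {
        Collator collator = Collator.getInstance(Locale.forLanguageTag(languageTag));
        return collator.compare(a, b);
    }

    public static void main(String[] args) {
        // "në", written with an escape to avoid source-encoding issues
        String ne = "n\u00eb";

        // Code points: ë (U+00EB) > i (U+0069), so 'ni' sorts first.
        System.out.println(compareByCodepoint(ne, "ni") > 0);      // prints true

        // Dutch collation: ë is ordered like e, so 'në' sorts first.
        System.out.println(compareByLanguage("nl", ne, "ni") < 0); // prints true
    }
}
```

A positive result means the first argument sorts after the second, so the two lines show the reversal described above.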
Supported Collations
eXist supports the following collations:
http://www.w3.org/2005/xpath-functions/collation/codepoint
This is the default collation that uses the Unicode code points. Internally, the
basic Java string comparison and search functions are used.
http://exist-db.org/collation?lang=...&strength=...&decomposition=...
Or for short: ?lang=...&strength=...&decomposition=... (the strength and
decomposition parameters are optional). This specifies a language-specific
collation:
• The lang parameter selects the language using an ISO 639-1 language code,
optionally followed by a country code, like en, en-US, de, nl-NL, or fr.
You can find out which languages are supported by calling the util:collations
extension function.
• The strength parameter value must be one of primary, secondary,
tertiary, or identical.
• The decomposition parameter value must be one of none, full, or
standard.
What exactly these parameters do is a deep and rather separate subject that we're
not going to handle here. It has to do with the way Unicode is built up and the
canonicalization of accented Unicode characters. If you don't know what
this is about, you most likely don't need to. A good place to start looking for more
information is the Unicode site.
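For a rough feel of what the strength levels mean, the following Java sketch uses the JDK's java.text.Collator, which happens to offer strength constants with the same four names. That these correspond one-to-one to eXist's strength values is an assumption, and the helper names (CollationStrengthDemo, strengthOf, sameAt) are hypothetical. The decomposition parameter is left out of this sketch.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical demo class; not part of eXist.
final class CollationStrengthDemo {

    // Maps the strength names from the text to JDK constants (assumed mapping).
    static int strengthOf(String name) {
        switch (name) {
            case "primary":   return Collator.PRIMARY;
            case "secondary": return Collator.SECONDARY;
            case "tertiary":  return Collator.TERTIARY;
            case "identical": return Collator.IDENTICAL;
            default: throw new IllegalArgumentException("Unknown strength: " + name);
        }
    }

    // Returns true if a and b compare as equal for the given language and strength.
    static boolean sameAt(String lang, String strength, String a, String b) {
        Collator collator = Collator.getInstance(Locale.forLanguageTag(lang));
        collator.setStrength(strengthOf(strength));
        return collator.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        String eDiaeresis = "\u00eb"; // "ë"

        // primary: only base letters count, so accents are ignored
        System.out.println(sameAt("en", "primary", "e", eDiaeresis));   // prints true
        // secondary: accents count, case does not
        System.out.println(sameAt("en", "secondary", "e", eDiaeresis)); // prints false
        System.out.println(sameAt("en", "secondary", "e", "E"));        // prints true
        // tertiary: case counts as well
        System.out.println(sameAt("en", "tertiary", "e", "E"));         // prints false
    }
}
```

In this model, a lower strength makes more strings compare as equal, which is why a primary-strength collation is a common choice for accent- and case-insensitive matching.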
Specifying Collations
There are several ways to work with collations:
• You can specify a default collation for your XQuery script in its prolog: