The ICU chain file defines a chain of rules that specify the conversion process to be carried out on each record string before it is indexed.
Both searching and sorting are based on the sort normalization that ICU provides. This means that scan and sort will return terms in the sort order given by ICU.
Zebra uses YAZ's ICU wrapper. Refer to the yaz-icu man page for documentation about the ICU chain rules.
Use the yaz-icu program to test your ICU chain rules.
Example 10.2. Indexing Greek text
Consider a system where all "regular" text is to be indexed as Greek (locale: EL). We would have to change our index type file to read:
  # Index greek words
  index w
  completeness 0
  position 1
  alwaysmatches 1
  firstinfield 1
  icuchain greek.xml
  ..
The ICU chain file greek.xml could look as follows:
  <icu_chain locale="el">
    <transform rule="[:Control:] Any-Remove"/>
    <tokenize rule="l"/>
    <transform rule="[[:WhiteSpace:][:Punctuation:]] Remove"/>
    <display/>
    <casemap rule="l"/>
  </icu_chain>
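To make the effect of each step in the chain easier to follow, here is a rough Python sketch that mimics the sequence: remove control characters, tokenize, strip whitespace and punctuation, record the display form, then lowercase. It is an approximation only and does not use ICU. The function name approximate_greek_chain is hypothetical, whitespace splitting stands in for ICU line-break tokenization, and Python's str.lower() stands in for the locale-aware casemap, so results may differ from what Zebra actually indexes.

```python
import unicodedata


def approximate_greek_chain(text):
    """Roughly imitate the steps of greek.xml (an assumption-laden sketch, not ICU)."""
    # <transform rule="[:Control:] Any-Remove"/>: drop control characters
    # (general category Cc, which includes tab and newline).
    text = "".join(ch for ch in text if unicodedata.category(ch) != "Cc")

    results = []
    # <tokenize rule="l"/>: approximated here by splitting on whitespace.
    for token in text.split():
        # <transform rule="[[:WhiteSpace:][:Punctuation:]] Remove"/>:
        # remove whitespace and punctuation (categories starting with "P").
        cleaned = "".join(
            ch for ch in token
            if not ch.isspace() and not unicodedata.category(ch).startswith("P")
        )
        if not cleaned:
            continue  # token consisted only of punctuation
        # <display/>: the form at this point is what would be shown to the user.
        display_term = cleaned
        # <casemap rule="l"/>: lowercase the term that goes into the index.
        index_term = cleaned.lower()
        results.append((display_term, index_term))
    return results


if __name__ == "__main__":
    for display, term in approximate_greek_chain("Αθήνα,\x07 Ελλάδα!"):
        print(display, term)
```

Note that the display form is captured before the casemap step, which is why it keeps its original capitalization while the index term is lowercased. The yaz-icu program remains the authoritative way to check what a given chain file produces.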
Zebra is shipped with a field types file
which is an ICU chain version of
Example 10.3. MARCXML indexing using ICU
a complete sample with MARCXML records that are DOM XML indexed
using ICU chain rules. Study the
README in the
directory for details.