5. Unicode Compliance

Pazpar2 is Unicode compliant and language- and locale-aware, but it relies on each target's character encoding being specified correctly when the target itself is not UTF-8 based (most aren't). Just a few badly behaving targets can considerably spoil the search experience when, for example, Greek, Russian, or other non-7-bit-ASCII search terms are entered. In these cases some targets return records that are irrelevant to the query, and the result screens become cluttered with noise.
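To show why a wrongly specified encoding turns a non-ASCII query into noise, the Python sketch below simulates what a target "sees" when the bytes it receives are decoded with the wrong character set. It is an illustration only, not Pazpar2 code: the helper `as_seen_by_target` is hypothetical, and the Greek term and the encodings ISO-8859-1 and ISO-8859-7 are chosen purely as an example.

```python
def as_seen_by_target(term: str, sent_as: str, read_as: str) -> str:
    """Hypothetical helper: return the term as a target would decode it.

    sent_as: encoding used to send the term to the target.
    read_as: encoding the target assumes for the bytes it receives.
    """
    return term.encode(sent_as).decode(read_as)


term = "Αθήνα"  # "Athens" in Greek, 5 characters

# Mismatch: UTF-8 bytes interpreted as ISO-8859-1 (Latin-1). Every byte maps
# to some Latin-1 character, so no error is raised; the target silently
# searches for a different, 10-character string instead.
garbled = as_seen_by_target(term, "utf-8", "iso-8859-1")
print(len(term), len(garbled), garbled == term)  # 5 10 False

# Match: the term is transcoded to the target's correctly declared
# encoding (here ISO-8859-7, Greek), so the target sees the original term.
print(as_seen_by_target(term, "iso-8859-7", "iso-8859-7") == term)  # True
```

Because the mismatch produces no error, the target simply runs a query for the garbled string, which is how irrelevant records end up in the result list.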

While noise from misbehaving targets cannot be removed, it can be reduced by using truly Unicode-based ranking. This option is available to the system administrator if ICU support is compiled into YAZ; see Chapter 2, Installation, for details.

In addition, the ICU tokenization and normalization rules must be defined in the master configuration file described in the section called “server”.
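As a rough illustration of what tokenization and normalization rules accomplish (not of Pazpar2's configuration syntax, which is described in the section called “server”), the Python sketch below applies three typical steps: Unicode normalization, word tokenization, and case folding. Python's `unicodedata` and `re` modules stand in for ICU here, and the function name `normalize_terms` is hypothetical.

```python
import re
import unicodedata


def normalize_terms(text: str) -> list[str]:
    """Hypothetical sketch of an ICU-style chain, using Python's stdlib."""
    # 1. Canonical composition (NFC), so that e.g. "η" + combining acute
    #    becomes the single precomposed letter "ή" before tokenizing.
    text = unicodedata.normalize("NFC", text)
    # 2. Tokenize into words; \w is Unicode-aware for str patterns,
    #    so punctuation, whitespace and control characters separate tokens.
    tokens = re.findall(r"\w+", text)
    # 3. Full Unicode case folding, so "ΑΘΉΝΑ" and "Αθήνα" compare equal.
    return [token.casefold() for token in tokens]


print(normalize_terms("Αθήνα, ΑΘΉΝΑ!"))   # ['αθήνα', 'αθήνα']
print(normalize_terms("Αθη\u0301να"))     # ['αθήνα']
```

By contrast, an ASCII-only case mapping such as Python's `bytes.lower()` leaves Greek letters unchanged, so "ΑΘΉΝΑ" and "Αθήνα" would never match. This is the kind of mismatch that Unicode-based ranking avoids.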