Chapter 5. Query Model

Table of Contents

1. Query Model Overview
1.1. Query Languages
1.1.1. Prefix Query Format (PQF)
1.1.2. Common Query Language (CQL)
1.2. Operation types
1.2.1. Explain Operation
1.2.2. Search Operation
1.2.3. Scan Operation
2. RPN queries and semantics
2.1. RPN tree structure
2.1.1. Attribute sets
2.1.2. Boolean operators
2.1.3. Atomic queries (APT)
2.1.4. Named Result Sets
2.1.5. Zebra's special access point of type 'string'
2.1.6. Zebra's special access point of type 'XPath' for GRS-1 filters
2.2. Explain Attribute Set
2.2.1. Use Attributes (type = 1)
2.2.2. Explain searches with yaz-client
2.3. BIB-1 Attribute Set
2.3.1. Use Attributes (type 1)
2.4. Zebra general Bib1 Non-Use Attributes (type 2-6)
2.4.1. Relation Attributes (type 2)
2.4.2. Position Attributes (type 3)
2.4.3. Structure Attributes (type 4)
2.4.4. Truncation Attributes (type = 5)
2.4.5. Completeness Attributes (type = 6)
3. Extended Zebra RPN Features
3.1. Zebra specific retrieval of all records
3.2. Zebra specific Search Extensions to all Attribute Sets
3.2.1. Zebra Extension Embedded Sort Attribute (type 7)
3.2.2. Zebra Extension Rank Weight Attribute (type 9)
3.2.3. Zebra Extension Term Reference Attribute (type 10)
3.2.4. Local Approximative Limit Attribute (type 11)
3.2.5. Global Approximative Limit Attribute (type 12)
3.3. Zebra specific Scan Extensions to all Attribute Sets
3.3.1. Zebra Extension Result Set Narrow (type 8)
3.3.2. Zebra Extension Approximative Limit (type 12)
3.4. Zebra special IDXPATH Attribute Set for GRS-1 indexing
3.4.1. IDXPATH Use Attributes (type = 1)
3.5. Mapping from PQF atomic APT queries to Zebra internal register indexes
3.5.1. Mapping of PQF APT access points
3.5.2. Mapping of PQF APT structure and completeness to register type
3.6. Zebra Regular Expressions in Truncation Attribute (type = 5)
4. Server Side CQL to PQF Query Translation

1. Query Model Overview

1.1. Query Languages

Zebra was designed as a networked information retrieval engine adhering to the international standards Z39.50 and SRU, and it implements the type-1 Reverse Polish Notation (RPN) query model defined there. Unfortunately, this model is defined only as a binary encoded representation, which is used as transport packaging in the Z39.50 protocol layer. This representation is not human readable, and it does not define any convenient way to specify queries.

Since the type-1 (RPN) query structure has no direct, useful string representation, every client application needs to provide some form of mapping from a local query notation or representation to it.

1.1.1. Prefix Query Format (PQF)

Index Data has defined a textual representation, the Prefix Query Format (PQF for short), which maps one-to-one to binary encoded type-1 RPN queries. PQF has been adopted by other parties developing Z39.50 software, and is often referred to as Prefix Query Notation, or PQN for short. See Section 2, “RPN queries and semantics” for further explanations and descriptions of Zebra's capabilities.
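To illustrate the kind of mapping a client has to provide, here is a minimal sketch in Python that serializes a small query tree into a PQF string. The class and function names (Term, BoolOp, to_pqf) are hypothetical and not part of Zebra or YAZ; the attribute values used are the Bib-1 Use attributes 4 (title) and 1003 (author).

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Term:
    """An atomic query: attribute (type, value) pairs plus a search term."""
    text: str
    attrs: tuple = ()  # e.g. ((1, 4),) for Bib-1 Use attribute 4 (title)


@dataclass
class BoolOp:
    """A boolean node joining two subqueries; op is 'and', 'or' or 'not'."""
    op: str
    left: "Node"
    right: "Node"


Node = Union[Term, BoolOp]


def quote(text: str) -> str:
    """Wrap a term in double quotes when it contains whitespace."""
    if any(ch.isspace() for ch in text):
        return '"' + text.replace('"', '\\"') + '"'
    return text


def to_pqf(node: Node) -> str:
    """Serialize the tree in prefix order: operator first, then operands."""
    if isinstance(node, BoolOp):
        return f"@{node.op} {to_pqf(node.left)} {to_pqf(node.right)}"
    attrs = " ".join(f"@attr {t}={v}" for t, v in node.attrs)
    return f"{attrs} {quote(node.text)}".strip()


# Title contains "computer" AND author contains "donald knuth".
query = BoolOp("and",
               Term("computer", ((1, 4),)),
               Term("donald knuth", ((1, 1003),)))
print(to_pqf(query))
# prints: @and @attr 1=4 computer @attr 1=1003 "donald knuth"
```

Because PQF is a prefix notation, a recursive walk of the tree is all that is needed: no parentheses or operator precedence rules are involved.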

1.1.2. Common Query Language (CQL)

The type-1 RPN query model, expressed in PQF/PQN, is natively supported. On the other hand, the Common Query Language (CQL), the default query language of SRU web services, is not natively supported.

Zebra can, however, be configured to accept CQL and map it to PQF. See Section 4, “Server Side CQL to PQF Query Translation”.

1.2. Operation types

Zebra supports all three Z39.50/SRU operations defined in the standards: explain, search, and scan. A short description of the functionality and purpose of each follows.

1.2.1. Explain Operation

The syntax of Z39.50/SRU queries is well known to any client, but the specific semantics, which depend on a particular server's functionality and abilities, must be discovered case by case. This is the role of the explain operation, which provides the means for learning which fields (also called indexes or access points) are provided, which default parameters the server uses, which document retrieval formats are defined, and which specific parts of the general query model are supported.

Z39.50 embeds the explain operation as a search in the special IR-Explain-1 database; see Section 2.2, “Explain Attribute Set”.

In SRU, explain is an entirely separate operation, which returns a ZeeRex XML record according to the structure defined by the protocol.

In both cases, the information gathered through explain operations can be used to auto-configure a client user interface to the server's capabilities.
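As a rough illustration of the SRU case, the following Python sketch builds the URL of an explain request. The host, port, and database name are assumptions made up for this example, and the helper sru_explain_url is hypothetical; only the operation and version parameters come from the SRU protocol.

```python
from urllib.parse import urlencode


def sru_explain_url(base_url: str, version: str = "1.1") -> str:
    """Build the URL of an SRU explain request for the given database URL."""
    params = {"version": version, "operation": "explain"}
    return f"{base_url}?{urlencode(params)}"


# Hypothetical server address and database name, for illustration only.
print(sru_explain_url("http://localhost:9999/Default"))
# prints: http://localhost:9999/Default?version=1.1&operation=explain
```

A client would fetch this URL and parse the ZeeRex XML record in the response to learn which indexes and record formats the server offers.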

1.2.2. Search Operation

Search and retrieve interactions are the raison d'être. They are used to query the remote database and return search result documents. Search queries range from simple free text searches to nested complex boolean queries, targeting specific indexes, and possibly enhanced with many query semantic specifications. Search interactions are the heart and soul of Z39.50/SRU servers.

1.2.3. Scan Operation

The scan operation is a helper function, which operates on one index or access point at a time.

It provides the means to investigate the content of specific indexes. Scanning an index returns a handful of terms actually found in that index, together with the number of documents indexed by each term. A search client can use this information to propose proper spelling of search terms, to auto-fill search boxes, or to display controlled vocabularies.
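The auto-fill use case can be sketched as follows. This Python example assumes the client has already decoded a scan response into (term, document count) pairs; the sample data and the suggest function are made up for illustration and do not reflect any particular Zebra database.

```python
from typing import List, Tuple

ScanEntry = Tuple[str, int]  # (term, number of documents indexed by it)


def suggest(entries: List[ScanEntry], prefix: str, limit: int = 3) -> List[str]:
    """Return up to `limit` terms starting with `prefix`, most frequent first."""
    # Assumption: the scanned terms are stored in lower case.
    wanted = prefix.lower()
    matches = [(term, count) for term, count in entries
               if term.startswith(wanted)]
    # Sort by descending document count, then alphabetically for ties.
    matches.sort(key=lambda e: (-e[1], e[0]))
    return [term for term, _ in matches[:limit]]


# Made-up scan result for a title index.
scan_result = [("compact", 3), ("compiler", 12),
               ("computer", 41), ("concurrency", 7)]
print(suggest(scan_result, "comp"))
# prints: ['computer', 'compiler', 'compact']
```

Ranking by document count is one possible design choice; a client displaying a controlled vocabulary would more likely keep the terms in the order returned by the server.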