zebrasrv — Zebra Server
zebrasrv [-install] [-installa] [-remove] [-a file] [-v level] [-l file]
         [-u uid] [-c config] [-f vconfig] [-C fname] [-t minutes] [-k kilobytes]
         [-d daemon] [-w dir] [-p pidfile] [-ziDST1] [listener-spec...]
Zebra is a high-performance, general-purpose structured text indexing and retrieval engine. It reads structured records in a variety of input formats (e.g. email, XML, MARC) and allows access to them through exact boolean search expressions and relevance-ranked free-text queries.
zebrasrv is the Z39.50 and SRU frontend server for the Zebra search engine and indexer.
On Unix you can run the zebrasrv server from the command line and put it in the background. It may also operate under the inet daemon (inetd). On WIN32 you can run the server as a console application or as a WIN32 Service.
The options for zebrasrv are the same as those for YAZ's yaz-ztest. Option -c specifies a Zebra configuration file; if omitted, zebra.cfg is read.
-a file
Specify a file for dumping PDUs (for diagnostic purposes). The special name - (dash) sends output to stderr.
-S
Don't fork or make threads on connection requests. This is good for debugging, but not recommended for real operation: Although the server is asynchronous and non-blocking, it can be nice to keep a software malfunction (okay then, a crash) from affecting all current users. The server can only accept a single connection in this mode.
-1
Like -S, but after one session the server exits. This mode is for debugging only.
-T
Operate the server in threaded mode. The server creates a thread for each connection rather than forking a process. Only available on UNIX systems that offer POSIX threads.
-s
Use the SR protocol (obsolete).
-z
Use the Z39.50 protocol (default). This option and -s complement each other. You can use both multiple times on the same command line, between listener-specifications (see below). This way, you can set up the server to listen for connections in both protocols concurrently, on different local ports.
-l file
Specify an output file for the diagnostic messages. The default is to write this information to stderr.
-c config-file
Read configuration information from config-file. The default configuration is ./zebra.cfg.
-f vconfig
This specifies an XML file that describes one or more YAZ frontend virtual servers. See section VIRTUAL HOSTS for details.
-C fname
Sets the SSL certificate file name for the server (PEM).
-v level
The log level. Use a comma-separated list of members of the set {fatal,debug,warn,log,malloc,all,none}.
-u uid
Set user ID. Sets the real UID of the server process to that of the given user. It's useful if you aren't comfortable with having the server run as root, but you need to start it as such to bind a privileged port.
-w working-directory
The server changes to this working directory before listening for incoming connections. This option is useful when the server is operating from the inetd daemon (see -i).
-p pidfile
Specifies that the server should write its Process ID to the file given by pidfile. A typical location would be /var/run/zebrasrv.pid.
-i
Use this to make the server run from the inetd server (UNIX only). Make sure you use the logfile option -l in conjunction with this mode and specify the -l option before any other options.
-D
Use this to make the server put itself in the background and run as a daemon. If neither -i nor -D is given, the server starts in the foreground.
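As a sketch of the ordering rule for -i and -l, the Python helper below assembles a zebrasrv argument vector with -l placed first. The helper name zebrasrv_argv and its keyword arguments are hypothetical and not part of Zebra or YAZ; it only builds the list and does not start the server.

```python
from typing import List, Optional


def zebrasrv_argv(listeners: List[str], config: str = "zebra.cfg",
                  logfile: Optional[str] = None, user: Optional[str] = None,
                  inetd: bool = False, daemon: bool = False) -> List[str]:
    """Build a zebrasrv argument list (hypothetical helper, not part of Zebra)."""
    if inetd and logfile is None:
        # The text above recommends always pairing -i with -l.
        raise ValueError("use a log file (-l) when running under inetd (-i)")
    argv = ["zebrasrv"]
    if logfile is not None:
        argv += ["-l", logfile]   # -l first, before any other option
    argv += ["-c", config]
    if user is not None:
        argv += ["-u", user]
    if inetd:
        argv.append("-i")
    if daemon:
        argv.append("-D")
    return argv + list(listeners)
```

The resulting list could be handed to subprocess.run, or copied by hand into an inetd configuration entry.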
-install
Use this to install the server as an NT service (Windows NT/2000/XP only). Control the server from Services in the Control Panel.
-installa
Use this to install and activate the server as an NT service (Windows NT/2000/XP only). Control the server from Services in the Control Panel.
-remove
Use this to remove the server from the NT services (Windows NT/2000/XP only).
-t minutes
Idle session timeout, in minutes. Default is 60 minutes.
-k size
Maximum record size/message size, in kilobytes. Default is 1024 KB (1 MB).
-d daemon
Set name of daemon to be used in hosts access file. See hosts_access(5) and tcpd(8).
A listener-address consists of an optional transport mode followed by a colon (:) followed by a listener address. The transport mode is either a file system socket unix, an SSL TCP/IP socket ssl, or a plain TCP/IP socket tcp (default).
For TCP, an address has the form
hostname | IP-number [:portnumber]
The port number defaults to 210 (standard Z39.50 port) for privileged users (root), and 9999 for normal users. The special hostname "@" is mapped to the address INADDR_ANY, which causes the server to listen on any local interface.
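The address rules above can be summarised as a small parser. This is a minimal Python sketch assuming only the rules stated here (optional tcp/ssl/unix prefix, tcp as default, port 210 for root and 9999 otherwise); the names Listener and parse_listener are hypothetical, "@" is kept as-is rather than resolved to INADDR_ANY, and bracketed IPv6 literals are not handled.

```python
from typing import NamedTuple, Optional

MODES = ("tcp", "ssl", "unix")


class Listener(NamedTuple):
    mode: str             # "tcp", "ssl" or "unix"
    host: Optional[str]   # hostname, IP number or "@" (None for unix)
    port: Optional[int]   # None for unix sockets
    path: Optional[str]   # socket path for unix, else None


def parse_listener(spec: str, privileged: bool = False) -> Listener:
    """Split a listener spec such as 'ssl:@:3000' into its parts."""
    mode, rest = "tcp", spec
    prefix, sep, remainder = spec.partition(":")
    if sep and prefix in MODES:          # explicit transport mode
        mode, rest = prefix, remainder
    if mode == "unix":
        return Listener(mode, None, None, rest)
    host, sep, port = rest.rpartition(":")
    if not sep:                          # no ":port" part at all
        host, port = rest, ""
    default_port = 210 if privileged else 9999
    return Listener(mode, host, int(port) if port else default_port, None)
```

For example, parse_listener("@") yields a TCP listener on "@" port 9999, while the same call with privileged=True picks port 210.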
The default behavior for zebrasrv, if started as a non-privileged user, is to establish a single TCP/IP listener, for the Z39.50 protocol, on port 9999.
zebrasrv @
zebrasrv tcp:some.server.name.org:1234
zebrasrv ssl:@:3000
To start the server listening on the registered port for Z39.50, or on a filesystem socket, and to drop root privileges once the ports are bound, execute the server like this from a root shell:
zebrasrv -u daemon @
zebrasrv -u daemon tcp:@:210
zebrasrv -u daemon unix:/some/file/system/socket
Here daemon is an existing user account, and the unix socket /some/file/system/socket is readable and writable for the daemon account.
During initialization, the server will negotiate to version 3 of the Z39.50 protocol, and the option bits for Search, Present, Scan, NamedResultSets, and concurrentOperations will be set, if requested by the client. The maximum PDU size is negotiated down to a maximum of 1 MB by default.
The supported query types are 1 and 101. All operators are currently supported, with the restriction that only proximity units of type "word" are supported for the proximity operator. Queries can be arbitrarily complex. Named result sets are supported, and result sets can be used as operands without limitations. Searches may span multiple databases.
The server has full support for piggy-backed retrieval (see also the following section).
The present facility is supported in a standard fashion. The requested record syntax is matched against the ones supported by the profile of each record retrieved. If no record syntax is given, SUTRS is the default. The requested element set name, again, is matched against any provided by the relevant record profiles.
The attribute combinations provided with the termListAndStartPoint are processed in the same way as operands in a query (see above). Currently, only the term and the globalOccurrences are returned with the termInfo structure.
Z39.50 specifies three different types of sort criteria. Of these, Zebra supports the attribute specification type, in which case the use attribute specifies the "Sort register". Sort registers are created for those fields that are of type "sort" in the default.idx file. The corresponding character mapping file in default.idx specifies the ordinal of each character used in the actual sort.
Z39.50 allows the client to specify sorting on one or more input result sets and one output result set. Zebra supports sorting on one result set only, which may or may not be the same as the output result set.
If a Close PDU is received, the server will respond with a Close PDU with reason=FINISHED, no matter which protocol version was negotiated during initialization. If the protocol version is 3 or more, the server will generate a Close PDU under certain circumstances, including a session timeout (60 minutes by default), and certain kinds of protocol errors. Once a Close PDU has been sent, the protocol association is considered broken, and the transport connection will be closed immediately upon receipt of further data, or following a short timeout.
Zebra maintains a "classic" Z39.50 Explain database on the side. This database is called IR-Explain-1 and can be searched using the attribute set exp-1.
The records in the explain database are of type grs.sgml. The root element for the Explain grs.sgml records is explain, thus explain.abs is used for indexing.
Zebra must be able to locate explain.abs in order to index the Explain records properly. Zebra will work without it, but the information will not be searchable.
In addition to Z39.50, Zebra supports the more recent and web-friendly IR protocol SRU. SRU can be carried over SOAP or a REST-like protocol that uses HTTP GET or POST to request search responses. The request itself is made of parameters such as query, startRecord, maximumRecords and recordSchema; the response is an XML document containing hit-count, result-set records, diagnostics, etc. SRU can be thought of as a re-casting of Z39.50 semantics in web-friendly terms; or as a standardisation of the ad-hoc query parameters used by search engines such as Google and AltaVista; or as a superset of A9's OpenSearch (which it predates).
Zebra supports Z39.50, SRU GET, SRU POST and SRU SOAP (SRW) all on the same port, recognising which protocol is used by each incoming request and handling it accordingly. This is achieved through the use of Deep Magic; civilians are warned not to stand too close.
Because Zebra supports all protocols on one port, it would seem to follow that the SRU server is run in the same way as the Z39.50 server, as described above. This is true, but only in an uninterestingly vacuous way: a Zebra server run in this manner will indeed recognise and accept SRU requests; but since it doesn't know how to handle the CQL queries that these protocols use, all it can do is send failure responses.
It is possible to cheat, by having SRU search Zebra with a PQF query instead of CQL, using the x-pquery parameter instead of query. This is a non-standard extension of SRU, and a very naughty thing to do, but it does give you a way to see Zebra serving SRU "right out of the box". If you start your favourite Zebra server in the usual way, on port 9999, then you can send your web browser to:
http://localhost:9999/Default?version=1.1
 &operation=searchRetrieve
 &x-pquery=mineral
 &startRecord=1
 &maximumRecords=1
This will display the XML-formatted SRU response that includes the first record in the result-set found by the query mineral. (For clarity, the SRU URL is shown here broken across lines, but the lines should be joined together to make a single-line URL for the browser to submit.)
In order to turn on Zebra's support for CQL queries, it's necessary to have the YAZ generic front-end (which Zebra uses) translate them into the Z39.50 Type-1 query format that is used internally. To do this, the generic front-end's own configuration file must be used. See the section called “YAZ server virtual hosts”; the salient point for SRU support is that zebrasrv must be started with the -f frontendConfigFile option rather than the -c zebraConfigFile option, and that the front-end configuration file must include both a reference to the Zebra configuration file and the CQL-to-PQF translator configuration file.
A minimal front-end configuration file that does this would read as follows:
<yazgfs>
  <server>
    <config>zebra.cfg</config>
    <cql2rpn>../../tab/pqf.properties</cql2rpn>
  </server>
</yazgfs>
The <config> element contains the name of the Zebra configuration file that was previously specified by the -c command-line argument, and the <cql2rpn> element contains the name of the CQL properties file specifying how various CQL indexes, relations, etc. are translated into Type-1 queries.
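If the front-end file is produced by a deployment script rather than by hand, it could be generated with Python's standard xml.etree.ElementTree, as in this minimal sketch. The function name minimal_frontend_config is hypothetical; the element names are exactly those of the example above.

```python
import xml.etree.ElementTree as ET


def minimal_frontend_config(zebra_cfg: str, cql_properties: str) -> str:
    """Return a minimal yazgfs document with <config> and <cql2rpn>."""
    root = ET.Element("yazgfs")
    server = ET.SubElement(root, "server")
    ET.SubElement(server, "config").text = zebra_cfg          # formerly given with -c
    ET.SubElement(server, "cql2rpn").text = cql_properties    # CQL to Type-1 mapping
    return ET.tostring(root, encoding="unicode")
```

The returned string could be written to a file of your choosing and passed to zebrasrv with -f.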
A Zebra server running with such a configuration can then be queried using proper, conformant SRU URLs with CQL queries:
http://localhost:9999/Default?version=1.1
 &operation=searchRetrieve
 &query=title=utah and description=epicent*
 &startRecord=1
 &maximumRecords=1
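The URLs above are shown broken across lines and with a raw CQL query; a program has to join them and percent-encode the values. The Python sketch below does this with urllib.parse.urlencode. The helper name sru_search_url is hypothetical, and the pqf= branch uses the non-standard x-pquery parameter described earlier.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def sru_search_url(base: str, database: str, *, cql: Optional[str] = None,
                   pqf: Optional[str] = None, start: int = 1,
                   maximum: int = 1) -> str:
    """Build a single-line SRU 1.1 searchRetrieve URL (illustrative helper)."""
    if (cql is None) == (pqf is None):
        raise ValueError("pass exactly one of cql= or pqf=")
    params = {"version": "1.1", "operation": "searchRetrieve"}
    if cql is not None:
        params["query"] = cql          # standard SRU parameter
    else:
        params["x-pquery"] = pqf       # non-standard Zebra extension
    params["startRecord"] = str(start)
    params["maximumRecords"] = str(maximum)
    # quote_via=quote encodes spaces as %20 rather than '+'
    return f"{base}/{database}?{urlencode(params, quote_via=quote)}"
```

Using quote instead of the default quote_plus keeps spaces as %20, matching the style of the example URLs in this page.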
Zebra running as an SRU server supports SRU version 1.1, including CQL version 1.1. In particular, it provides support for the following elements of the protocol.
Zebra supports the searchRetrieve operation.
One of the great strengths of SRU is that it mandates a standard query language, CQL, and that all conforming implementations can therefore be trusted to correctly interpret the same queries. It is with some shame, then, that we admit that Zebra also supports an additional query language, our own Prefix Query Format (PQF).
A PQF query is submitted by using the extension parameter x-pquery, in which case the query parameter must be omitted. Please feel free to use this facility within your own applications; but be aware that it is not only non-standard SRU but not even syntactically valid, since it omits the mandatory query parameter.
Zebra supports the scan operation. Scanning using CQL syntax is the default, where the standard scanClause parameter is used.
In addition, a mutant form of SRU scan is supported, using the non-standard x-pScanClause parameter in place of the standard scanClause to scan on a PQF query clause.
Zebra supports explain.
The ZeeRex record explaining a database may be requested either with a fully fledged SRU request (with operation=explain and version-number specified) or with a simple HTTP GET at the server's basename. The ZeeRex record returned in response is the one embedded in the YAZ Frontend Server configuration file that is described in the section called “YAZ server virtual hosts”.
Unfortunately, the data found in the CQL-to-PQF text file must be added by hand into the explain section of the YAZ Frontend Server configuration file to be able to provide a suitable explain record. Too bad, but this is all extremely new alpha stuff, and a lot of work has yet to be done.
There is no linkage whatsoever between the Z39.50 explain model and the SRU explain response (well, at least not implemented in Zebra, that is). Zebra does not provide a means of obtaining the ZeeRex record using Z39.50.
In the Z39.50 protocol, Initialization, Present, Sort and Close are separate operations. In SRU, however, these operations do not exist.
SRU has no explicit initialization handshake phase, but commences immediately with searching, scanning and explain operations.
Neither does SRU have a close operation, since the protocol is stateless and each request is self-contained. (It is true that multiple SRU request/response pairs may be implemented as multiple HTTP request/response pairs over a single persistent TCP/IP connection; but the closure of that connection is not a protocol-level operation.)
Retrieval in SRU is part of the searchRetrieve operation, in which a search is submitted and the response includes a subset of the records in the result set. There is no direct analogue of Z39.50's Present operation, which requests records from an established result set. In SRU, this is achieved by sending a subsequent searchRetrieve request with the query cql.resultSetId=id, where id is the identifier of the previously generated result-set.
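Paging through an existing result set in this way amounts to computing a 1-based startRecord and reusing the identifier. The Python sketch below assumes the earlier searchRetrieve response reported a result set identifier and that it is a plain token; the helper name page_url and the identifier "abc" in the example are hypothetical.

```python
from urllib.parse import quote, urlencode


def page_url(base: str, result_set_id: str, page: int, page_size: int) -> str:
    """Fetch page `page` (1-based) of an existing result set via cql.resultSetId."""
    start = (page - 1) * page_size + 1   # SRU startRecord counts from 1
    params = {
        "version": "1.1",
        "operation": "searchRetrieve",
        # Assumes a plain identifier; identifiers with spaces or CQL
        # special characters would need quoting inside the query.
        "query": f"cql.resultSetId={result_set_id}",
        "startRecord": str(start),
        "maximumRecords": str(page_size),
    }
    return f"{base}/?{urlencode(params, quote_via=quote)}"
```

With a page size of 10, page 3 starts at record (3 - 1) * 10 + 1 = 21.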
Sorting in SRU is done within the searchRetrieve operation: in v1.1, by an explicit sortKeys parameter, but the forthcoming v1.2 or v2.0 will most likely use an extension of the query language, CQL sorting.
It can be seen, then, that while Zebra operating as an SRU server does not provide the same set of operations as when operating as a Z39.50 server, it does provide equivalent functionality.
Surf into http://localhost:9999 to get an explain response, or use
http://localhost:9999/?version=1.1&operation=explain
See the number of hits for a query:
http://localhost:9999/?version=1.1&operation=searchRetrieve&query=text=(plant%20and%20soil)
Fetch records 5 and 6 in Dublin Core format:
http://localhost:9999/?version=1.1&operation=searchRetrieve&query=text=(plant%20and%20soil)&startRecord=5&maximumRecords=2&recordSchema=dc
Even search using PQF queries with the extended naughty parameter x-pquery:
http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=@attr%201=text%20@and%20plant%20soil
Or scan indexes using the extended extremely naughty parameter x-pScanClause:
http://localhost:9999/?version=1.1&operation=scan&x-pScanClause=@attr%201=text%20something
Don't do this in production code! But it's a great fast debugging aid.
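To script against these URLs instead of reading the XML in a browser, the hit count can be extracted from the response. This Python sketch assumes the SRU 1.1 response namespace http://www.loc.gov/zing/srw/; the SAMPLE document is a hand-written illustration of the response shape, not captured Zebra output, and the function name hit_count is hypothetical.

```python
import xml.etree.ElementTree as ET

SRW_NS = "http://www.loc.gov/zing/srw/"  # assumed SRU 1.1 response namespace

# Hand-written illustration of a response shape, not captured Zebra output.
SAMPLE = """<zs:searchRetrieveResponse xmlns:zs="http://www.loc.gov/zing/srw/">
  <zs:version>1.1</zs:version>
  <zs:numberOfRecords>42</zs:numberOfRecords>
</zs:searchRetrieveResponse>"""


def hit_count(response_xml: str) -> int:
    """Return numberOfRecords from an SRU 1.1 searchRetrieve response."""
    root = ET.fromstring(response_xml)
    elem = root.find(f"{{{SRW_NS}}}numberOfRecords")
    if elem is None or elem.text is None:
        raise ValueError("no numberOfRecords; inspect the diagnostics instead")
    return int(elem.text)
```

Against a running server, the text could come from urllib.request.urlopen(url).read().decode("utf-8") with one of the searchRetrieve URLs above.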
The Virtual hosts mechanism allows a YAZ frontend server to support multiple backends. A backend is selected on the basis of the TCP/IP binding (port+listening address) and/or the virtual host.
A backend can be configured to execute in a particular working directory. Or the YAZ frontend may perform CQL to RPN conversion, thus allowing traditional Z39.50 backends to be offered as an SRU service. SRU Explain information for a particular backend may also be specified.
For the HTTP protocol, the virtual host is specified in the Host header. For the Z39.50 protocol, the virtual host is specified in the Initialize Request in the OtherInfo, OID 1.2.840.10003.10.1000.81.1.
Not all Z39.50 clients allow the VHOST information to be set. For those, the selection of the backend must rely on the TCP/IP information alone (port and address).
The YAZ frontend server uses XML to describe the backend configurations. Command-line option -f specifies the filename of the XML configuration.
The configuration uses the root element yazgfs. This element includes a list of listen elements, followed by one or more server elements.
The listen element describes a listener (transport end point), such as TCP/IP, Unix file socket or SSL server. Content for a listener:
The CDATA for the listen element holds the listener string, such as tcp:@:210, tcp:server1:2100, etc.
id
(optional) Identifier for this listener. This may be referred to from server sections.
We expect more information to be added for the listen section in a future version, such as a CERT file for SSL servers.
The server element describes a server and the parameters for this server type. Content for a server:
id
(optional) Identifier for this server. Currently not used for anything, but it might be for logging purposes.
listenref
(optional) Specifies the listener for this server. If this attribute is not given, the server is accessible from all listeners. In order for the server to be used for real, however, the virtual host must match (if specified in the configuration).
config
(optional) Specifies the server configuration. This is equivalent to the config specified using command line option -c.
directory
(optional) Specifies a working directory for this backend server. If specified, the YAZ frontend changes the current working directory to this directory whenever a backend of this type is started (backend handler bend_start), stopped (backend handler bend_stop) and initialized (bend_init).
host
(optional) Specifies the virtual host for this server. If this is specified, a client must specify this host string in order to use this backend.
cql2rpn
(optional) Specifies a filename that includes CQL to RPN conversion for this backend server. See the CQL section in the YAZ manual. If given, the backend server will only "see" a Type-1/RPN query.
explain
(optional) Specifies SRU ZeeRex content for this server, copied verbatim to the client. As things are now, some of the Explain content seems redundant because host information, etc. is also stored elsewhere.
The format of the Explain record is described in detail, with examples, on the ZeeRex web site.
The XML below configures a server that accepts connections on two listeners: TCP/IP port 9900 and a local UNIX file socket. We name the TCP/IP listener public and the other listener internal.
<yazgfs>
  <listen id="public">tcp:@:9900</listen>
  <listen id="internal">unix:/var/tmp/socket</listen>
  <server id="server1">
    <host>server1.mydomain</host>
    <directory>/var/www/s1</directory>
    <config>config.cfg</config>
  </server>
  <server id="server2">
    <host>server2.mydomain</host>
    <directory>/var/www/s2</directory>
    <config>config.cfg</config>
    <cql2rpn>../etc/pqf.properties</cql2rpn>
    <explain xmlns="http://explain.z3950.org/dtd/2.0/">
      <serverInfo>
        <host>server2.mydomain</host>
        <port>9900</port>
        <database>a</database>
      </serverInfo>
    </explain>
  </server>
  <server id="server3" listenref="internal">
    <directory>/var/www/s3</directory>
    <config>config.cfg</config>
  </server>
</yazgfs>
There are three configured backend servers. The first two servers, "server1" and "server2", can be reached by both listener addresses, since no listenref attribute is specified. In order to distinguish between the two, a virtual host has been specified for each server in the host elements.
For "server2", CQL to RPN conversion is supported and explain information has been added (a short one here to keep the example small).
The third server, "server3", can only be reached via listener "internal".
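The selection rules (listenref restricts the listener, host must match when given) can be illustrated against a trimmed copy of the example configuration. This Python sketch is not YAZ's implementation: the function name select_backend is hypothetical, and it assumes servers are tried in document order with the first match winning and hosts compared as exact strings.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Trimmed copy of the example configuration above (directory, cql2rpn
# and explain elements omitted because selection does not use them).
CONFIG = """<yazgfs>
  <listen id="public">tcp:@:9900</listen>
  <listen id="internal">unix:/var/tmp/socket</listen>
  <server id="server1"><host>server1.mydomain</host><config>config.cfg</config></server>
  <server id="server2"><host>server2.mydomain</host><config>config.cfg</config></server>
  <server id="server3" listenref="internal"><config>config.cfg</config></server>
</yazgfs>"""


def select_backend(config_xml: str, listener_id: str,
                   vhost: Optional[str]) -> Optional[str]:
    """Return the id of the first <server> usable for this listener and host.

    Assumption: servers are tried in document order and the first match
    wins; hosts are compared as exact strings.
    """
    root = ET.fromstring(config_xml)
    for server in root.findall("server"):
        ref = server.get("listenref")
        if ref is not None and ref != listener_id:
            continue  # bound to another listener
        host = server.findtext("host")
        if host is not None and host != vhost:
            continue  # a configured virtual host must match
        return server.get("id")
    return None  # no backend available for this connection
```

Under these assumptions, a client on the public listener that sends no virtual host reaches no backend at all, while the same client on the internal socket reaches server3.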