The Proxy may read a configuration file using option
-c
followed by the filename of a config file.
The config file is XML based. The YAZ proxy must be compiled with libxml2 and libXSLT support in order for the config file facility to be enabled.
See Section 12, “YAZ Proxy Configuration Schema” for an XML schema for the configuration.
To check that a config file is well-formed, yazproxy may be invoked without specifying a listening port, i.e.
yazproxy -c myconfig.xml
If this does not produce errors, the file is well-formed.
The proxy config file must have a root element called
proxy
and scoped within namespace
xmlns="http://indexdata.dk/yazproxy/schema/0.9/"
.
All information except an optional XML header must be stored
within the proxy
element.
<?xml version="1.0"?>
<proxy xmlns="http://indexdata.dk/yazproxy/schema/0.9/">
  <!-- content here .. -->
</proxy>
The element target
which may be repeated zero
or more times with parent element proxy
contains
information about each backend target.
The target
element has two attributes:
name
which holds the logical name of the backend
target (required) and default
(optional) which
(when given) specifies that the backend target is the default target -
equivalent to command line option -t
.
<?xml version="1.0"?>
<proxy xmlns="http://indexdata.dk/yazproxy/schema/0.9/">
  <target name="server1" default="1">
    <!-- description of server1 .. -->
  </target>
  <target name="server2">
    <!-- description of server2 .. -->
  </target>
</proxy>
The url
element, which may be repeated one or more times,
should be a child of the target
element.
The CDATA of url
is the Z-URL of the backend.
Multiple url
elements may be used. In that case, when
a client initiates a session, the proxy chooses the URL with the lowest
number of active sessions, thereby distributing the load. It is
assumed that each URL represents the same database (data).
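As an illustration of the load distribution described above, a target with two url elements might look like the following minimal sketch. The host names, port and database name are hypothetical placeholders, not values taken from the YAZ proxy distribution.

```xml
<?xml version="1.0"?>
<proxy xmlns="http://indexdata.dk/yazproxy/schema/0.9/">
  <target name="server1" default="1">
    <!-- Both Z-URLs are assumed to serve the same data; the hosts are made up -->
    <url>backend1.example.org:210/mydb</url>
    <url>backend2.example.org:210/mydb</url>
  </target>
</proxy>
```

With such a setup, a new client session would go to whichever of the two URLs currently has fewer active sessions.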
The element target-timeout
is the child of element
target
and specifies the time in seconds before
a target session is shut down.
This can also be specified on the command line by using option
-T
. Refer to OPTIONS in Section 10, “Proxy Manual Pages”.
The element client-timeout
is the child of element
target
and specifies the time in seconds before
a client session is shut down.
This can also be specified on the command line by using option
-i
. Refer to OPTIONS in Section 10, “Proxy Manual Pages”.
The element max-sockets
is the child of element
target
and specifies the maximum number of sockets
to use for the target for all sessions using it. In other words: the maximum
number of Z39.50 sessions to the target.
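A minimal sketch combining the two timeouts and the socket limit for one target could look as follows. All values, the target name and the Z-URL are illustrative assumptions, and the element order shown has not been checked against the schema.

```xml
<target name="server1">
  <url>backend1.example.org:210/mydb</url>
  <!-- Shut down a target session after 300 seconds (illustrative value) -->
  <target-timeout>300</target-timeout>
  <!-- Shut down a client session after 600 seconds (illustrative value) -->
  <client-timeout>600</client-timeout>
  <!-- Allow at most 20 Z39.50 sessions to this target (illustrative value) -->
  <max-sockets>20</max-sockets>
</target>
```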
The keepalive
element holds information about
the keepalive Z39.50 sessions. Keepalive sessions are proxy-to-backend
sessions that are no longer associated with a client session.
The keepalive
element which is the child of
the target
holds two elements:
bandwidth
and pdu
.
The bandwidth
is the maximum total bytes
transferred to/from the target. If a target session exceeds this
limit, it is shut down (and no longer kept alive).
The pdu
is the maximum number of requests sent
to the target. If a target session exceeds this limit, it is
shut down. The purpose of these two limits is to avoid very long
sessions that consume resources in a backend (which may leak!).
The following sets the maximum number of bytes transferred in a target session to 1 MB and the maximum number of requests to 400.
<keepalive>
  <bandwidth>1048576</bandwidth>
  <pdu>400</pdu>
</keepalive>
The limit
section specifies bandwidth and PDU request
limits for an active session.
The proxy records bandwidth/pdu requests during the last 60 seconds
(1 minute). The limit
may include the
elements bandwidth
, pdu
,
retrieve
and search
.
The bandwidth
measures the number of bytes transferred within the last minute.
The pdu
is the number of requests in the last
minute. The retrieve
holds the maximum number of records
that may be retrieved in one Present Request.
The search
is the maximum number of searches
within the last minute.
If a bandwidth/pdu/search limit is reached, the proxy postpones the requests to the target and waits one or more seconds. The purpose of the limit is to ensure that clients that download hundreds or thousands of records do not hurt other users.
The following sets the maximum number of bytes transferred per minute to 512 KB (524288 bytes), the maximum number of records per Present Request to 40 and the maximum number of searches per minute to 20.
<limit>
  <bandwidth>524288</bandwidth>
  <retrieve>40</retrieve>
  <search>20</search>
</limit>
Typically the values in the keepalive section are much higher than their equivalent limit counterparts (bandwidth, pdu).
The attribute
element specifies whether to accept or reject
a particular attribute type, value pair.
Well-behaving targets will reject unsupported attributes on their
own. This feature is useful for targets that do not gracefully
handle unsupported attributes.
Attribute elements may be repeated. The proxy inspects the attribute specifications in the order in which they appear in the configuration file. When a given attribute specification matches a given attribute list in a query, the proxy takes the appropriate action (reject, accept).
If no attribute specification matches the attribute list in a query, the query is accepted.
The attribute
element has two required attributes:
type
which is the Attribute Type-1 type, and
value
which is the Attribute Type-1 value.
The special value/type *
matches any attribute
type/value. A value may also be specified as a list with each
value separated by a comma, or as a
range: low value - dash - high value.
If attribute error
is given, that holds a
Bib-1 diagnostic which is sent to the client if the particular
type, value is part of a query.
If attribute error
is not given, the attribute
type, value is accepted and passed to the backend target.
A target that supports use attributes 1, 4 and 1000 through 1003, and no other use attributes, could use the following rules:
<attribute type="1" value="1,4,1000-1003"/> <attribute type="1" value="*" error="114"/>
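The same accept-then-reject pattern can be applied to other attribute types. As a sketch, assuming a hypothetical target that only handles truncation attribute (type 5) values 1 and 100, the rules below accept those two values and reject everything else with Bib-1 diagnostic 120 (unsupported truncation attribute). The choice of supported values is an assumption made for the example.

```xml
<!-- Accept truncation values 1 and 100 (assumed to be supported by the target) -->
<attribute type="5" value="1,100"/>
<!-- Reject any other truncation value with Bib-1 diagnostic 120 -->
<attribute type="5" value="*" error="120"/>
```

Because specifications are inspected in file order, the accepting rule has to come before the catch-all rejecting rule.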
The syntax
element specifies whether to accept or reject
a particular record syntax request from the client. It also
allows record conversion of XML records via XSLT.
The syntax
element has one required attribute:
type
which is the Preferred Record Syntax.
If attribute error
is given, that holds a
Bib-1 diagnostic which is sent to the client if the particular
record syntax is part of a present or search request.
If attribute error
is not given, the record syntax
is accepted and passed to the backend target.
If attribute marcxml
is given, the proxy will
perform MARC21 to MARCXML conversion. In this case the
type
should be XML. The proxy will use
preferred record syntax USMARC/MARC21 or backendtype
(if given) against the backend target.
For the special case where backendtype
is
opac
the proxy will convert the OPAC
record to OPACXML.
When marcxml
is used, yazproxy assumes
that records retrieved from the backend are encoded in the
MARC-8 character set.
This is correct for most MARC21 based systems, but not for
other MARC variants or UTF-8 based MARC21 systems.
The backendcharset
attribute specifies
the character set of the MARC records to be converted.
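For a backend that delivers UTF-8 encoded MARC21 rather than MARC-8, a syntax element along the following lines might be used. This is a sketch only; the exact character set name accepted ("utf-8" here) is an assumption.

```xml
<!-- Convert MARC21 to MARCXML, treating backend records as UTF-8 (assumed name) -->
<syntax type="xml" marcxml="1" backendcharset="utf-8"/>
```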
If attribute backendtype
is given, it holds the
record syntax to be transmitted to the backend.
If attribute backendelementset
is given, it holds
the element set to be transmitted to the backend. An empty value of
backendelementset
has the effect of omitting
any Comp-Spec (and element set) sent to the backend.
If backendelementset
is omitted, the element
set from client is used, except if marcxml
is used.
In that case (using marcxml
), no Comp-Spec and no
element set is sent to the backend.
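The two backend attributes could be combined with the other syntax attributes as in the following sketch. The two elements are independent alternatives, and the element set name "F" is an illustrative assumption.

```xml
<!-- Alternative 1: fetch OPAC records from the backend, return them as OPACXML -->
<syntax type="xml" marcxml="1" backendtype="opac"/>

<!-- Alternative 2: always send element set "F" to the backend,
     whatever element set the client asked for -->
<syntax type="usmarc" backendelementset="F"/>
```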
If attribute stylesheet
is given, the proxy
will convert the XML record from the server via XSLT. It is important
that the content from the server is XML. If used in conjunction with
attribute marcxml
, the MARC to MARCXML/OPACXML
conversion takes place before the XSLT conversion takes place.
If attribute identifier
is given, that is the
SRU record schema identifier for the resulting output record (after
MARCXML and/or XSLT conversion).
If sub element title
is given (as a child element
of syntax
), then that is the official SRU
name of the resulting record schema.
If sub element name
is given, that is an alias
for the record schema identifier. Multiple name
elements
may be specified.
Example 4.1. MARCXML conversion
To accept USMARC and offer MARCXML plus Dublin Core (via XSLT conversion), the following configuration could be used:
<proxy>
  <target name="mytarget">
    ..
    <syntax type="usmarc"/>
    <syntax type="xml" marcxml="1"
      identifier="info:srw/schema/1/marcxml-v1.1">
      <title>MARCXML</title>
      <name>marcxml</name>
    </syntax>
    <syntax type="xml" marcxml="1" stylesheet="MARC21slim2SRWDC.xsl"
      identifier="info:srw/schema/1/dc-v1.1">
      <title>Dublin Core</title>
      <name>dc</name>
    </syntax>
    <syntax type="*" error="238"/>
    ..
  </target>
</proxy>
The explain
element includes Explain information
for SRU about the server in the target section. This
information must include a serverInfo
element
with a database naming the URL path under which this target is to be available.
For example,
<explain xmlns="http://explain.z3950.org/dtd/2.0/">
  <serverInfo>
    <host>myhost.org</host>
    <port>8000</port>
    <database>mydatabase</database>
  </serverInfo>
  <!-- remaining Explain stuff -->
</explain>
In the above case, the SRU service is available as
http://myhost.org:8000/mydatabase
.
The content of the cql2rpn
element specifies
the path from the working directory to a CQL-to-RPN conversion
file for the server in the target section. This element
is required for SRU searches to operate against Z39.50
servers that don't support CQL. Most Z39.50 servers only support
Type-1/RPN so this is usually required.
See YAZ documentation for more information about the
CQL to PQF conversion.
See also the
pqf.properties
in the etc
(or prefix/share/yazproxy
)
directory of the YAZ proxy distribution.
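A minimal sketch of a cql2rpn element is shown below. It assumes that yazproxy is started from a directory containing an etc subdirectory with the pqf.properties file; the target name is a placeholder.

```xml
<target name="mytarget">
  <!-- Path is relative to the working directory of yazproxy (assumed layout) -->
  <cql2rpn>etc/pqf.properties</cql2rpn>
</target>
```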
The element preinit
is the child of element
target
and specifies the number of spare
connections to a target. By default no spare connections are
created by the proxy. If the proxy uses a target exclusively or
heavily, the preinit sessions ensure that target sessions
have been established before the client connects, and therefore
reduce the delay of the connect-init handshake dramatically. Never set this to
more than 5.
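For example, keeping two spare connections ready for a heavily used target might be configured as in this sketch. The value 2 and the target name are illustrative.

```xml
<target name="mytarget">
  <!-- Keep 2 spare target sessions initialized in advance (illustrative value) -->
  <preinit>2</preinit>
</target>
```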
The element target-authentication
specifies
fixed authentication information to be sent to the backend target.
This element takes an attribute type
which is
the authentication type to be used:
none
No authentication. There is no CDATA associated with this.
anonymous
Anonymous authentication. There is no CDATA associated with this.
open
Open authentication. The CDATA consists of the open authentication string.
idPass
IdPass authentication. The CDATA consists of three terms: user, group and password.
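The following sketch shows two of these types. The elements are alternatives rather than a combination, and the open authentication string is a made-up value. The separator between the three idPass terms is not stated here, so idPass is left out of the sketch.

```xml
<!-- Alternative 1: anonymous authentication, no CDATA -->
<target-authentication type="anonymous"/>

<!-- Alternative 2: open authentication, CDATA is the authentication string (made up) -->
<target-authentication type="open">myuser/mypassword</target-authentication>
```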
The element target-charset
specifies the
native character set that the target uses for queries.
If this is specified, the proxy will act as a Z39.50 server supporting character set negotiation, and in SRU mode it will convert from UTF-8 (Unicode) to this native character set (if possible).
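As a sketch, a target whose queries are assumed to be in ISO 8859-1 could be described as follows. The character set name and target name are illustrative assumptions.

```xml
<target name="mytarget">
  <!-- Native query character set of the backend (assumed for this example) -->
  <target-charset>iso-8859-1</target-charset>
</target>
```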
The element max-clients
is the child of element
proxy
and specifies the total number of
allowed connections to targets (all targets). If this limit
is reached the proxy will close the least recently used connection.
Note that many Unix systems impose a limit on the number of open files allowed in a single process, typically in the range 256 (Solaris) to 1024 (Linux). The proxy uses 2 sockets per session plus a few files for logging. As a rule of thumb, ensure that 2*max-clients + 5 files can be opened by the proxy process.
Using the bash shell, you can set
the limit with
ulimit -n
no
.
Use ulimit -a
to display limits.
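To make the rule of thumb concrete, the sketch below uses an illustrative value of 50 for max-clients, which by the formula above calls for 2*50 + 5 = 105 open files.

```xml
<proxy xmlns="http://indexdata.dk/yazproxy/schema/0.9/">
  <!-- 2 * 50 + 5 = 105 open files, below a per-process limit of 256 -->
  <max-clients>50</max-clients>
</proxy>
```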
The element log
is the child of element
proxy
and specifies what is to be logged by the
proxy.
Specify the log file with command-line option -l
.
The text of the log
element is a sequence of
options separated by white space. See the table below:
Table 4.1. Logging options
Option | Description |
---|---|
client-apdu | Log APDUs as reported by YAZ for the communication between the client and the proxy. This facility is equivalent to the APDU logging that happens when using option -a , however this tells the proxy to log in the same file as given by -l . |
server-apdu | Log APDUs as reported by YAZ for the communication between the proxy and the server (backend). |
clients-requests | Log a brief description about requests transferred between the client and the proxy. The name of the request and the size of the APDU is logged. |
server-requests | Log a brief description about requests transferred between the proxy and the server (backend). The name of the request and the size of the APDU is logged. |
client-ip | Log the client IP for each log entry. By default, the client IP is only logged when a new session starts. |
To log communication in detail between the proxy and the backend, the following configuration could be used:
<target name="mytarget">
  <log>server-apdu server-requests</log>
</target>
The element max-connect
is a child of element
proxy
and specifies the maximum number
of connections to be initiated within the last minute (or
the value of period-connect).
If the maximum number is reached the proxy will terminate the just initiated session (connection terminated).
The element limit-connect
is a child of element
proxy
and specifies the limit on the number
of connections to be initiated within the last minute (or
the value of period-connect).
If the maximum number is reached the proxy delays the first operation in the session by one second.
The element period-connect
is a child of element
proxy
and specifies the period, in seconds,
over which limit-connect and
max-connect
measure connections.
If period-connect
is omitted, 60 seconds is used.
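The three connection-rate elements might be combined as in the following sketch. The numbers are illustrative, and limit-connect is taken here to be the limit that delays a session, with max-connect the one that terminates it.

```xml
<proxy xmlns="http://indexdata.dk/yazproxy/schema/0.9/">
  <!-- Measure connections over 30 seconds instead of the default 60 -->
  <period-connect>30</period-connect>
  <!-- When 10 connections are reached in the period, delay the first operation -->
  <limit-connect>10</limit-connect>
  <!-- When 20 connections are reached in the period, terminate the new session -->
  <max-connect>20</max-connect>
</proxy>
```

Setting limit-connect lower than max-connect means sessions are slowed down first and only terminated if the rate keeps rising.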
The element docpath
is a child of element
proxy
and specifies an allowed HTTP path
for local file access. Using docpath
the
proxy may return static file content.
The value of docpath serves both as an HTTP path prefix
and as a local file prefix.
If a value of etc
is used, only URLs with the
prefix /etc/
result in local file access to the
directory etc
within the working directory
of yazproxy.
Care has been taken to ensure that hostile URLs are rejected - including
strings such as ..
and /
(absolute
file system access).
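A minimal sketch of the etc case described above follows. The file name in the comment is hypothetical.

```xml
<proxy xmlns="http://indexdata.dk/yazproxy/schema/0.9/">
  <!-- A request for /etc/somefile.xml (hypothetical name) would be served
       from etc/somefile.xml within the working directory of yazproxy -->
  <docpath>etc</docpath>
</proxy>
```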