zoom — Metaproxy ZOOM Module
This filter implements a generic client based on the ZOOM API of YAZ. The client implements the same protocols as ZOOM-C: Z39.50, SRU (GET, POST, SOAP) and Solr.
This filter only deals with Z39.50 on input. The following services are supported: init, search, present and close. The backend target is selected based on the database as part of search and not as part of init.
This filter is an alternative to the z3950_client filter, but it also shares properties of the virt_db filter in that the target is selected for a specific database.
The ZOOM filter relies on a target profile description, which is XML based. The profile for a given database is either fetched from a web service or given locally for each unique database (AKA virtual database in virt_db). Target profiles are directly or indirectly given as part of the torus element in the configuration.
The configuration consists of six parts: torus, fieldmap, cclmap, contentProxy, log and zoom.
The torus element specifies target profiles and takes the following content:
url
URL of Web service to be used to fetch target profiles from a remote service (normally Torus). The sequence %query is replaced with a CQL query for the Torus search. The special sequence %realm is replaced by the value of attribute realm or by the realm DATABASE argument. The special sequence %db is replaced with a single database while searching. Note that this sequence is no longer needed, because %query can already query for a single database by using the CQL query udb==...
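The substitution performed on the url attribute can be sketched as follows. This is a minimal Python illustration, not Metaproxy's own code: the function name expand_torus_url is hypothetical, and percent-encoding the substituted values is an assumption made so that a CQL query such as udb==loc stays valid inside the URL.

```python
from urllib.parse import quote


def expand_torus_url(template: str, query: str = "", realm: str = "", db: str = "") -> str:
    """Replace %query, %realm and %db in a Torus URL template (hypothetical helper).

    Assumption: every substituted value is percent-encoded before insertion.
    """
    result = template
    for token, value in (("%query", query), ("%realm", realm), ("%db", db)):
        result = result.replace(token, quote(value, safe=""))
    return result


url = expand_torus_url(
    "http://torus.indexdata.com/src/records/?query=%query", query="udb==loc"
)
print(url)  # http://torus.indexdata.com/src/records/?query=udb%3D%3Dloc
```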
content_url
URL of Web service to be used to fetch the target profile for a given database (udb) of type content. Semantics are otherwise like the url attribute above.
auth_url
URL of Web service to be used for auth/IP lookup. If this is defined, all access is granted or denied as part of Z39.50 Init by the ZOOM module, and the use of database parameters realm and torus_url is not allowed. If this setting is not defined, all access is allowed and realm and/or torus_url may be used.
auth_hostname
Limits IP lookup to a given logical hostname.
realm
The default realm value. Used for %realm in URL, unless specified in DATABASE parameter.
proxy
HTTP proxy to be used for fetching target profiles.
xsldir
Directory that is searched for XSL stylesheets. Stylesheets are specified in the target profile by the transform element.
element_transform
Specifies the element that triggers retrieval and transform using the parameters elementSet, recordEncoding, requestSyntax and transform from the target profile. The default value is "pz2", because for historical reasons the common format is the one used by Pazpar2.
element_raw
Specifies an element that triggers retrieval using the parameters elementSet, recordEncoding, requestSyntax from the target profile. Same actions as for element_transform, but without the XSL transform. Useful for debugging. The default value is "raw".
explain_xsl
Specifies a stylesheet that converts one or more Torus records to ZeeRex Explain records. The content of recordData is assumed to hold each Explain record.
record_xsl
Specifies a stylesheet that converts retrieval records after transform/literal operations. When Metaproxy creates a content proxy session, the XSL parameter cproxyhost is passed to the transform.
records
Local target profiles. This element may include zero or more record elements (one per target profile). See section TARGET PROFILE.
The fieldmap element may be specified zero or more times. It specifies the map from CQL fields to CCL fields, and takes the following content:
cql
CQL field that we are mapping "from".
ccl
CCL field that we are mapping "to".
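The effect of fieldmap on a single CQL search clause can be sketched like this. The table mirrors the fieldmap entries of the first configuration example in this page, where a fieldmap without ccl maps the CQL index to "no field" and leaves the term unqualified. The helper name, the quoting of multi-word terms and the choice to raise an error for an unmapped CQL index are assumptions, not documented Metaproxy behaviour.

```python
from typing import Dict, Optional

# Mirrors <fieldmap cql="..." ccl="..."/> entries; None means no ccl attribute.
FIELDMAP: Dict[str, Optional[str]] = {
    "cql.anywhere": None,
    "cql.serverChoice": None,
    "dc.creator": "au",
    "dc.title": "ti",
    "dc.subject": "su",
}


def cql_clause_to_ccl(index: str, term: str) -> str:
    """Translate one CQL 'index = term' clause to CCL (hypothetical helper)."""
    if index not in FIELDMAP:
        # Assumption: an unmapped CQL index is treated as an error.
        raise KeyError(f"no fieldmap for CQL index {index!r}")
    ccl_field = FIELDMAP[index]
    if " " in term:
        term = f'"{term}"'  # quote multi-word terms for the CCL parser
    return term if ccl_field is None else f"{ccl_field}={term}"


print(cql_clause_to_ccl("dc.title", "jurassic park"))  # ti="jurassic park"
```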
The third part of the configuration consists of zero or more cclmap elements that specify the base CCL profile to be used for all targets. This base configuration is combined with the cclmap definitions from the target profile.
The contentProxy element controls content proxying. This section is optional and must only be defined if content proxying is enabled.
config_file
Specifies the file that configures the cf-proxy system. Metaproxy uses the settings sessiondir and proxyhostname from that file to configure the directory of parameter files and the name of the proxy host for the cf-proxy.
server
Specifies the content proxy host. The host is of the form host[:port], that is, without a method (such as http://). The port number is optional. This setting is deprecated. Use config_file (above) to inform about the proxy server.
tmp_file
Specifies the filename of a session file for content proxying. The file should be an absolute filename that includes XXXXXX, which is replaced by a unique filename using the mkstemp(3) library function. The default value of this setting is /tmp/cf.XXXXXX.p. This setting is deprecated. Use config_file (above) to inform about the session file area.
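The tmp_file template works like the template accepted by mkstemp(3). The Python sketch below reproduces the idea with tempfile.mkstemp: the text before and after XXXXXX becomes the prefix and suffix of a uniquely named file. The helper name is hypothetical, and unlike mkstemp(3), Python chooses its own number of random characters for the unique part.

```python
import os
import tempfile


def create_session_file(template: str = "/tmp/cf.XXXXXX.p") -> str:
    """Create a unique file from a tmp_file-style template (hypothetical helper)."""
    directory, name = os.path.split(template)
    prefix, marker, suffix = name.partition("XXXXXX")
    if not os.path.isabs(template) or not marker:
        raise ValueError("template must be an absolute path containing XXXXXX")
    # tempfile.mkstemp creates the file atomically, readable and writable
    # only by the owner; the random middle part is chosen by Python.
    fd, path = tempfile.mkstemp(suffix=suffix, prefix=prefix, dir=directory)
    os.close(fd)
    return path
```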
The log element controls logging for the ZOOM filter.
apdu
If the value of apdu is "true", then protocol packages (APDUs and HTTP packages) from the ZOOM filter will be logged to the yaz_log system. A value of "false" will not perform logging of protocol packages (the default behavior).
The zoom element controls settings for ZOOM.
timeout
An integer that specifies, in seconds, how long an operation may take before ZOOM gives up. The default value is 40.
proxy_timeout
An integer that specifies, in seconds, how long a proxy check will wait before giving up. The default value is 1.
The ZOOM filter accepts three query types: RPN (Type-1), CCL and CQL.
Queries are converted in two separate steps. In the first step the input query is converted to RPN/Type-1. This is always the common internal format between step 1 and step 2. In step 2 the query is converted to the native query type of the target.
Step 1: for RPN, the query is passed unmodified to the target.
Step 1: for CCL, the query is converted to RPN via cclmap elements that are part of the target profile as well as the base CCL maps.
Step 1: for CQL, the query is converted to CCL. The mappings of CQL fields to CCL fields are handled by fieldmap elements as part of the target profile. The resulting CCL query is then converted to RPN using the scheme described above (via cclmap).
Step 2: If the target is Z39.50-based, it is passed verbatim (RPN). If the target is SRU-based, the RPN will be converted to CQL. If the target is Solr-based, the RPN will be converted to Solr's query type.
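The two conversion steps can be summarised as a small routing function. The sketch only computes which query types a search passes through, given the client's query type and the target profile's sru setting (None for a Z39.50 target); it performs no actual conversion, and the function and table names are hypothetical.

```python
from typing import List, Optional

# Step 1: every input query type is normalised to RPN (Type-1).
STEP1 = {
    "RPN": ["RPN"],
    "CCL": ["CCL", "RPN"],         # via cclmap
    "CQL": ["CQL", "CCL", "RPN"],  # via fieldmap, then cclmap
}

# Step 2: RPN is converted to the native query type of the target.
STEP2 = {None: None, "get": "CQL", "post": "CQL", "soap": "CQL", "solr": "Solr"}


def conversion_path(input_type: str, sru: Optional[str]) -> List[str]:
    """Return the sequence of query types for one search (hypothetical helper)."""
    if input_type not in STEP1 or sru not in STEP2:
        raise ValueError(f"unsupported combination: {input_type!r}, {sru!r}")
    path = list(STEP1[input_type])
    native = STEP2[sru]
    if native is not None:  # Z39.50 targets receive the RPN verbatim
        path.append(native)
    return path


print(conversion_path("CQL", "solr"))  # ['CQL', 'CCL', 'RPN', 'Solr']
```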
The ZOOM module actively handles CQL sorting, using the SORTBY parameter that was introduced in SRU version 1.2. The conversion from a SORTBY clause to the native sort for a target is driven by two parameters: sortStrategy and sortmap_field.
If a sort field does not have an equivalent sortmap_field mapping, it is passed unmodified through the conversion. No diagnostic is raised.
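A minimal sketch of the pass-through rule: each key of a SORTBY clause is looked up as sortmap_<index> in the target profile, and keys without a mapping are left as they are. The helper name, and the assumption that only the index name (not its modifiers) is replaced, are illustrative; the real native form of a mapped value depends on sortStrategy.

```python
from typing import Dict


def map_sortby(sortby: str, profile: Dict[str, str]) -> str:
    """Rewrite the index of each SORTBY key via sortmap_<index> (hypothetical)."""
    mapped = []
    for key in sortby.split():
        index, slash, modifiers = key.partition("/")
        # Unmapped indexes pass through unchanged; no diagnostic is raised.
        native = profile.get(f"sortmap_{index}", index)
        mapped.append(native + slash + modifiers)
    return " ".join(mapped)


profile = {"sortmap_title": "dc.title"}
print(map_sortby("title/sort.descending author", profile))
# dc.title/sort.descending author
```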
TARGET PROFILE
The ZOOM module is driven by a number of settings that specify how to handle each target. Note that unknown elements are silently ignored.
The elements, in alphabetical order, are:
authentication
Authentication parameters to be sent to the target. For Z39.50 targets, this will be sent as part of the Init Request. Authentication consists of two components: username and password, separated by a slash.
If this value is omitted or empty, no authentication information is sent.
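How an authentication value might be split into its two components is sketched below. Splitting at the first slash (so that a password may itself contain a slash) and rejecting a value without any slash are assumptions of this hypothetical helper, not documented behaviour.

```python
from typing import Optional, Tuple


def split_authentication(value: Optional[str]) -> Optional[Tuple[str, str]]:
    """Split 'username/password' (hypothetical helper)."""
    if not value:
        return None  # omitted or empty: no authentication is sent
    user, slash, password = value.partition("/")
    if not slash:
        # Assumption: a value without a slash is rejected.
        raise ValueError("authentication must have the form username/password")
    return user, password
```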
authenticationMode
Specifies how authentication parameters are passed to the server for SRU. Possible values are: url and basic. For the url mode, username and password are carried in URL arguments x-username and x-password. For the basic mode, HTTP basic authentication is used. The setting only takes effect if authentication is set.
If this value is omitted, HTTP basic authentication is used.
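The two authenticationMode values can be illustrated with the Python standard library. In url mode the credentials become the x-username and x-password arguments of the SRU request; in basic mode they go into an HTTP Authorization header. The helper names and the example SRU URL are hypothetical.

```python
import base64
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_url_credentials(sru_url: str, user: str, password: str) -> str:
    """authenticationMode url: append x-username and x-password arguments."""
    parts = urlsplit(sru_url)
    args = parse_qsl(parts.query, keep_blank_values=True)
    args += [("x-username", user), ("x-password", password)]
    return urlunsplit(parts._replace(query=urlencode(args)))


def basic_auth_header(user: str, password: str) -> str:
    """authenticationMode basic: value of the HTTP Authorization header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token


print(with_url_credentials("http://sru.example.org/db?operation=searchRetrieve", "alice", "s3cret"))
# http://sru.example.org/db?operation=searchRetrieve&x-username=alice&x-password=s3cret
```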
cclmap_field
This value specifies the CCL field (qualifier) definition for some field. For Z39.50 targets this most likely will specify the mapping to a numeric use attribute + a structure attribute. For SRU targets, the use attribute should be string based, in order to make the RPN to CQL conversion work properly (step 2).
cfAuth
When cfAuth is defined, its value will be used as authentication to the backend target, and the authentication setting will be specified as part of a database. This is like a "proxy" for authentication and is used for Connector Framework based targets.
cfProxy
Specifies the HTTP proxy for the target in the form host:port.
cfSubDB
Specifies a sub database for a Connector Framework based target.
contentAuthentication
Specifies authentication info to be passed to a content connector. This is only used if content-user and content-password are omitted.
contentConnector
Specifies a database for content-based proxying.
elementSet
Specifies the elementSet to be sent to the target if record transform is enabled (not to be confused with the record_transform module). The record transform is enabled only if the client uses record syntax = XML and an element set determined by the element_transform / element_raw settings from the configuration. By default these are the element sets pz2 and raw. If record transform is not enabled, this setting is not used and the element set specified by the client is passed verbatim.
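The decision described for elementSet can be expressed as a small function. It assumes the default element set names pz2 and raw, compares the record syntax case-insensitively, and uses hypothetical names throughout; it only decides what is sent to the target and whether the XSL step applies, without fetching or transforming anything.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class RetrievalPlan:
    element_set: Optional[str]  # element set sent to the target
    syntax: Optional[str]       # record syntax sent to the target
    apply_xsl: bool             # run transform/literalTransform afterwards


def plan_retrieval(client_syntax: str, client_element_set: str,
                   profile: Dict[str, str],
                   element_transform: str = "pz2",
                   element_raw: str = "raw") -> RetrievalPlan:
    enabled = (client_syntax.lower() == "xml"
               and client_element_set in (element_transform, element_raw))
    if not enabled:
        # Record transform disabled: the client's values pass through verbatim.
        return RetrievalPlan(client_element_set, client_syntax, False)
    # Record transform enabled: use the target profile settings instead;
    # the XSL step applies only to the element_transform element set.
    return RetrievalPlan(profile.get("elementSet"),
                         profile.get("requestSyntax"),
                         client_element_set == element_transform)
```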
literalTransform
Specifies an XSL stylesheet to be used if record transform is enabled; see the description of elementSet. The XSL transform is only used if the element set is set to the value of element_transform in the configuration. The value of literalTransform is the XSL stylesheet itself, string encoded.
piggyback
A value of 1/true is a hint to the ZOOM module that this Z39.50 target supports piggyback searches, i.e. Search Response with records. Any other value (false) will prevent the ZOOM module from making use of piggyback (all records are then part of the Present Response).
queryEncoding
If this value is defined, all queries will be converted to this encoding. This should be used for all Z39.50 targets that do not use UTF-8 for query terms.
recordEncoding
Specifies the character encoding of records that are returned by the target. This is primarily used for targets where records are not already UTF-8 encoded. This setting is only used if the record transform is enabled (see the description of elementSet).
requestSyntax
Specifies the record syntax to be specified for the target if record transform is enabled; see the description of elementSet. If record transform is not enabled, the record syntax of the client is passed verbatim to the target.
sortmap_field
This value specifies the native field for a target. The form of the value is given by sortStrategy.
sortStrategy
Specifies the sort strategy for a target. One of: z3950, type7, cql, sru11 or embed. The embed strategy chooses type-7 or CQL sortby, depending on whether Type-1 or CQL is actually sent to the target.
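The resolution of the embed strategy can be sketched as below. The function name is hypothetical, and the sketch assumes that the cql strategy name stands for a CQL sortby clause, so embed resolves to type7 when a Type-1 query is sent and to cql when a CQL query is sent.

```python
STRATEGIES = {"z3950", "type7", "cql", "sru11", "embed"}


def effective_sort_strategy(strategy: str, sends_cql: bool) -> str:
    """Resolve 'embed' to the strategy actually used (hypothetical helper)."""
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown sortStrategy: {strategy!r}")
    if strategy == "embed":
        # Assumption: "cql" denotes a CQL sortby clause embedded in the query.
        return "cql" if sends_cql else "type7"
    return strategy
```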
sru
If this setting is set, it specifies that the target is web service based and must be one of: get, post, soap or solr.
sruVersion
Specifies the SRU version to use. If unset, version 1.2 will be used. Some servers do not support this version, in which case version 1.1 or even 1.0 could be set.
transform
Specifies an XSL stylesheet filename to be used if record transform is enabled; see the description of elementSet. The XSL transform is only used if the element set is set to the value of element_transform in the configuration.
udb
This value is required and specifies the unique database for this profile. All target profiles should hold a unique database.
urlRecipe
The value of this field is a string that generates a dynamic link based on record content. If the resulting string is non-zero in length, a new field, metadata with attribute type="generated-url", is generated. The content of this field is the result of the URL recipe conversion.
The urlRecipe value may refer to an existing metadata element by ${field[pattern/result/flags]}, which will take the content of the field and perform a regular expression conversion using the pattern given. For example: ${md-title[\s+/+/g]} takes metadata element title and converts one or more spaces to a plus character.
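The ${field[pattern/result/flags]} syntax can be approximated with Python's re module. This is a sketch under several assumptions: the pattern is treated as a Python regular expression, the g flag means replace all matches (otherwise only the first), neither pattern nor result may contain a slash or closing bracket, and a missing field expands to an empty string. The helper name is hypothetical.

```python
import re
from typing import Dict

# Matches ${name} or ${name[pattern/result/flags]}.
REFERENCE = re.compile(r"\$\{([^\[}]+)(?:\[([^\]]*)\])?\}")


def expand_url_recipe(recipe: str, metadata: Dict[str, str]) -> str:
    def substitute(match):
        value = metadata.get(match.group(1), "")
        spec = match.group(2)
        if spec is None:
            return value
        pattern, result, flags = spec.split("/")
        count = 0 if "g" in flags else 1  # count=0 replaces every match
        return re.sub(pattern, result, value, count=count)

    return REFERENCE.sub(substitute, recipe)


recipe = r"http://example.org/find?title=${md-title[\s+/+/g]}"
print(expand_url_recipe(recipe, {"md-title": "Dinosaurs of  the world"}))
# http://example.org/find?title=Dinosaurs+of+the+world
```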
zurl
This setting is mandatory. It specifies the ZURL of the target in the form host/database. The HTTP method should not be provided, as it is guessed from the value of the sru element.
Extra information may be carried in the Z39.50 Database or SRU path, such as authentication to be passed to the backend. Some of the parameters override TARGET profile values. The format is:
udb,parm1=value1&parm2=value2&...
where udb is the unique database recognised by the backend, and parm1, value1, .. are parameters to be passed. The supported parameters are described below. Like form values in HTTP, the parameters and values are URL encoded. The separator between udb and the parameters, though, is a comma rather than a question mark. What follows the question mark are HTTP arguments (in this case SRU arguments).
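Splitting a database string into the unique database and its parameters can be sketched with the standard library. The helper name is hypothetical. Decoding the parameters with parse_qsl follows the statement that they are URL encoded like HTTP form values; keeping parameters given without a value is an assumption.

```python
from typing import Dict, Tuple
from urllib.parse import parse_qsl


def parse_database(database: str) -> Tuple[str, Dict[str, str]]:
    """Split 'udb,parm1=value1&parm2=value2' (hypothetical helper)."""
    udb, _, query = database.partition(",")  # only the first comma separates
    params = dict(parse_qsl(query, keep_blank_values=True))
    return udb, params


print(parse_database("loc,user=alice&password=s%2Fcret"))
# ('loc', {'user': 'alice', 'password': 's/cret'})
```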
The database parameters, in alphabetical order, are:
content-password
The password to be used for the content proxy session. If this parameter is not given, the value of parameter password is passed to the content proxy session.
Specifies the proxy to be used for the content proxy session. If this parameter is not given, the value of parameter proxy is passed to the content proxy session.
content-user
The user to be used for the content proxy session. If this parameter is not given, the value of parameter user is passed to the content proxy session.
Specifies the session ID for content proxy. This parameter is generally not used by anything but the content proxy itself when invoking Metaproxy via SRU.
If this parameter is specified, content-proxying is disabled for the search.
password
Specifies the password to be passed to the backend. It is also passed to the content proxy session, unless overridden by content-password. If this parameter is omitted, the password will be taken from the TARGET profile setting authentication.
proxy
Specifies one or more proxies for the backend. If this parameter is omitted, the proxy will be taken from the TARGET profile setting cfProxy. The parameter is a list of comma-separated host:port entries. Both host and port must be given for each proxy.
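A sketch of validating the proxy parameter: the value is split at commas and every entry must carry both a host and a numeric port. The helper name and the rejection of malformed entries with an exception are assumptions.

```python
from typing import List, Tuple


def parse_proxy_list(value: str) -> List[Tuple[str, int]]:
    """Parse 'host:port,host:port' (hypothetical helper)."""
    proxies = []
    for entry in value.split(","):
        host, colon, port = entry.strip().rpartition(":")
        if not colon or not host or not port.isdigit():
            raise ValueError(f"proxy entry must be host:port, got {entry!r}")
        proxies.append((host, int(port)))
    return proxies
```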
realm
Session realm to be used for this target. It changes the URL used for fetching a target profile by changing the value that is substituted for the %realm string. This parameter is not allowed if access is controlled by auth_url in the configuration.
Optional parameter. If the value is 0, retry on failure is disabled for the ZOOM module. Any other value enables retry on failure. If this parameter is omitted, the value of retryOnFailure from the Torus record is used (same values).
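The precedence between this database parameter and the retryOnFailure setting can be written as a short decision. The helper is hypothetical, and so is its behaviour when neither value is present (here retry is disabled), since that case is not specified.

```python
from typing import Optional


def retry_enabled(db_param: Optional[str], retry_on_failure: Optional[str]) -> bool:
    """Decide whether retry on failure is enabled (hypothetical helper)."""
    value = db_param if db_param is not None else retry_on_failure
    if value is None:
        return False  # assumption: unspecified in both places means no retry
    return value != "0"  # 0 disables; any other value enables
```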
torus_url
Sets the URL to be used for fetching Torus records, overriding the value of the url attribute of element torus in the zoom configuration. This parameter is not allowed if access is controlled by auth_url in the configuration.
user
Specifies the user to be passed to the backend. It is also passed to the content proxy session, unless overridden by content-user. If this parameter is omitted, the user will be taken from the TARGET profile setting authentication.
All parameters that have the prefix "x-" are passed verbatim to the backend.
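How the credential parameters above fall back on each other can be summarised in one function. It is a sketch with hypothetical names: the TARGET profile authentication value is split at its first slash (an assumption), the content proxy falls back only to the user and password database parameters (as the descriptions of content-user and content-password state), and x- parameters are collected for verbatim forwarding.

```python
from typing import Dict, Optional


def resolve_credentials(params: Dict[str, str],
                        authentication: Optional[str]) -> Dict[str, object]:
    """Combine database parameters with the profile setting (hypothetical)."""
    profile_user: Optional[str] = None
    profile_password: Optional[str] = None
    if authentication:
        profile_user, _, profile_password = authentication.partition("/")
    return {
        # Backend: database parameters override the TARGET profile setting.
        "user": params.get("user", profile_user),
        "password": params.get("password", profile_password),
        # Content proxy: content-* parameters override user/password parameters.
        "content_user": params.get("content-user", params.get("user")),
        "content_password": params.get("content-password", params.get("password")),
        # Parameters with the x- prefix are forwarded verbatim.
        "passthru": {k: v for k, v in params.items() if k.startswith("x-")},
    }
```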
# Metaproxy XML config file schemas
#
# Copyright (C) Index Data
# See the LICENSE file for details.
namespace mp = "http://indexdata.com/metaproxy"
filter_zoom =
  attribute type { "zoom" },
  attribute id { xsd:NCName }?,
  attribute name { xsd:NCName }?,
  element mp:torus {
    attribute allow_ip { xsd:string }?,
    attribute auth_url { xsd:string }?,
    attribute url { xsd:string }?,
    attribute content_url { xsd:string }?,
    attribute realm { xsd:string }?,
    attribute xsldir { xsd:string }?,
    attribute element_transform { xsd:string }?,
    attribute element_raw { xsd:string }?,
    attribute element_passthru { xsd:string }?,
    attribute proxy { xsd:string }?,
    attribute explain_xsl { xsd:string }?,
    attribute record_xsl { xsd:string }?,
    element mp:records {
      element mp:record {
        element mp:authentication { xsd:string }?,
        element mp:authenticationMode { xsd:string }?,
        element mp:piggyback { xsd:string }?,
        element mp:queryEncoding { xsd:string }?,
        element mp:udb { xsd:string },
        element mp:cclmap_au { xsd:string }?,
        element mp:cclmap_date { xsd:string }?,
        element mp:cclmap_isbn { xsd:string }?,
        element mp:cclmap_su { xsd:string }?,
        element mp:cclmap_term { xsd:string }?,
        element mp:cclmap_ti { xsd:string }?,
        element mp:contentAuthentication { xsd:string }?,
        element mp:elementSet { xsd:string }?,
        element mp:recordEncoding { xsd:string }?,
        element mp:requestSyntax { xsd:string }?,
        element mp:sru { xsd:string }?,
        element mp:sruVersion { xsd:string }?,
        element mp:transform { xsd:string }?,
        element mp:literalTransform { xsd:string }?,
        element mp:urlRecipe { xsd:string }?,
        element mp:zurl { xsd:string },
        element mp:cfAuth { xsd:string }?,
        element mp:cfProxy { xsd:string }?,
        element mp:cfSubDB { xsd:string }?,
        element mp:contentConnector { xsd:string }?,
        element mp:sortStrategy { xsd:string }?,
        element mp:sortmap_author { xsd:string }?,
        element mp:sortmap_date { xsd:string }?,
        element mp:sortmap_title { xsd:string }?,
        element mp:extraArgs { xsd:string }?,
        element mp:rpn2cql { xsd:string }?,
        element mp:retryOnFailure { xsd:string }?
      }*
    }?
  }?,
  element mp:fieldmap {
    attribute cql { xsd:string },
    attribute ccl { xsd:string }?
  }*,
  element mp:cclmap {
    element mp:qual {
      attribute name { xsd:string },
      element mp:attr {
        attribute type { xsd:string },
        attribute value { xsd:string }
      }+
    }*
  }?,
  element mp:contentProxy {
    attribute config_file { xsd:string }?,
    attribute server { xsd:string }?,
    attribute tmp_file { xsd:string }?
  }?,
  element mp:log {
    attribute apdu { xsd:boolean }?
  }?,
  element mp:zoom {
    attribute timeout { xsd:integer }?,
    attribute proxy_timeout { xsd:integer }?
  }?
In the example below, target definitions (Torus records) are fetched from a web service via a proxy. A CQL profile is configured which maps to a set of CCL fields ("no field", au, ti and su). Presumably the fetched target definitions map the CCL to their native RPN. A CCL qualifier "ocn" is mapped for all targets. Logging of APDUs is enabled, and a timeout is given.
<filter type="zoom">
  <torus
    url="http://torus.indexdata.com/src/records/?query=%query"
    proxy="localhost:3128"
  />
  <fieldmap cql="cql.anywhere"/>
  <fieldmap cql="cql.serverChoice"/>
  <fieldmap cql="dc.creator" ccl="au"/>
  <fieldmap cql="dc.title" ccl="ti"/>
  <fieldmap cql="dc.subject" ccl="su"/>
  <cclmap>
    <qual name="ocn">
      <attr type="u" value="12"/>
      <attr type="s" value="107"/>
    </qual>
  </cclmap>
  <log apdu="true"/>
  <zoom timeout="40"/>
</filter>
Here is another example with two locally defined targets: A Solr target and a Z39.50 target.
<filter type="zoom">
  <torus>
    <records>
      <record>
        <udb>ocs-test</udb>
        <cclmap_term>t=z</cclmap_term>
        <cclmap_ti>u=title t=z</cclmap_ti>
        <sru>solr</sru>
        <zurl>ocs-test.indexdata.com/solr/select</zurl>
      </record>
      <record>
        <udb>loc</udb>
        <cclmap_term>t=l,r</cclmap_term>
        <cclmap_ti>u=4 t=l,r</cclmap_ti>
        <zurl>lx2.loc.gov:210/LCDB_MARC8</zurl>
      </record>
    </records>
  </torus>
  <fieldmap cql="cql.serverChoice"/>
  <fieldmap cql="dc.title" ccl="ti"/>
</filter>