YAZ version 2.1.20 or later includes a Retrieval facility tool which allows an SRU/Z39.50 server to describe itself and perform record conversions. The idea is the following:
An SRU/Z39.50 client sends a retrieval request which includes a combination of the following parameters: syntax (format), schema (or element set name).
The retrieval facility is invoked with these parameters in a server/proxy. It matches the parameters against a set of "supported" retrieval types. If there is no match, the retrieval facility signals an error (syntax and/or schema not supported).
For a successful match, the backend is invoked with the same or altered retrieval parameters (syntax, schema). If a record is received from the backend, it is converted to the frontend name/syntax.
The resulting record is sent back to the client and tagged with the frontend syntax/schema.
The Retrieval facility is driven by an XML configuration. The configuration is neither Z39.50 ZeeRex nor SRU ZeeRex, but it should be easy to generate both of them from the XML configuration. (Unfortunately the two versions of ZeeRex differ substantially in this regard.)
All elements should be covered by the namespace http://indexdata.com/yaz.
The root element node must be retrievalinfo.
The retrievalinfo element must include one or more retrieval elements. Each retrieval element defines a specific combination of syntax, name and identifier supported by this retrieval service.
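As a minimal sketch of the structure just described, the fragment below declares the namespace on the root element and lists two retrieval elements. The attribute values are borrowed from the examples later in this section; the choice of these two combinations is only an illustration.

```xml
<retrievalinfo xmlns="http://indexdata.com/yaz">
  <!-- ISO2709 MARC21 records with element set name F -->
  <retrieval syntax="usmarc" name="F"/>
  <!-- XML records under the short schema name marcxml -->
  <retrieval syntax="xml" name="marcxml"
             identifier="info:srw/schema/1/marcxml-v1.1"/>
</retrievalinfo>
```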
The retrieval element may include any of the following attributes:
syntax
(REQUIRED) Defines the record syntax. Possible values are any of the names defined in YAZ' OID database or a raw OID in dotted form (n.n ... n).
name
(OPTIONAL) Defines the name of the retrieval format. This can be any string. For SRU, the value is equivalent to the schema (short-hand); for Z39.50, it is equivalent to the simple element set name. For YAZ 3.0.24 and later, this name may be specified as a glob expression with the operators * and ?.
identifier
(OPTIONAL) Defines the URI schema name of the retrieval format. This can be any string. For SRU, the value is equivalent to the URI schema. For Z39.50, there is no equivalent.
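The attributes above might be combined as in the following sketch. It assumes that 1.2.840.10003.5.10 (the Z39.50 record syntax OID for USMARC/MARC21) is accepted in raw dotted form, and the glob pattern marc* is a made-up illustration rather than a recommended setting.

```xml
<retrievalinfo xmlns="http://indexdata.com/yaz">
  <!-- syntax given as a raw OID instead of the name usmarc -->
  <retrieval syntax="1.2.840.10003.5.10" name="F"/>
  <!-- glob name (YAZ 3.0.24 and later): matches any name starting with marc,
       such as marcxml or marcxchange -->
  <retrieval syntax="xml" name="marc*"/>
  <!-- name and identifier given together: short schema name and URI schema -->
  <retrieval syntax="xml" name="dc"
             identifier="info:srw/schema/1/dc-v1.1"/>
</retrievalinfo>
```

With a glob name, one retrieval element can stand in for several schema or element set names, at the cost of being less explicit about what is supported.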
The retrieval element may include one backend element. If a backend element is given, it specifies how the records are retrieved by some backend and how the records are converted from the backend to the "frontend".
The attributes name and syntax may be specified for the backend element. The semantics of these attributes are equivalent to those for the retrieval element. However, these values are passed to the "backend".
The backend element may include one or more conversion instructions (as child elements). The supported conversions are:
marc
The marc element specifies a conversion to and from ISO2709 encoded MARC and MARCXML/MarcXchange. The following attributes may be specified:
inputformat
(REQUIRED) Format of input. Supported values are marc (for ISO2709), xml (MARCXML/MarcXchange) and json (MARC-in-JSON).
outputformat
(REQUIRED) Format of output. Supported values are line (MARC line format), marcxml (for MARCXML), marc (ISO2709), turbomarc, marcxchange (for MarcXchange), or json (MARC-in-JSON).
inputcharset
(OPTIONAL) Encoding of input. For XML input formats, this need not be given, but for ISO2709 based input formats, this should be set to the encoding used. For MARC21 records, a common inputcharset value would be marc-8. If inputformat is marc and inputcharset is marc-8, then the effective inputcharset is UTF-8 if leader position 9 has the value 'a' (MARC21 rule).
outputcharset
(OPTIONAL) Encoding of output. If outputformat is XML based, it is strongly recommended to use utf-8.
leaderspec
(OPTIONAL) Specifies a modification to the leader for the resulting output record. The leaderspec is a comma-separated list of pos=value pairs, where pos is an integer offset (0 - 23) into the leader. Value is either a quoted string or an integer (character value in decimal). For example, to set the leader at offset 9 to a, use 9='a'. This has the same effect as the -l option of yaz-marcdump(1).
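As a rough illustration of the marc attributes, the backend fragment below converts ISO2709 MARC21 to MARCXML and sets leader offset 9 to a. The particular combination of values is an assumption chosen for illustration, not a configuration taken from a real server.

```xml
<backend syntax="usmarc" name="F">
  <!-- ISO2709 in, MARCXML out; MARC-8 input, UTF-8 output -->
  <marc inputformat="marc" outputformat="marcxml"
        inputcharset="marc-8" outputcharset="utf-8"
        leaderspec="9='a'"/>
</backend>
```

Going by the description of leaderspec, the integer form 9=97 should express the same change, 97 being the decimal character value of a.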
select
The select element selects one or more text nodes and decodes them as XML. The following attributes may be specified:
path
(REQUIRED) XPath expression for selecting text nodes.
This conversion is available in YAZ 5.8.0 and later.
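One possible use of select, sketched under assumptions: the backend is imagined to return a wrapper document whose record element carries escaped MARCXML as text. The element names response and record, and the backend name raw, are hypothetical and serve only to show where the path attribute goes.

```xml
<backend syntax="xml" name="raw">
  <!-- hypothetical layout: /response/record holds escaped MARCXML as text -->
  <select path="/response/record/text()"/>
  <!-- the decoded MARCXML is then converted to ISO2709 -->
  <marc inputformat="xml" outputformat="marc" outputcharset="marc-8"/>
</backend>
```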
solrmarc
The solrmarc element decodes solrmarc records. It assumes that the input is pure solrmarc text (no escaping) and will convert all sequences of the form #XX; to a single character of the hexadecimal value given by XX. The output, presumably, is a valid ISO2709 buffer.
This conversion is available in YAZ 5.0.21 and later.
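Since the output of solrmarc is expected to be an ISO2709 buffer, it seems natural to follow it with a marc conversion. The sketch below assumes that the backend already delivers pure solrmarc text and that the decoded record is UTF-8 encoded; both are assumptions, and the backend attributes are left out because they depend on the server in question.

```xml
<backend>
  <!-- decode #XX; sequences into single characters -->
  <solrmarc/>
  <!-- assumed: the decoded ISO2709 record is in UTF-8 -->
  <marc inputformat="marc" outputformat="marcxml" inputcharset="utf-8"/>
</backend>
```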
xslt
The xslt element specifies a conversion via XSLT. The following attributes may be specified:
stylesheet
(REQUIRED) Stylesheet file.
In addition, the element can be configured as follows:
param
(OPTIONAL) A param tag configures a parameter to be passed to the XSLT stylesheet. Multiple param tags may be defined.
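A sketch of an xslt conversion with parameters follows. The text above does not spell out the attributes of param; the name and value attributes used here are an assumption about how YAZ reads them, and the stylesheet my-transform.xsl and both parameters are hypothetical.

```xml
<backend syntax="usmarc" name="F">
  <marc inputformat="marc" outputformat="marcxml" inputcharset="marc-8"/>
  <xslt stylesheet="my-transform.xsl">
    <!-- hypothetical parameters; attribute names name/value are assumed -->
    <param name="language" value="en"/>
    <param name="source" value="example-library"/>
  </xslt>
</backend>
```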
rdf-lookup
The rdf-lookup element looks up BIBFRAME elements in some suitable service, for example http://id.loc.gov/authorities/names, and replaces the URIs for specified elements with URIs it finds at that service. Its configuration consists of:
debug
(OPTIONAL) Attribute of the rdf-lookup tag that enables debug output. A value of "1" makes the filter add an XML comment next to each key it tried to look up, showing the URL, the result, and timing. This is useful for debugging the configuration. The default is not to add any comments.
timeout
(OPTIONAL) Attribute of the rdf-lookup tag which defines the timeout in seconds for the HTTP based rdf-lookup.
namespace
(OPTIONAL) A namespace tag declares a namespace to be used in the xpath below. The tag requires two attributes: prefix and href.
lookup
(REQUIRED) A section that defines one tag to be looked up, for example an author. The xpath attribute (REQUIRED) specifies the path to the element(s).
key
(REQUIRED) A tag within the lookup tag that specifies the value to be used in the lookup, for example a name or an ID. It is a relative XPath starting from the tag specified in the lookup.
server
(OPTIONAL) Specifies the URL of the server to use for the lookup. A %s is replaced by the key value to be looked up. If not specified, it defaults to the same as in the previous lookup section or, lacking one, to http://id.loc.gov/authorities/names/label/%s. The method attribute can be used to specify the HTTP method to be used in this lookup. The default is GET, and the useful alternative is HEAD.
See the example below.
This conversion is available in YAZ 5.19.0 and later.
Example 7.19. MARC21 backend
A typical way to use the retrieval facility is to enable XML for servers that only support ISO2709 encoded MARC21 records.
<retrievalinfo>
  <retrieval syntax="usmarc" name="F"/>
  <retrieval syntax="usmarc" name="B"/>
  <retrieval syntax="xml" name="marcxml"
             identifier="info:srw/schema/1/marcxml-v1.1">
    <backend syntax="usmarc" name="F">
      <marc inputformat="marc" outputformat="marcxml"
            inputcharset="marc-8"/>
    </backend>
  </retrieval>
  <retrieval syntax="xml" name="dc">
    <backend syntax="usmarc" name="F">
      <marc inputformat="marc" outputformat="marcxml"
            inputcharset="marc-8"/>
      <xslt stylesheet="MARC21slim2DC.xsl"/>
    </backend>
  </retrieval>
</retrievalinfo>
This means that our frontend supports:
MARC21 F(ull) records.
MARC21 B(rief) records.
MARCXML records.
Dublin Core records.
Example 7.20. MARCXML backend
SRW/SRU and Solr backends return records in XML. If they return MARCXML or MarcXchange, the retrieval module can convert those into ISO2709 formats, most commonly USMARC (AKA MARC21). In this example, the backend returns MARCXML for schema="marcxml".
<retrievalinfo>
  <retrieval syntax="usmarc">
    <backend syntax="xml" name="marcxml">
      <marc inputformat="xml" outputformat="marc"
            outputcharset="marc-8"/>
    </backend>
  </retrieval>
  <retrieval syntax="xml" name="marcxml"
             identifier="info:srw/schema/1/marcxml-v1.1"/>
  <retrieval syntax="xml" name="dc">
    <backend syntax="xml" name="marcxml">
      <xslt stylesheet="MARC21slim2DC.xsl"/>
    </backend>
  </retrieval>
</retrievalinfo>
This means that our frontend supports:
MARC21 records (any element set name) in MARC-8 encoding.
MARCXML records for element-set=marcxml.
Dublin Core records for element-set=dc.
Example 7.21. RDF-lookup backend
This is a minimal example of the backend configuration for the rdf-lookup. It could well be used with some heavy XSLT transforms that make BIBFRAME records out of MARCXML.
<backend syntax="xml" name="rdf-lookup">
  <rdf-lookup debug="1" timeout="10">
    <namespace prefix="bf"
               href="http://id.loc.gov/ontologies/bibframe/"/>
    <namespace prefix="bflc"
               href="http://id.loc.gov/ontologies/bibframe/lc-extensions/"/>
    <lookup xpath="//bf:contribution/bf:Contribution/bf:agent/bf:Agent">
      <key field="bflc:name00MatchKey"/>
      <key field="bflc:name01MatchKey"/>
      <key field="bflc:name11MatchKey"/>
      <server url="http://id.loc.gov/authorities/names/label/%s"
              method="HEAD"/>
    </lookup>
  </rdf-lookup>
</backend>
The debug=1 attribute tells the filter to add XML comments to the key nodes that indicate what lookup it tried to do, how it went, and how long it took.
The namespace prefixes bf: and bflc: are defined in the namespace tags. These namespaces are used in the xpath expressions in the lookup sections.
The lookup tag specifies one tag to be looked up. The xpath attribute defines which node to modify. It may make use of the namespace definitions above.
The server tag gives the URL to be used for the lookup. A %s in the string will get replaced by the key value. If there is no server tag, the one from the preceding lookup section is used, and if there is no previous section, the id.loc.gov address is used as a default. The default is to make a GET request; this example uses HEAD.