YAZ provides a fast utility for working with MARC records. Early versions of the MARC utility only allowed decoding of ISO2709. Today the utility can both decode and encode to a variety of formats.
    #include <yaz/marcdisp.h>

    /* create handler */
    yaz_marc_t yaz_marc_create(void);
    /* destroy */
    void yaz_marc_destroy(yaz_marc_t mt);

    /* set XML mode YAZ_MARC_LINE, YAZ_MARC_SIMPLEXML, ... */
    void yaz_marc_xml(yaz_marc_t mt, int xmlmode);
    #define YAZ_MARC_LINE      0
    #define YAZ_MARC_SIMPLEXML 1
    #define YAZ_MARC_OAIMARC   2
    #define YAZ_MARC_MARCXML   3
    #define YAZ_MARC_ISO2709   4
    #define YAZ_MARC_XCHANGE   5
    #define YAZ_MARC_CHECK     6
    #define YAZ_MARC_TURBOMARC 7
    #define YAZ_MARC_JSON      8

    /* supply iconv handle for character set conversion .. */
    void yaz_marc_iconv(yaz_marc_t mt, yaz_iconv_t cd);

    /* set debug level, 0=none, 1=more, 2=even more, .. */
    void yaz_marc_debug(yaz_marc_t mt, int level);

    /* decode MARC in buf of size bsize. Returns >0 on success; <=0 on failure.
       On success, result in *result with size *rsize. */
    int yaz_marc_decode_buf(yaz_marc_t mt, const char *buf, int bsize,
                            const char **result, size_t *rsize);

    /* decode MARC in buf of size bsize. Returns >0 on success; <=0 on failure.
       On success, result in WRBUF */
    int yaz_marc_decode_wrbuf(yaz_marc_t mt, const char *buf,
                              int bsize, WRBUF wrbuf);
The synopsis covers only a basic subset of the functionality. Refer
to the header file marcdisp.h for details.
A MARC conversion handle must be created with yaz_marc_create and
destroyed with yaz_marc_destroy. All other functions operate on a
yaz_marc_t handle.
The output format is selected by a call to yaz_marc_xml. The xmlmode
argument must be one of the YAZ_MARC_ constants from the synopsis, for example:
YAZ_MARC_LINE: A simple line-by-line format suitable for display but not recommended for further (machine) processing.
YAZ_MARC_ISO2709: ISO2709 (sometimes just referred to as "MARC").
YAZ_MARC_CHECK: Pseudo format for validation only. Does not generate any real output except diagnostics.
YAZ_MARC_TURBOMARC: XML format with the same semantics as MARCXML but more compact and geared towards fast processing with XSLT. Refer to Section 5.1, “TurboMARC” for more information.
YAZ_MARC_JSON: MARC-in-JSON format.
The actual conversion functions are yaz_marc_decode_buf and
yaz_marc_decode_wrbuf, which decode a MARC record and encode it in the
selected output format. The former function operates on simple buffers;
the latter stores the resulting record in a WRBUF handle (WRBUF is a simple
string type).
Example 7.18. Display of MARC record
The following program snippet illustrates how the MARC API may be used to convert a MARC record to the line-by-line format:
    #include <stdio.h>
    #include <yaz/marcdisp.h>

    void print_marc(const char *marc_buf, int marc_buf_size)
    {
        const char *result;  /* for result buf */
        size_t result_len;   /* for size of result */
        yaz_marc_t mt = yaz_marc_create();

        yaz_marc_xml(mt, YAZ_MARC_LINE);
        if (yaz_marc_decode_buf(mt, marc_buf, marc_buf_size,
                                &result, &result_len) > 0)
            fwrite(result, result_len, 1, stdout);
        yaz_marc_destroy(mt);  /* note that result is now freed... */
    }
TurboMARC is yet another XML encoding of a MARC record. The format was designed for fast processing with XSLT.
Applications like Pazpar2 use XSLT to convert an XML-encoded MARC record to an internal representation. This conversion mostly checks the tag of a MARC field to determine the basic rules of the conversion. This check is costly when the tag is encoded as an attribute, as in MARCXML. Using the tag value as the element name instead makes processing many times faster (at least with Libxslt).
TurboMARC is encoded as follows:
Record elements are part of the namespace
"http://www.indexdata.com/turbomarc".
A record is enclosed in element r.
A collection of records is enclosed in element collection.
The leader is encoded as element l with the leader content as its
(text) value.
A control field is encoded as element c concatenated with the tag value
of the control field if the tag value matches the regular expression
[a-zA-Z0-9]*.
If the tag value does not match the regular expression [a-zA-Z0-9]*,
the control field is encoded as element c and attribute code holds the
tag value.
This rule ensures that in the rare cases where a tag value would
result in non-well-formed XML, YAZ encodes it as a code attribute
(as in MARCXML).
The control field content is the text value of this element.
Indicators are encoded as attributes named i1, i2, etc., each holding
the value of the corresponding indicator.
A data field is encoded as element d concatenated with the tag value of
the data field, or using the attribute code as described in the rules
for control fields.
The children of the data field element are subfield elements.
Each subfield element is encoded as s concatenated with the subfield code.
The text of the subfield element is the contents of the subfield.
Indicators are encoded as attributes for the data field element, similar
to the encoding for control fields.
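To make these rules concrete, here is a minimal sketch of a TurboMARC writer in plain C. It is not YAZ's implementation: the types tm_subfield, tm_field and tm_out and all tm_* functions are hypothetical, output goes to a fixed-size buffer that silently truncates, and the collection wrapper is omitted. Applying the code-attribute fallback to subfield codes as well is an assumption, since the rules above state it only for control and data fields.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory record types (not part of YAZ). */
struct tm_subfield { char code; const char *value; };
struct tm_field {
    const char *tag;                /* e.g. "001" or "245" */
    const char *ind;                /* indicators, e.g. "10"; NULL if none */
    const char *data;               /* control field content; NULL for data fields */
    const struct tm_subfield *sub;  /* subfields of a data field */
    size_t nsub;
};

/* Fixed-size output buffer; output beyond its capacity is truncated. */
struct tm_out { char buf[4096]; size_t len; };

static void tm_put(struct tm_out *o, const char *s)
{
    size_t n = strlen(s);
    if (n > sizeof o->buf - 1 - o->len)
        n = sizeof o->buf - 1 - o->len;
    memcpy(o->buf + o->len, s, n);
    o->len += n;
    o->buf[o->len] = '\0';
}

/* Appends s with the XML special characters escaped. */
static void tm_put_esc(struct tm_out *o, const char *s)
{
    char ch[2] = {0, 0};
    for (; *s; s++) {
        switch (*s) {
        case '&': tm_put(o, "&amp;"); break;
        case '<': tm_put(o, "&lt;"); break;
        case '>': tm_put(o, "&gt;"); break;
        case '"': tm_put(o, "&quot;"); break;
        default: ch[0] = *s; tm_put(o, ch); break;
        }
    }
}

/* True if t matches [a-zA-Z0-9]*, so it can be appended to the element name. */
static int tm_tag_ok(const char *t)
{
    for (; *t; t++)
        if (!((*t >= 'a' && *t <= 'z') || (*t >= 'A' && *t <= 'Z') ||
              (*t >= '0' && *t <= '9')))
            return 0;
    return 1;
}

/* Start tag: "<c001" or "<c code=\"...\"" (caller adds '>').
   End tag (closing != 0): "</c001>" or "</c>". */
static void tm_tag(struct tm_out *o, int closing, char prefix, const char *tag)
{
    char p[2] = {prefix, 0};
    tm_put(o, closing ? "</" : "<");
    tm_put(o, p);
    if (tm_tag_ok(tag)) {
        tm_put(o, tag);
    } else if (!closing) {
        tm_put(o, " code=\"");
        tm_put_esc(o, tag);
        tm_put(o, "\"");
    }
    if (closing)
        tm_put(o, ">");
}

/* Encodes one record as element r in the TurboMARC namespace; returns o->buf. */
static const char *tm_write_record(struct tm_out *o, const char *leader,
                                   const struct tm_field *f, size_t nf)
{
    o->len = 0;
    o->buf[0] = '\0';
    tm_put(o, "<r xmlns=\"http://www.indexdata.com/turbomarc\"><l>");
    tm_put_esc(o, leader);
    tm_put(o, "</l>");
    for (size_t i = 0; i < nf; i++) {
        char prefix = f[i].data ? 'c' : 'd';  /* control vs. data field */
        tm_tag(o, 0, prefix, f[i].tag);
        for (size_t k = 0; f[i].ind && f[i].ind[k]; k++) {
            char attr[32];
            char v[2] = {f[i].ind[k], 0};
            snprintf(attr, sizeof attr, " i%zu=\"", k + 1);  /* i1, i2, ... */
            tm_put(o, attr);
            tm_put_esc(o, v);
            tm_put(o, "\"");
        }
        tm_put(o, ">");
        if (f[i].data)
            tm_put_esc(o, f[i].data);
        for (size_t k = 0; k < f[i].nsub; k++) {
            char code[2] = {f[i].sub[k].code, 0};
            tm_tag(o, 0, 's', code);  /* "s" + subfield code */
            tm_put(o, ">");
            tm_put_esc(o, f[i].sub[k].value);
            tm_tag(o, 1, 's', code);
        }
        tm_tag(o, 1, prefix, f[i].tag);
    }
    tm_put(o, "</r>");
    return o->buf;
}
```

With these rules, a control field 001 becomes <c001>, a data field 245 with indicators "10" and subfield a becomes <d245 i1="1" i2="0"><sa>...</sa></d245>, and a tag such as "0#1" falls back to <c code="0#1">. An XSLT stylesheet can then match an element such as d245 directly instead of testing a tag attribute.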