NAME

sparql — Metaproxy Module for accessing a triplestore

DESCRIPTION

This module translates Z39.50 operations (init, search, present) into HTTP requests that access a remote triplestore.

This module only inspects Z39.50 packages; HTTP requests are ignored (passed through). When this module is in effect, the result is HTTP packages. Use the http_client module after this module in the route in order to contact the remote triplestore via HTTP.
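The filter order described above can be sketched as a complete route. This is a minimal illustration, not a tested configuration: the frontend_net port and the trailing bounce filter are assumptions, and the db section is copied from the first EXAMPLE further down.

```xml
<?xml version="1.0"?>
<metaproxy xmlns="http://indexdata.com/metaproxy" version="1.0">
  <start route="start"/>
  <routes>
    <route id="start">
      <!-- Z39.50 front end; the port number is an assumption -->
      <filter type="frontend_net">
        <port>@:9000</port>
      </filter>
      <!-- Turns Z39.50 init/search/present into HTTP packages -->
      <filter type="sparql">
        <db path="Default"
            uri="http://bibframe.indexdata.com/sparql/"
            schema="sparql-results">
          <prefix>bf: http://bibframe.org/vocab/</prefix>
          <form>SELECT ?work ?wtitle</form>
          <criteria>?work a bf:Work</criteria>
          <criteria>?work bf:workTitle ?wt</criteria>
          <criteria>?wt bf:titleValue ?wtitle</criteria>
          <index type="bf.wtitle">?wt bf:titleValue %v FILTER(contains(%v, %s))</index>
        </db>
      </filter>
      <!-- Sends the HTTP packages on to the triplestore -->
      <filter type="http_client"/>
      <!-- Ends the route for packages that no earlier filter handled -->
      <filter type="bounce"/>
    </route>
  </routes>
</metaproxy>
```

The point of the sketch is only the ordering: sparql comes first, http_client directly after it.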

Configuration consists of an optional defaults section and one or more database sections.

The defaults section is defined with element defaults and specifies the URL of the triplestore with attribute uri.

A database section is defined with element db. The db element must specify attribute path, which is the name of the Z39.50 database. It should also include attribute uri with the URL of the triplestore, unless that is already specified in the defaults section. The element-set-name / schema for the database may be given with attribute schema. A db configuration may also include settings from another db section, specified by the include attribute. Each database section takes these elements:

<prefix/>

Section that maps prefixes and namespaces for RDF vocabularies. The format is the prefix, followed by a colon, followed by the namespace value, for example bf: http://bibframe.org/vocab/.

<form/>

Selects the SPARQL query form. The content should start with one of the query forms SELECT or CONSTRUCT.

<criteria/>

Section that maps static graph patterns for binding variables, narrowing types, and so on, or any other WHERE clause criteria that are static to the Z39.50/SRU database. The final query conversion logic should be able to deduce which optional criteria to include in the generated SPARQL by analyzing the variables required by the query matching and the display fields.

<index type="attribute"/>

Section used to declare RPN/Type-1 use attribute strings (indices) and map them to BIBFRAME graph patterns. Items in this section are constructed during RPN query processing and placeholders that are prefixed by a percent sign (%) are expanded. See the section called “EXPANSIONS”. To map a given use attribute (search field) into multiple entity properties, SPARQL constructs like `OPTIONAL` or `UNION` can be used.

<present type="attribute"/>

Section used to declare retrieval for a given element-set-name (SRU schema). The CDATA is SPARQL where %u holds the URI of the record. This can be used to construct the resulting record.

<modifier/>

Optional section that adds solution sequence modifiers, such as GROUP BY or ORDER BY, to the generated query.
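The elements above can be combined as in the following sketch. It is illustrative only and rests on several assumptions: the db path titles is made up, the present type full is a hypothetical element-set-name (assuming that the type attribute carries the element-set-name, as the description suggests), and the UNION pattern merely shows one way to map a use attribute to two properties.

```xml
<filter type="sparql">
  <!-- Shared triplestore URL; the db below has no uri of its own -->
  <defaults uri="http://bibframe.indexdata.com/sparql/"/>

  <db path="titles" schema="sparql-results">
    <prefix>bf: http://bibframe.org/vocab/</prefix>
    <form>SELECT ?work</form>
    <criteria>?work a bf:Work</criteria>

    <!-- Use attribute 4 matches either a work title value or a direct title -->
    <index type="4">
      { ?work bf:workTitle ?wt . ?wt bf:titleValue %v }
      UNION
      { ?work bf:title %v }
      FILTER(contains(%v, %s))
    </index>

    <!-- Hypothetical element-set-name "full": fetch every triple of the record -->
    <present type="full">CONSTRUCT { %u ?p ?o } WHERE { %u ?p ?o }</present>

    <!-- Solution sequence modifier appended to the generated query -->
    <modifier>ORDER BY ?work</modifier>
  </db>
</filter>
```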

EXPANSIONS

%t

The term verbatim as it appears in the Type-1 query.

%s

Like %t but quoted - for general strings.

%d

Like %t, but the term is expected to be an integer.

%u

Like %t, but with prefix < and suffix > - for URIs.

%v

Expands to a SPARQL local variable of the form ?v... This allows the use of a separate local SPARQL variable for each Attribute+Term in the Type-1 query.
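As an illustration of the placeholders, the fragment below pairs a few index patterns with the text they might expand to. The expanded forms in the comments are assumptions based on the descriptions above: the variable name ?v0 and the double quotes around the string are guesses at the exact output, and bf:year and bf:relatedTo are hypothetical properties used only for the sketch.

```xml
<!-- %v and %s: per-term local variable and quoted string -->
<index type="4">?wt bf:titleValue %v FILTER(contains(%v, %s))</index>
<!-- for the term water this might become:
     ?wt bf:titleValue ?v0 FILTER(contains(?v0, "water")) -->

<!-- %d: integer term (bf:year is hypothetical) -->
<index type="31">?work bf:year %d</index>
<!-- for the term 1999 this might become:
     ?work bf:year 1999 -->

<!-- %u: term wrapped in < and > (bf:relatedTo is hypothetical) -->
<index type="12">?work bf:relatedTo %u</index>
<!-- for the term http://example.org/work/1 this might become:
     ?work bf:relatedTo <http://example.org/work/1> -->
```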

SCHEMA

# Metaproxy XML config file schema

namespace mp = "http://indexdata.com/metaproxy"

filter_sparql =
  attribute type { "sparql" },
  attribute id { xsd:NCName }?,
  attribute name { xsd:NCName }?,
  element mp:defaults {
    attribute uri { xsd:string }?
  }?,
  element mp:db {
    attribute path { xsd:string },
    attribute uri { xsd:string }?,
    attribute schema { xsd:string }?,
    attribute include { xsd:string }?,
    element mp:prefix { xsd:string }+,
    element mp:form { xsd:string }*,
    element mp:criteria { xsd:string }*,
    element mp:index {
      attribute type { xsd:string },
      xsd:string
    }*,
    element mp:present {
      attribute type { xsd:string },
      xsd:string
    }*,
    element mp:modifier { xsd:string }*
  }+

   

EXAMPLE

Configuration for database "Default" that allows searching works. Only the field (use attribute) "bf.wtitle" is supported.

  <filter type="sparql">
    <db path="Default"
        uri="http://bibframe.indexdata.com/sparql/"
        schema="sparql-results">
      <prefix>bf: http://bibframe.org/vocab/</prefix>
      <form>SELECT ?work ?wtitle</form>
      <criteria>?work a bf:Work</criteria>
      <criteria>?work bf:workTitle ?wt</criteria>
      <criteria>?wt bf:titleValue ?wtitle</criteria>
      <index type="bf.wtitle">?wt bf:titleValue %v FILTER(contains(%v, %s))</index>
    </db>
  </filter>

   

The matching is done by a simple case-sensitive substring match. There is no deduplication, so if a work has two titles, we get two rows.

EXAMPLE

A more complex configuration for database "work". This could be included in the same filter section as the "Default" db above.

    <db path="work" schema="sparql-results">
      <prefix>bf: http://bibframe.org/vocab/</prefix>
      <form>SELECT
              ?work
              (sql:GROUP_DIGEST(?wtitle, ' ; ', 1000, 1) AS ?title)
              (sql:GROUP_DIGEST(?creatorlabel, ' ; ', 1000, 1) AS ?creator)
              (sql:GROUP_DIGEST(?subjectlabel, ' ; ', 1000, 1) AS ?subject)
      </form>
      <criteria>?work a bf:Work</criteria>

      <criteria> OPTIONAL {
          ?work bf:workTitle ?wt .
          ?wt bf:titleValue ?wtitle }
      </criteria>
      <criteria> OPTIONAL {
          ?work bf:creator ?creator .
          ?creator bf:label ?creatorlabel }
      </criteria>
      <criteria>OPTIONAL {
          ?work bf:subject ?subject .
          ?subject bf:label ?subjectlabel }
      </criteria>
      <index type="4">?wt bf:titleValue %v FILTER(contains(%v, %s))</index>
      <index type="1003">?creator bf:label %v FILTER(contains(%v, %s))</index>
      <index type="21">?subject bf:label %v FILTER(contains(%v, %s))</index>
      <index type="1016"> {
            ?work ?op1 ?child .
            ?child ?op2 %v FILTER(contains(STR(%v), %s))
          }
      </index>
      <modifier>GROUP BY ?work</modifier>
    </db>

   

This returns one row for each work. Titles, authors, and subjects are all optional. If they repeat, the repeated values are concatenated into a single field, separated by semicolons. This is done by the GROUP_DIGEST function that is specific to the Virtuoso back end.
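For a back end other than Virtuoso, the standard SPARQL 1.1 aggregate GROUP_CONCAT may serve a similar purpose. The form below is a sketch under that assumption and has not been tried with this module; the extra numeric arguments of GROUP_DIGEST have no counterpart here.

```xml
<form>SELECT
        ?work
        (GROUP_CONCAT(DISTINCT ?wtitle; SEPARATOR=' ; ') AS ?title)
        (GROUP_CONCAT(DISTINCT ?creatorlabel; SEPARATOR=' ; ') AS ?creator)
        (GROUP_CONCAT(DISTINCT ?subjectlabel; SEPARATOR=' ; ') AS ?subject)
</form>
```

As with GROUP_DIGEST, this relies on the GROUP BY modifier from the example so that each work yields a single row.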

This example supports use attributes 4 (title), 1003 (author), 21 (subject), and 1016 (keyword). The keyword index matches any literal in a triple that refers to the work, so it works for the titleValue in the workTitle, as well as the label in the subject, and whatever else there may be. Like the preceding example, the matching is by a simple substring, case sensitive. More realistic term matching could be done with regular expressions, at the cost of some readability, portability, and performance.
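A regular expression variant of the title index might look like the sketch below. It uses the standard SPARQL regex function with the "i" flag for case-insensitive matching. This is an untested assumption about how the pattern would behave in this module; note that the search term is then interpreted as a regular expression, so metacharacters in the term take on their special meaning.

```xml
<index type="4">?wt bf:titleValue %v FILTER(regex(%v, %s, "i"))</index>
```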

EXAMPLE

Configuration for database "works". This uses CONSTRUCT to produce RDF.

    <db path="works" schema="rdf">
      <prefix>bf: http://bibframe.org/vocab/</prefix>
      <form>CONSTRUCT {
          ?work bf:title ?wtitle .
          ?work bf:instanceTitle ?title .
          ?work bf:author ?creator .
          ?work bf:subject ?subjectlabel }
      </form>
      <criteria>?work a bf:Work</criteria>

      <criteria>?work bf:workTitle ?wt</criteria>
      <criteria>?wt bf:titleValue ?wtitle</criteria>
      <index type="4">?wt bf:titleValue %v FILTER(contains(%v, %s))</index>
      <criteria>?work bf:creator ?creator</criteria>
      <criteria>?creator bf:label ?creatorlabel</criteria>
      <index type="1003">?creator bf:label %v FILTER(contains(%v, %s))</index>
      <criteria>?work bf:subject ?subject</criteria>
      <criteria>?subject bf:label ?subjectlabel</criteria>
      <index type="21">?subject bf:label %v FILTER(contains(%v, %s))</index>
    </db>
 
   

EXAMPLE

Configuration for database "instance". Like "work" above, this uses SELECT to return row-based data, this time from the instances. The result is not deduplicated per instance: if an instance has two titles, we get two rows, and if it also has two formats, we get four rows. The DISTINCT in the SELECT only removes rows that are identical in every column.

    <db path="instance" schema="sparql-results">
      <prefix>bf: http://bibframe.org/vocab/</prefix>
      <form>SELECT DISTINCT ?instance ?title ?format</form>
      <criteria>?instance a bf:Instance</criteria>
      <criteria>?instance bf:title ?title</criteria>
      <index type="4">?instance bf:title %v FILTER(contains(%v, %s))</index>
      <criteria>?instance bf:format ?format</criteria>
      <index type="1013">?instance bf:format %s</index>
    </db>
 
   

SEE ALSO

metaproxy(1)