2. Your data model

Pazpar2 has no preconceived notion of what makes up a data model. It does not assume that records have specific fields or that they are organized in any particular way. The only assumption is that data comes packaged in a form that the software can work with (presently, that means XML or MARC), and that you can provide the necessary information to massage it into Pazpar2's internal record abstraction.

Handling retrieval records in Pazpar2 is a two-step process. First, you decide which data elements of the source record you are interested in, and you specify any desired massaging or combining of elements using an XSLT stylesheet (MARC records are automatically normalized to MARCXML before this step). If desired, you can run multiple XSLT stylesheets in series to accomplish this, but the output of the last one must be a representation of the record in a schema that Pazpar2 understands.
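
To make the first step more concrete, the sketch below performs in Python the kind of mapping that an XSLT stylesheet would normally express: it picks the title (245 $a) and the author (100 $a) from a MARCXML record and emits them as 'metadata' elements in the Pazpar2 namespace. This is only an illustration of the mapping; Pazpar2 itself applies XSLT, and the sample record, the choice of fields, and the names FIELD_MAP and marcxml_to_pz are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
PZ_NS = "http://www.indexdata.com/pazpar2/1.0"

# Hypothetical mapping: (MARC tag, subfield code) -> Pazpar2 metadata type.
FIELD_MAP = {("245", "a"): "title", ("100", "a"): "author"}

# Made-up sample input, standing in for a normalized MARC record.
SAMPLE_MARCXML = """\
<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="100" ind1="1" ind2=" ">
    <subfield code="a">King, Stephen</subfield>
  </datafield>
  <datafield tag="245" ind1="1" ind2="4">
    <subfield code="a">The Shining</subfield>
  </datafield>
</record>
"""

def marcxml_to_pz(marcxml: str) -> ET.Element:
    """Copy the mapped subfields into a Pazpar2-style <record> element."""
    src = ET.fromstring(marcxml)
    out = ET.Element(f"{{{PZ_NS}}}record")
    for field in src.findall(f"{{{MARC_NS}}}datafield"):
        for sub in field.findall(f"{{{MARC_NS}}}subfield"):
            md_type = FIELD_MAP.get((field.get("tag"), sub.get("code")))
            if md_type and sub.text:
                md = ET.SubElement(out, f"{{{PZ_NS}}}metadata", {"type": md_type})
                md.text = sub.text.strip()
    return out

pz_record = marcxml_to_pz(SAMPLE_MARCXML)

# Serialize with the Pazpar2 namespace as the default namespace.
ET.register_namespace("", PZ_NS)
print(ET.tostring(pz_record, encoding="unicode"))
```

In a real setup the same selection would live in the stylesheet, and a 'mergekey' attribute could be added to the root element in the same pass.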

The intermediate, internal representation of the record looks like this:

     <record xmlns="http://www.indexdata.com/pazpar2/1.0"
       mergekey="title The Shining author King, Stephen">

       <metadata type="title" rank="2">The Shining</metadata>

       <metadata type="author">King, Stephen</metadata>

       <metadata type="kind">ebook</metadata>
       <!-- ... and so on -->
     </record>

As you can see, there isn't much to it. Only a few parts of this record really matter.

Elements should belong to the namespace http://www.indexdata.com/pazpar2/1.0. If the root node contains the attribute 'mergekey', then every record that generates the same merge key (normalized for case differences, white space, and truncation) will be joined into a cluster. In other words, you decide how records are merged. If you don't include a merge key, records are never merged. The 'metadata' elements carry the actual content of the record. The 'type' attribute is used to match each element against processing rules that determine what happens to the data element next. The 'rank' attribute specifies a multiplier used when ranking this element.
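
The clustering rule can be illustrated with a short Python sketch. It groups records by their 'mergekey' attribute after a deliberately simplified normalization (lowercasing and collapsing white space). Pazpar2's own normalization is not reproduced here, so normalize_key and cluster_records are hypothetical helpers, and the three sample records are made up for the example.

```python
import xml.etree.ElementTree as ET

PZ_NS = "http://www.indexdata.com/pazpar2/1.0"

def normalize_key(key: str) -> str:
    # Assumption: a simplified stand-in for Pazpar2's merge key normalization.
    return " ".join(key.lower().split())

def cluster_records(xml_records):
    """Group record elements that share the same normalized merge key."""
    clusters = {}
    for i, xml in enumerate(xml_records):
        rec = ET.fromstring(xml)
        key = rec.get("mergekey")
        # A record without a merge key is never merged, so it gets
        # a bucket of its own, keyed by its position in the input.
        bucket = ("key", normalize_key(key)) if key else ("single", i)
        clusters.setdefault(bucket, []).append(rec)
    return list(clusters.values())

RECORDS = [
    f'<record xmlns="{PZ_NS}" mergekey="title The Shining author King, Stephen">'
    '<metadata type="kind">ebook</metadata></record>',
    f'<record xmlns="{PZ_NS}" mergekey="title the  shining author king, stephen">'
    '<metadata type="kind">audio</metadata></record>',
    f'<record xmlns="{PZ_NS}"><metadata type="kind">video</metadata></record>',
]

clusters = cluster_records(RECORDS)
sizes = [len(c) for c in clusters]
print(sizes)
```

The first two records differ only in case and spacing of the merge key and so end up in one cluster; the third has no merge key and stays alone.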

The next processing step is the extraction of metadata from the intermediate representation of the record. This is governed by the 'metadata' elements in the 'service' section of the configuration file. See the section called “server” for details. The metadata in the retrieval record ultimately drives merging, sorting, ranking, the extraction of browse facets, and display, all configurable.

Pazpar2 1.6.37 and later can also ingest records that are already clustered. Suppose a database already clusters records for us and we would like to preserve those clusters in Pazpar2. In that case, we can generate a 'cluster' wrapper element that holds the individual 'record' elements.

Cluster record example:

     <cluster xmlns="http://www.indexdata.com/pazpar2/1.0">
       <record>
         <metadata type="title" rank="2">The Shining</metadata>
         <metadata type="author">King, Stephen</metadata>
         <metadata type="kind">ebook</metadata>
       </record>
       <record>
         <metadata type="title" rank="2">The Shining</metadata>
         <metadata type="author">King, Stephen</metadata>
         <metadata type="kind">audio</metadata>
       </record>
     </cluster>
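
As a rough illustration of this structure, the following Python sketch reads a cluster wrapper like the one above and lists the metadata of each member record. The helper name read_cluster is hypothetical, and the code only shows how the nesting is organized, not how Pazpar2 processes a cluster internally.

```python
import xml.etree.ElementTree as ET

# Prefix map for ElementTree lookups; the "pz" prefix is an arbitrary choice.
PZ = {"pz": "http://www.indexdata.com/pazpar2/1.0"}

CLUSTER_XML = """\
<cluster xmlns="http://www.indexdata.com/pazpar2/1.0">
  <record>
    <metadata type="title" rank="2">The Shining</metadata>
    <metadata type="author">King, Stephen</metadata>
    <metadata type="kind">ebook</metadata>
  </record>
  <record>
    <metadata type="title" rank="2">The Shining</metadata>
    <metadata type="author">King, Stephen</metadata>
    <metadata type="kind">audio</metadata>
  </record>
</cluster>
"""

def read_cluster(xml_text):
    """Return one list of (type, value, rank) tuples per member record."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.findall("pz:record", PZ):
        fields = [
            # rank is kept as the raw attribute string; None when absent.
            (m.get("type"), m.text, m.get("rank"))
            for m in rec.findall("pz:metadata", PZ)
        ]
        records.append(fields)
    return records

members = read_cluster(CLUSTER_XML)
for fields in members:
    print(fields)
```

Because the child 'record' elements inherit the default namespace from the 'cluster' wrapper, the lookups have to be namespace-qualified even though the inner elements carry no xmlns attribute of their own.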