The Torus abstract architecture

The Torus architecture is based on translucent record stores, which we call realms, and on hierarchies built from them. A realm, generally speaking, provides target information (or more generally, key-value pair records) for an entity or node in a hierarchy of organizations. A realm might serve a consortium or a member library. Each realm consists of four components:

  • A set of zero or more parent realms from which it inherits records with key-value fields
  • A set of zero or more records inherited from all parent realms, which together we call its world
  • A set of markers indicating which of those records are selected for use, together with any field-specific overrides established for those records; collectively these are called overrides
  • A set of zero or more additional records that are not inherited but are original to this realm

We describe the record stores as translucent because they are transparent in the sense that inherited records shine through them and are visible in the inheriting (child) realms, yet they can also superimpose opaque data over what is inherited, both on a field-by-field basis and by adding completely new records.
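
The four components and the translucent overlay can be sketched in a few lines of Python. This is only an illustrative in-memory model, not the Torus implementation: the class name, the field names and the sample records are all assumptions. It treats the presence of a record id in the overrides as the selection marker, and uses ChainMap so that overridden fields hide inherited ones while the remaining fields shine through.

```python
from collections import ChainMap
from dataclasses import dataclass, field


@dataclass
class Realm:
    """Hypothetical model of a realm; all names here are illustrative."""
    parents: list = field(default_factory=list)    # names of parent realms
    world: dict = field(default_factory=dict)      # record id -> inherited record
    overrides: dict = field(default_factory=dict)  # record id -> overridden fields (presence = selected)
    additions: dict = field(default_factory=dict)  # record id -> record original to this realm

    def output(self):
        """Selected records with overrides applied, plus the realm's own records."""
        out = {}
        for rec_id, fields in self.overrides.items():
            if rec_id in self.world:
                # Overridden fields are opaque; all other fields are inherited.
                out[rec_id] = dict(ChainMap(fields, self.world[rec_id]))
        out.update(self.additions)
        return out


realm = Realm(
    parents=["home-world"],
    world={
        "loc": {"name": "Library of Congress", "auth": ""},
        "bl": {"name": "British Library", "auth": ""},
    },
    overrides={"loc": {"auth": "user/secret"}},  # select "loc", override one field
    additions={"cat": {"name": "Local catalog", "auth": ""}},
)
records = realm.output()
```

Here the unselected record "bl" stays in the world but does not appear in the output, while "loc" keeps its inherited name and gains the locally set credentials.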

The important thing to notice here is that the output of one realm (i.e., its list of selected, overridden, and/or newly created records) can be used as the input of another (i.e., as a parent that the other realm inherits from). It is also possible to make pure producers, such as centrally maintained knowledge bases, that provide a list of records in the right format to be inherited by a realm. Finally, it is of course possible to make pure consumers that use the records provided by a realm for purposes of their own: most obviously, using sets of target profile records to configure a metasearch engine. This stacking is possible because the format of a realm's output is well defined: it is a simple but strictly defined XML format.
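
As a rough illustration of this stacking, the following sketch chains a pure producer, two realms and a pure consumer. It passes plain Python lists between the stages where the real system exchanges XML, and every function name, record id and URL in it is made up for the example.

```python
def knowledge_base():
    """Pure producer: emits records and inherits from nothing (sample data)."""
    return [
        {"id": "t1", "name": "Target One", "url": "z3950.example.org:210"},
        {"id": "t2", "name": "Target Two", "url": "sru.example.org/db"},
    ]


def realm(inherited, selected, overrides=None, additions=()):
    """Select inherited records, apply field overrides, then add new records."""
    overrides = overrides or {}
    out = [
        {**rec, **overrides.get(rec["id"], {})}
        for rec in inherited
        if rec["id"] in selected
    ]
    out.extend(dict(rec) for rec in additions)
    return out


def metasearch_config(records):
    """Pure consumer: turns records into a (hypothetical) engine configuration."""
    return {rec["name"]: rec["url"] for rec in records}


# The output of one stage is the input of the next.
consortium = realm(
    knowledge_base(),
    selected={"t1", "t2"},
    additions=[{"id": "t3", "name": "Consortium Catalog", "url": "catalog.example.org"}],
)
member = realm(
    consortium,
    selected={"t1", "t3"},
    overrides={"t1": {"url": "proxy.example.org/t1"}},
)
config = metasearch_config(member)
```

Because each stage only depends on the shape of the records it receives, a realm cannot tell whether its parent is another realm or a pure producer.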

As previously noted, this architecture is not specific to the problem of target registries: although that problem was the motivating example that drove the design, the resulting system can be, and is, used for other purposes too. For example, the MasterKey system uses realms to hold end-user identity records as well as target profiles, and these are used to support end-user authentication. Identity records include a pointer to the realm that contains the profile records for the end-user's available targets, so it is possible to construct basic vertical relations within and across hierarchies. Realms have also been used to support authoritative data management, e.g., target categories.

An example application

To illustrate the abstract architecture, the following example shows how a set of realms can be arranged to provide support for a consortium of several member institutions, in which some members treat individual departments differently. Low-level realms appear on the left of this diagram, and may be shared by any number of customers. Two of these (the “Home World” of profiles for well-known targets and the repository of connectors to web-based resources) are adopted as parents of the consortium's realm, which selects a subset of the targets made available by those parents and adds one of its own (the LSE library catalog). Two member institutions inherit the consortium's targets into their realms, selecting only those that they wish to include. One of those members (UCL) refines the list further for its Earth Sciences department, adding one further target.

Torus Network Diagram

In this diagram, each realm's world is listed on the left-hand side of the yellow box; the selected set is indicated on the right-hand side by ticks, along with any new targets added. Note that the “heavy lifting” of defining targets is mostly done once, in the widely used realms on the left, which are shared by multiple customers. Customer-specific configuration is mostly limited to selecting targets and overriding specific aspects of inherited target definitions, such as authentication credentials.

This diagram is somewhat simplified: in reality, many customers use the target sets inherited from the generic Home World and Connector Repository realms, some of them consortia and some individual institutions. Similarly, most consortia will have many more members than shown here, and member institutions may support many more departments. The key point is that each realm may have arbitrarily many parents, and may in turn be parent to arbitrarily many children; and that this allows sufficient flexibility in the topology of the network to support any desired combination of configurations.
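
The topology of the example can be written down as a simple mapping from each realm to its parents. The sketch below is only a way of reasoning about such a network, not part of Torus; the label "Other member" stands in for the second member institution, which the text does not name.

```python
# Each realm maps to the list of realms it inherits from.
PARENTS = {
    "Home World": [],
    "Connector Repository": [],
    "Consortium": ["Home World", "Connector Repository"],
    "UCL": ["Consortium"],
    "Other member": ["Consortium"],  # placeholder name for the second member
    "UCL Earth Sciences": ["UCL"],
}


def ancestors(realm, parents=PARENTS):
    """Every realm that the given realm inherits from, directly or indirectly."""
    seen = []
    stack = list(parents[realm])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.append(current)
            stack.extend(parents[current])
    return seen


def children(realm, parents=PARENTS):
    """Realms that list the given realm as a direct parent."""
    return [name for name, ps in parents.items() if realm in ps]


dept_sources = ancestors("UCL Earth Sciences")
consortium_children = children("Consortium")
```

A realm with several parents and several children, like the consortium here, needs no special handling: it is just a node with more than one edge in each direction.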

The Torus program

Realms reside within the Torus server program, which allows full management of them via a RESTful Web Service. This covers defining and accessing parent realms, assembling the parent realms' records into a world (according to priority and de-duplication schemes), and maintaining a set of selections from, modifications to, and additions to that world. Assembled records are exposed via a Web service endpoint with CQL search, faceting and paging capabilities, which can be used to drive a management console (admin UI) such as Index Data's own mkadmin.
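
The text does not spell out the priority and de-duplication schemes, so the following is only one plausible sketch of world assembly. It assumes that each parent carries a numeric priority (lower numbers win) and that records are de-duplicated on an "id" field; both conventions, like the sample data, are assumptions made for the example.

```python
def assemble_world(parents):
    """Merge parent record lists into one world.

    `parents` is a list of (priority, records) pairs. Parents are visited in
    priority order, and the first record seen for a given id is kept.
    """
    world = {}
    for _, records in sorted(parents, key=lambda pair: pair[0]):
        for rec in records:
            world.setdefault(rec["id"], rec)  # keep the higher-priority duplicate
    return list(world.values())


home_world = [
    {"id": "loc", "name": "Library of Congress"},
    {"id": "bl", "name": "British Library"},
]
connectors = [
    {"id": "bl", "name": "BL (connector)"},
    {"id": "jstor", "name": "JSTOR"},
]

# home_world has priority 1, so its "bl" record wins over the connector's.
world = assemble_world([(2, connectors), (1, home_world)])
world_names = [rec["name"] for rec in world]
```

Other schemes are equally conceivable, for instance merging duplicate records field by field instead of keeping one of them whole.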

In the example configuration illustrated above, all of the realms except the Connector Repository are implemented by the Torus – either a single instance of the Torus software providing many realms, or multiple instances running on different machines. The distributed architecture means that locality of realms is of little importance, primarily affecting performance.

Toroids

We refer to software components that implement the read-only part of the Torus protocol as toroids: literally, things having the form of the Torus. Toroids expose their resources (whatever they are) as key-value pair records in the well-defined Torus XML format, but rarely provide more advanced Torus functionality (such as CQL filtering). The primary purpose of a toroid is to make a resource consumable by any Torus realm. Toroids can be attached to existing software and used as a means of data export.

In short, toroids expose pseudo-realms that can be used as parents of other realms, but do not inherit from parents themselves. Different toroids may obtain their records from various sources: for example, the Connector Repository maintains its own database of connectors and exposes it in the Torus format; the IRSpy toroid accesses a register of Z39.50 targets that is maintained in ZeeRex format, and acts as a gateway between the Torus format and ZeeRex. Even a static XML file in the appropriate format, placed on a Web server, can be used as a toroid.
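
To show how little a consumer needs from a static-file toroid, here is a sketch that reads key-value records from an XML document. The element and attribute names below are invented for the example and should not be taken as the actual Torus XML schema; only the idea of flat key-value records comes from the description above.

```python
import xml.etree.ElementTree as ET

# Made-up stand-in for a static toroid file; not the real Torus format.
STATIC_TOROID = """\
<records>
  <record id="loc">
    <name>Library of Congress</name>
    <zurl>z3950.example.org:210/db</zurl>
  </record>
  <record id="bl">
    <name>British Library</name>
    <zurl>z3950.example.net:9909/db</zurl>
  </record>
</records>
"""


def read_toroid(xml_text):
    """Parse the XML into a list of flat key-value records."""
    root = ET.fromstring(xml_text)
    return [
        {"id": rec.get("id"),
         **{child.tag: (child.text or "").strip() for child in rec}}
        for rec in root.iter("record")
    ]


toroid_records = read_toroid(STATIC_TOROID)
```

In a deployment the document would be fetched over HTTP from the Web server rather than embedded as a string, and the resulting records would become part of the inheriting realm's world.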