Chapter 1. Introduction

Table of Contents

1. What Pazpar2 is
2. Connectors to non-standard databases
3. A note on the name Pazpar2

1. What Pazpar2 is

Pazpar2 is a stand-alone metasearch engine with a web-service API, designed to be used either from a browser-based client (JavaScript, Flash, Java applet, etc.), from server-side code, or any combination of the two. Pazpar2 is a highly optimized client designed to search many resources in parallel. It implements record merging, relevance-ranking and sorting by arbitrary data content, and facet analysis for browsing purposes. It is designed to be data-model independent, and is capable of working with MARC, Dublin Core, or any other XML-structured response format -- XSLT is used to normalize and extract data from retrieval records for display and analysis. It can be used against any server which supports the Z39.50, SRU/SRW or Apache Solr protocol. Proprietary backend modules can function as connectors between these standard protocols and any non-standard API, including web-site scraping, to support a large number of other protocols.

Additional functionality, such as user management and attractive displays, is expected to be implemented by applications that use Pazpar2. Pazpar2 itself is user-interface independent. Its functionality is exposed through a simple XML-based web-service API, designed to be easy to use from an Ajax-enabled browser, Flash animation, Java applet, etc., or from a higher-level server-side language like PHP, Perl or Java. Because session information can be shared between browser-based logic and server-side scripting, there is tremendous flexibility in how you implement application-specific logic on top of Pazpar2.

Once you launch a search in Pazpar2, the operation continues behind the scenes. Pazpar2 connects to servers, carries out searches, and retrieves, de-duplicates, and stores results internally. Your application code may periodically inquire about the status of an ongoing operation, and ask to see records or result set facets. Results become available immediately, and it is easy to build end-user interfaces that feel extremely responsive, even when searching more than 100 servers concurrently.
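The launch-then-poll pattern described above can be sketched from the client side. The following Python fragment is a minimal illustration, not a reference client: the base URL, the helper names, and the assumed shape of the status response (an XML document with an activeclients element) are assumptions made for this example, and the requests are only constructed, not sent.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Assumption: where the Pazpar2 web service is reachable in your deployment.
BASE_URL = "http://localhost:9004/search.pz2"


def command_url(command, session=None, **params):
    """Build the URL for one web-service command (hypothetical helper)."""
    query = {"command": command}
    if session is not None:
        query["session"] = session
    query.update(params)
    return BASE_URL + "?" + urlencode(query)


def polling_plan(session, query, facet="subject"):
    """Return the requests a client would typically issue for one search.

    The first request starts the search; the remaining ones are repeated
    periodically while Pazpar2 keeps working in the background.
    """
    return [
        command_url("search", session, query=query),
        command_url("stat", session),
        command_url("show", session, start=0, num=20, sort="relevance"),
        command_url("termlist", session, name=facet),
    ]


def search_finished(stat_xml):
    """Decide whether polling can stop.

    Assumption: the status response carries an <activeclients> count,
    and zero means no target is still being searched.
    """
    root = ET.fromstring(stat_xml)
    return int(root.findtext("activeclients", default="0")) == 0
```

A client would first obtain a session identifier with command_url("init"), then issue the requests from polling_plan, repeating the status, record, and facet requests until search_finished reports that no targets are still active. Because partial results are available on every poll, the interface can be refreshed after each round rather than waiting for the slowest server.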

Pazpar2 is designed to be highly configurable. Incoming records are normalized to XML/UTF-8, and then further normalized using XSLT to a simple internal representation that is suitable for analysis. By providing XSLT stylesheets for different kinds of result records, you can configure Pazpar2 to work against different kinds of information retrieval servers. Finally, metadata is extracted in a configurable way from this internal record, to support display, merging, ranking, result set facets, and sorting. Pazpar2 is not bound to a specific model of metadata, such as Dublin Core or MARC: by providing the right configuration, it can work with any combination of different kinds of data in support of many different applications.
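To make the normalize, merge, and facet steps concrete, here is a toy Python sketch of the idea. It is not how Pazpar2 is implemented (Pazpar2 does this with XSLT and its own configuration); the field names, the merge key, and all function names are hypothetical and chosen only for illustration.

```python
from collections import Counter


def normalize(raw):
    """Map a source-specific record to one common internal shape.

    Assumption: sources use either plain or Dublin Core style field names.
    """
    return {
        "title": (raw.get("title") or raw.get("dc:title") or "").strip(),
        "author": (raw.get("author") or raw.get("dc:creator") or "").strip(),
        "subjects": [s.strip() for s in raw.get("subjects", [])],
    }


def merge_key(rec):
    """Build a case-insensitive, whitespace-insensitive key for merging."""
    def squash(text):
        return " ".join(text.casefold().split())
    return (squash(rec["title"]), squash(rec["author"]))


def merge(records):
    """Group normalized records that share a merge key into one entry."""
    merged = {}
    for rec in map(normalize, records):
        key = merge_key(rec)
        if key in merged:
            target = merged[key]
            target["count"] += 1
            # Keep each subject once per merged record.
            for subject in rec["subjects"]:
                if subject not in target["subjects"]:
                    target["subjects"].append(subject)
        else:
            merged[key] = {**rec, "subjects": list(rec["subjects"]), "count": 1}
    return list(merged.values())


def facets(merged, field="subjects"):
    """Count how many merged records carry each value of a field."""
    counts = Counter()
    for rec in merged:
        counts.update(rec[field])
    return counts.most_common()
```

The point of the sketch is the separation of concerns: once every incoming record has been brought into one internal shape, merging, faceting, and sorting no longer need to know which kind of server a record came from. Supporting a new kind of source then only requires a new normalization step.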

Pazpar2 is designed to be efficient and scalable. You can set it up to search several hundred targets in parallel, or you can use it to support hundreds of concurrent users. It is implemented with the same attention to performance and economy that we use in our indexing engines, so that you can focus on building your application without worrying about the details of metasearch logic. You can devote all of your attention to usability and let Pazpar2 do what it does best -- metasearch.

Pazpar2 is our attempt to re-think the traditional paradigms for implementing and deploying metasearch logic, with an uncompromising approach to performance, and attempting to make maximum use of the capabilities of modern browsers. The demo user interface that accompanies the distribution is but one example. If you think of new ways of using Pazpar2, we hope you'll share them with us, and if we can provide assistance with regard to training, design, programming, integration with different backends, hosting, or support, please don't hesitate to contact us. The same applies if you'd like to see functionality in Pazpar2 that is not there today. It may already be in our development pipeline, or there might be a possibility for you to help out by sponsoring development time or code. Either way, get in touch and we will give you straight answers.


Pazpar2 is covered by the GNU General Public License (GPL) version 2. See Appendix A, License for further information.