Inbox component

For a long time, the only way to insert records into a Meresco index was to harvest them from an OAI repository. Over time, a need arose to insert records from non-OAI sources as well. The ‘Inbox’ component meets this need.

Several Meresco implementers already had their own database without an OAI repository interface. Moreover, it turned out to be impossible to add OAI interfaces to these systems: some simply did not provide the technical means to construct such an interface. Most systems, however, can export their data to files, either as one large file or as several smaller files. This gives Meresco an opportunity to index these records.

Implementation

The ‘Inbox’ component monitors a directory for file activity. Every file is read and its content is inserted into the application DNA of the server. By adding format-specific components as observers of the Inbox component, virtually any data format can be indexed. For example, the standard Meresco XSLT crosswalk mechanism can convert a custom XML format to, e.g., OAI Dublin Core or MODS.
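The mechanism described above can be illustrated with a minimal Python sketch. It is not Meresco's actual Inbox code: the class names Inbox and CollectingObserver, the poll and add methods, and the file name record1.xml are all hypothetical, and simple polling stands in for whatever file-monitoring technique the real component uses.

```python
import os
import tempfile


class Inbox:
    """Hypothetical inbox: hands the content of each new file to its observers."""

    def __init__(self, directory):
        self.directory = directory
        self.observers = []
        self._seen = set()  # names of files that were already processed

    def add_observer(self, observer):
        self.observers.append(observer)

    def poll(self):
        """Process files not seen before and return their names, sorted."""
        processed = []
        for name in sorted(os.listdir(self.directory)):
            path = os.path.join(self.directory, name)
            if name in self._seen or not os.path.isfile(path):
                continue
            with open(path, encoding="utf-8") as handle:
                content = handle.read()
            # Every observer receives the record; a format-specific observer
            # could convert it (e.g. with XSLT) before indexing.
            for observer in self.observers:
                observer.add(identifier=name, data=content)
            self._seen.add(name)
            processed.append(name)
        return processed


class CollectingObserver:
    """Hypothetical observer that simply remembers what it receives."""

    def __init__(self):
        self.records = {}

    def add(self, identifier, data):
        self.records[identifier] = data


def demo():
    """Drop one file into a temporary inbox directory and poll twice."""
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "record1.xml")
        with open(path, "w", encoding="utf-8") as handle:
            handle.write("<record>first</record>")
        inbox = Inbox(directory)
        observer = CollectingObserver()
        inbox.add_observer(observer)
        first = inbox.poll()
        second = inbox.poll()  # nothing new, so nothing is processed again
        return first, second, observer.records
```

The sketch only recognizes files by name, so a file that is overwritten in place would not be picked up again; a real inbox would need to handle that case.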

Recent use

Recently the Inbox was implemented in the TU Delft Library Discover Project as a means to update records selectively. Use of the new search engine has uncovered several mistakes made in the catalogue over the years, and these are now being corrected. Once a mistake has been corrected, the record is exported into the inbox and thereby automatically reindexed.

Dependable OAI Repositories

With the rising popularity of Open Access, organizations expect their OAI repositories to be highly dependable. The repository must be able to deal with millions of records and respond quickly to frequent requests from Service Providers.

The Meresco community followed these developments by continuously improving Meresco’s OAI components. During this process, compliance with the OAI-PMH specification grew to nearly 100% and new specialized indexes were added to keep query response times well under one second.

History

Back in 2007, the first OAI-PMH repository components were implemented in the LOREnet project. The 16 components were reduced to 8 in the OpenER project for the Open University. These 8 components still exist, but some of them were significantly refactored to keep up with load and volume requirements. At the end of 2008, Berkeley DB replaced Lucene, making the repository respond much faster to requests with from and until parameters. In 2009, the very large number of sets in the LOREnet project required an even more specialized index to maintain query response times.

Present situation

Today, several repositories holding millions of records are in use by, among others, Sound & Vision (Beeld en Geluid) and the University of Tilburg (UvT). These two are examples of stand-alone repository implementations. LOREnet and EduRep are examples of repositories integrated in, respectively, a portal and a search engine.

Indexes and Storage

Initially, creating a repository was straightforward using Meresco’s existing storage and Lucene index components. The new specialized indexes for OAI were also made available as reusable components. This extends the range of available indexes, which are now: Full text (Lucene), Facets, Range and Dictionary (Berkeley DB and BurstTrie).
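To illustrate why a range index suits OAI-PMH requests with from and until parameters, here is a minimal in-memory sketch in Python. It is not Meresco's implementation and does not use Berkeley DB: the RangeIndex class and its methods are hypothetical, and a sorted list with binary search stands in for the on-disk index. It assumes datestamps are ISO 8601 strings of equal granularity, so that they sort correctly as plain text.

```python
import bisect


class RangeIndex:
    """Hypothetical sorted index that answers from/until range queries."""

    def __init__(self):
        self._stamps = []       # datestamps, kept in sorted order
        self._identifiers = []  # identifiers, parallel to self._stamps

    def add(self, identifier, datestamp):
        # Find the insertion point by binary search so both lists stay sorted.
        position = bisect.bisect_right(self._stamps, datestamp)
        self._stamps.insert(position, datestamp)
        self._identifiers.insert(position, identifier)

    def query(self, from_=None, until=None):
        """Return identifiers with from_ <= datestamp <= until (both optional)."""
        lo = 0 if from_ is None else bisect.bisect_left(self._stamps, from_)
        hi = (len(self._stamps) if until is None
              else bisect.bisect_right(self._stamps, until))
        return self._identifiers[lo:hi]


def build_example_index():
    """Index three hypothetical records, added out of date order."""
    index = RangeIndex()
    index.add("record:c", "2009-02-01")
    index.add("record:a", "2007-05-01")
    index.add("record:b", "2008-11-15")
    return index
```

Because the entries are kept sorted, both boundaries of a range are located by binary search rather than by scanning every record, which is the property that keeps such queries fast as a repository grows.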

Repositories, Search Engines and Archives

Using the available index and storage components, a repository is created just as easily as a search engine or a complete archive. After all, these are quite similar things. Any repository needs storage, but also an index for maintaining it. Similarly, every search engine needs an index, but also storage from which to obtain the result records. And an archive is yet another combination of storage and index, but with different intentions.
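The idea that these applications are combinations of storage and index can be sketched in a few lines of Python. The classes below (Storage, TermIndex, SearchEngine) are hypothetical and far simpler than Meresco's components; they only show a search engine that uses the index to find identifiers and the storage to return the records.

```python
class Storage:
    """Hypothetical record store: identifier -> record data."""

    def __init__(self):
        self._records = {}

    def add(self, identifier, data):
        self._records[identifier] = data

    def get(self, identifier):
        return self._records[identifier]


class TermIndex:
    """Hypothetical inverted index: term -> set of identifiers."""

    def __init__(self):
        self._postings = {}

    def add(self, identifier, text):
        for term in text.lower().split():
            self._postings.setdefault(term, set()).add(identifier)

    def search(self, term):
        return sorted(self._postings.get(term.lower(), set()))


class SearchEngine:
    """Combines an index (to find records) with storage (to return them)."""

    def __init__(self):
        self.storage = Storage()
        self.index = TermIndex()

    def add(self, identifier, data):
        self.storage.add(identifier, data)
        self.index.add(identifier, data)

    def search(self, term):
        return [self.storage.get(identifier)
                for identifier in self.index.search(term)]


def build_example_engine():
    """Fill an engine with two hypothetical records."""
    engine = SearchEngine()
    engine.add("record:1", "Meresco inbox component")
    engine.add("record:2", "Meresco OAI repository")
    return engine
```

A repository or an archive could reuse the same Storage class with a different index, for example one ordered by datestamp, which is the sense in which the three are similar.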