Monthly Archives: November 2014

Towards Unique Identifiers for Medieval and Renaissance Manuscripts

At the recent Schoenberg Symposium, I suggested that we need a unique identifying system for medieval and Renaissance manuscripts. We need this for two main reasons: to overcome the difficulties inherent in current identification methods, and to ensure that manuscript information can be incorporated into the world of Linked Data.

Current scholarly practice is to cite manuscripts by their present location, institution and shelf-mark. So the Beowulf manuscript should be cited as London, British Library, Cotton Vitellius A XV and the Codex Sinaiticus as London, British Library, Add. 43725. This approach underlies the manuscript indexes of the journal Scriptorium.

As several people at the Schoenberg Symposium were quick to point out, this approach is full of difficulties:

  • Shelf-marks, even at the same institution, change over time. So, for example, the manuscript now referred to as “BnF Latin 9” was previously “Regius 3570”.
  • The names of institutions change over time. The British Library used to be the British Museum; the Pierpont Morgan Library is now the Morgan Library and Museum.
  • Some institutions do not give their manuscripts unique, citable shelf-marks. Instead, a manuscript may be identified only by a Dewey Decimal classification number or a generic shelf location.
  • Manuscripts move between different institutions, even today. A move of this kind renders previous citations obsolete.
  • The format of these kinds of shelf-marks is vulnerable to mis-spellings and to numerous variations and inconsistencies. Is it BL or British Library? Add. or Additional?
  • Even if the shelf-marks are unique and consistent, they may not have stable URL equivalents. The State Library of Victoria’s manuscripts, for example, have “handle” URLs for their digitized versions, but not for their catalogue records.
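
The spelling problem in particular ("BL" or "British Library"? "Add." or "Additional"?) can be illustrated with a minimal Python sketch. The lookup tables and the function name `normalize_citation` are hypothetical, invented purely for illustration; a real service would need much larger, curated lists, and normalization alone does nothing about shelf-marks that change or manuscripts that move.

```python
# Hypothetical lookup tables, for illustration only.
INSTITUTIONS = {
    "bl": "British Library",
    "british library": "British Library",
}
COLLECTIONS = {
    "add.": "Additional",
    "add": "Additional",
    "additional": "Additional",
}


def normalize_citation(institution: str, shelfmark: str) -> tuple[str, str]:
    """Return an (institution, shelf-mark) pair in one agreed spelling."""
    # Collapse runs of whitespace and ignore case when looking up the name.
    inst_key = " ".join(institution.lower().split())
    inst = INSTITUTIONS.get(inst_key, institution.strip())

    # Expand a known abbreviation in the first word of the shelf-mark only.
    words = shelfmark.split()
    if words:
        words[0] = COLLECTIONS.get(words[0].lower(), words[0])
    return inst, " ".join(words)


variant_a = normalize_citation("BL", "Add. 43725")
variant_b = normalize_citation("British Library", "Additional 43725")
print(variant_a == variant_b)
```

Both variants reduce to the same pair, but only because both spellings were anticipated in the tables. This is one reason why an identifier that does not depend on spelling at all is attractive.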

In the Phillipps project, I am fortunate that the manuscripts have their own system of identifiers, which is not tied to their current institutional location. Sir Thomas Phillipps gave his manuscripts individual numbers, which are widely quoted in library catalogue records and in booksellers’ and dealers’ catalogues. The numbers were usually marked on the manuscripts themselves, and have survived the various changes of ownership since the dispersal of the Phillipps Collection.

For my purposes, the Phillipps numbers appear to be sufficiently unique to serve as identifiers. But even these numbers have their problems:

  • A single manuscript may have more than one Phillipps number. The University of Western Australia’s copy of Virgil’s Aeneid was recorded twice in Phillipps’ catalogue (in error), and therefore has the numbers 988 and 2878.
  • The same Phillipps number may have been assigned to more than one manuscript. This is evident in the hand-written supplementary list of manuscripts 23,838 to 26,365, held in the Grolier Club’s Library, where many titles have been crossed out and the numbers re-used for different manuscripts.
  • The Phillipps number may have been recorded incorrectly in subsequent indexes and catalogues. The British Library’s card index to the provenance of Phillipps manuscripts, for example, ends with manuscript number 74,539, which is a simple transcription error for 24,539.
  • There are numerous Phillipps manuscripts which never received a Phillipps number. His printed catalogue finishes at 23,837; Edward Bond’s handwritten supplementary list finishes at 26,179 in one version and 26,365 in another. Thomas Fitzroy Fenwick continued the numbering up to 38,628, though his list has not survived. Munby estimated up to 60,000 manuscripts in all. Unnumbered Phillipps manuscripts are still advertised for sale through sites like AbeBooks, even today.
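
The first two problems mean that Phillipps numbers and manuscripts stand in a many-to-many relationship. A minimal sketch of how that might be modelled follows. The class `NumberIndex`, the manuscript keys and the re-used number 25000 are all hypothetical placeholders; only the pair 988 and 2878 for the Western Australia Aeneid comes from the discussion above.

```python
from collections import defaultdict


class NumberIndex:
    """Many-to-many index between Phillipps numbers and manuscripts."""

    def __init__(self):
        self._by_number = defaultdict(set)
        self._by_manuscript = defaultdict(set)

    def link(self, number: int, manuscript_key: str) -> None:
        # Record the link in both directions.
        self._by_number[number].add(manuscript_key)
        self._by_manuscript[manuscript_key].add(number)

    def numbers_for(self, manuscript_key: str) -> list:
        return sorted(self._by_manuscript.get(manuscript_key, ()))

    def manuscripts_for(self, number: int) -> list:
        return sorted(self._by_number.get(number, ()))

    def ambiguous_numbers(self) -> list:
        """Numbers that point to more than one manuscript."""
        return sorted(n for n, keys in self._by_number.items() if len(keys) > 1)


index = NumberIndex()

# One manuscript recorded twice in the catalogue (numbers from the text above).
index.link(988, "uwa-aeneid")
index.link(2878, "uwa-aeneid")

# One number re-used for two manuscripts (placeholder number and keys).
index.link(25000, "placeholder-ms-a")
index.link(25000, "placeholder-ms-b")

print(index.numbers_for("uwa-aeneid"))
print(index.ambiguous_numbers())
```

Any lookup by Phillipps number therefore has to be prepared to return zero, one or several manuscripts.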

My proposal is for a unique identifier which conforms to the Uniform Resource Identifier (URI) model used in the world of Linked Data.

Best practice for minting and structuring these URIs is described in the document “Cool URIs for the Semantic Web”, produced by the World Wide Web Consortium (W3C). An example of their implementation is given by Linked Data Finland. Some background information can also be found in Phil Archer’s “Study on Persistent URIs”, prepared for the European Commission in 2012.

  • This kind of identifier does not need to conform to (or incorporate) any current or past shelf-marks.
  • Individual codices would have their own URIs.
  • Multi-volume codices could be given a single URI, with subsidiary URIs for each volume.
  • Fragments which were formerly part of a codex could follow a simple rule: if an item can be (or has been) catalogued individually by the current institution, then it should have its own URI.
  • Individual documents would have their own URI.

Current catalogue records could be used as a starting-point. Each current catalogue record for a manuscript could be regarded as an entity which needs a URI. A basic initial approach might be as follows:

  • Create a URI for each individual manuscript codex currently held and catalogued in a public collection.
  • Create a URI for each document individually catalogued in a public collection.
  • Map current and past shelf-marks to the URI.
  • Map current and past catalogue records to the URI.
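
The four steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the namespace `https://example.org/manuscript/`, the zero-padded running number, and the names `Registry` and `ManuscriptRecord` are all hypothetical, and a real service would need persistent storage and an agreed minting policy. The URI is deliberately opaque, so that it does not incorporate any shelf-mark.

```python
from dataclasses import dataclass, field
from itertools import count

BASE = "https://example.org/manuscript/"  # hypothetical namespace


@dataclass
class ManuscriptRecord:
    uri: str
    # Each entry is (institution, shelf-mark, is_current).
    shelfmarks: list = field(default_factory=list)
    # URLs or other references to current and past catalogue records.
    catalogue_records: list = field(default_factory=list)


class Registry:
    def __init__(self):
        self._counter = count(1)
        self._records = {}
        self._by_shelfmark = {}

    def mint(self) -> ManuscriptRecord:
        """Create a new opaque URI that carries no shelf-mark information."""
        uri = f"{BASE}{next(self._counter):08d}"
        record = ManuscriptRecord(uri)
        self._records[uri] = record
        return record

    def add_shelfmark(self, uri, institution, shelfmark, current=True):
        """Map a current or past shelf-mark to an existing URI."""
        self._records[uri].shelfmarks.append((institution, shelfmark, current))
        self._by_shelfmark[(institution, shelfmark)] = uri

    def add_catalogue_record(self, uri, reference):
        """Map a current or past catalogue record to an existing URI."""
        self._records[uri].catalogue_records.append(reference)

    def resolve(self, institution, shelfmark):
        """Return the URI for a shelf-mark, or None if it is unknown."""
        return self._by_shelfmark.get((institution, shelfmark))


registry = Registry()
record = registry.mint()
registry.add_shelfmark(record.uri, "BnF", "Latin 9", current=True)
registry.add_shelfmark(record.uri, "BnF", "Regius 3570", current=False)

print(registry.resolve("BnF", "Latin 9") == registry.resolve("BnF", "Regius 3570"))
```

A citation using either the current or the former shelf-mark resolves to the same URI, so a later change of shelf-mark would add a mapping rather than invalidate the identifier.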

Subsequent use cases would include the following:

  • Manuscripts which are now dispersed or fragmented could be virtually re-united by creating an additional URI for the original manuscript and creating relationships between this URI and the URIs for each current fragment.
  • Previously separate manuscripts which are now combined into a single volume could be virtually dis-bound by creating additional URIs for each former manuscript and creating relationships between these URIs and the URI for the current codex.
  • Information from different sources about the same manuscript could be linked by matching disparate data to the same URI.
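
The first two use cases amount to recording relationships between URIs, in the subject, predicate, object pattern used by Linked Data. The sketch below uses plain Python tuples rather than an RDF library. The URIs, the two predicate names and the `TripleStore` class are hypothetical and are not drawn from any existing vocabulary.

```python
MS = "https://example.org/manuscript/"        # hypothetical namespace
VOCAB = "https://example.org/vocab/"          # hypothetical vocabulary
IS_FRAGMENT_OF = VOCAB + "isFragmentOf"       # fragment -> original manuscript
IS_NOW_BOUND_IN = VOCAB + "isNowBoundIn"      # former manuscript -> current codex


class TripleStore:
    """A tiny in-memory set of (subject, predicate, object) statements."""

    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))

    def subjects(self, predicate, obj):
        """All subjects that stand in the given relationship to obj."""
        return sorted(s for (s, p, o) in self.triples if p == predicate and o == obj)


store = TripleStore()

# Virtual re-unification: an additional URI for the original manuscript,
# linked from the URI of each surviving fragment.
original = MS + "00000010"
for fragment in (MS + "00000011", MS + "00000012"):
    store.add(fragment, IS_FRAGMENT_OF, original)

# Virtual dis-binding: additional URIs for each former manuscript,
# linked to the URI of the codex in which they are now bound.
codex = MS + "00000020"
for former in (MS + "00000021", MS + "00000022"):
    store.add(former, IS_NOW_BOUND_IN, codex)

print(store.subjects(IS_FRAGMENT_OF, original))
print(store.subjects(IS_NOW_BOUND_IN, codex))
```

Because every item keeps its own URI, neither operation requires any change to the records held by the institutions that own the fragments or the codex.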

I am not proposing a unified central catalogue of manuscripts, in which full descriptions would be normalized to an agreed metadata schema. Instead, an identifier service would provide a crucial structural element which could be used as the basis for future aggregations of data relating to manuscripts. The service would need to incorporate some minimal descriptive information about the manuscript referred to by each URI: a shelf-mark and institution, at the very least, preferably accompanied by a title (conventional or bibliographical).

The technical aspects of this proposal are one issue. Even more crucial, though, are the politics and funding involved in setting up a service to mint, manage and distribute such URIs. In the book world, much of the impetus for ISBNs and ISSNs (and their predecessors) came from the book trade, which could see a clear commercial advantage in unique numbering systems. In the wider world of Linked Data, various URI services for personal names (like VIAF, ISNI and ORCID) have been developed by consortia and co-operatives in the world of libraries and publishing.

A manuscript identifier service, in contrast, has less commercial value. It will take a combination of libraries and researchers – and possibly publishers – to develop, implement and fund such a service. Some of the key benefits and justifications will be:

  • A framework like this is necessary for any global or international integrated system related to manuscripts.
  • It can overcome the fragmentary nature of the many manuscript databases now in existence, and help to link the proliferating collections of digitized manuscripts.
  • There are huge benefits for researchers in being able to find manuscripts – and information about them – much more quickly and reliably, as well as being able to cite manuscripts more effectively and unambiguously in their own research.
  • There are significant benefits for libraries in promoting their manuscripts, building links to scholarship based on their manuscripts, and connecting their manuscripts to other manuscripts held elsewhere.

There are several existing initiatives working towards unique identifiers for manuscripts. [1]

  • The identifiers assigned to Greek manuscripts in the Pinakes database have also been adopted by Diktyon, the “digital network for Greek manuscripts”: http://www.diktyon.org/en/identifiers-manuscripts

This identifier does not necessarily equate to a single manuscript codex (or even one manuscript in multiple volumes). The URL http://pinakes.irht.cnrs.fr/notices/fond/id/977 represents three manuscripts owned by the Library Company of Philadelphia, which also have individual catalogue records and identifiers.

  • The Trismegistos number (TM_id) maps between (1) publication identifiers (especially sigla), (2) collection inventory numbers (i.e. equivalent of shelf-marks) and (3) conventional names like “the Rosetta Stone”.

These numbers are used solely within the context of the Trismegistos database. They are not expressed as Linked Data URIs, though they do have stable URLs.

  • The Europeana digital library aggregates metadata about digitized objects from many European cultural institutions: http://europeana.eu

It includes URIs for each object, created and structured in accordance with the framework of the W3C.

A version of the Europeana Data Model specifically for hand-written manuscripts has been developed by the DM2E (Digitized Manuscripts to Europeana) project (2012-2015).

While Europeana contains records for a significant number of medieval and early modern manuscripts, it is impossible to estimate how many. Its scope is European, not global, and it excludes manuscripts which have not been digitized.

Developing and hosting a manuscript identifier service will require a partnership between interested organizations in Europe and North America. These will need to include library consortia and researchers’ associations. Some possibilities might include CERL, LIBER, the Medieval Academy, the Renaissance Society of America and CARMEN. Specialist publishers like Brepols and Brill could also be involved.

Funding will also have to be raised. Some possible sources might include infrastructure funding programmes like the European Union’s Horizon 2020, and foundations like the Mellon Foundation.

Without such a service, medieval and Renaissance manuscripts are likely to miss out on the benefits to be gained from the world of Linked Data. Databases will remain dispersed and fragmented, digital resources will be difficult to locate, and citations will continue to be inconsistent and confusing. A unique identifier service is the key to linking and joining up all these resources. It will dramatically increase the efficiency, richness and interconnectedness of the manuscript digital ecosystem, to the benefit of researchers and cultural heritage institutions alike.

[1] My thanks to Cillian O’Hogan, Carrie Schroeder and Matthieu Cassin for these suggestions (via Twitter).