Monthly Archives: January 2016

Building the Graph of Medieval Data

Researchers in Classics and Ancient History have achieved a great deal in the Linked Data world, through services like Pelagios, Pleiades, Perseus, Arachne and CLAROS.

Two of their major initiatives have been to create and publish Uniform Resource Identifiers (URIs) for specific entities, and to reuse these URIs across different services. Most of the interlinking, at this stage, centres on the geographical names recorded in the Pleiades gazetteer, via the Pelagios API and graph explorer.

The result has been an increasingly integrated framework for linking across multiple datasets, under the general rubric of the Graph of Ancient World Data. (1)

Graph of Ancient World Data

There is nothing equivalent to this for medieval studies. This makes it very difficult for researchers (like me) who are interested in joining up diverse sources of evidence relating to medieval manuscripts and analyzing the aggregated information.

Here is an initial list of the major elements which will be needed to create a similar kind of Linked Data infrastructure for medieval manuscript research:

  • Identifiers for medieval people, places, and organizations
  • Identifiers for individual manuscripts – mapped to varying ways of citing institutional shelf-marks
  • Identifiers for the texts carried by manuscripts
  • Linkable versions of specialist vocabularies for describing scripts, decoration and illumination, bindings, coats of arms and bookplates

Medieval people are represented, at least to some extent, in existing Linked Data services – especially Wikidata, VIAF, and Library of Congress Names. Peter Abelard, for example, has a Wikidata record with at least twenty other identifiers cross-linked. Extracting these records could form the basis for a “medieval people” service, which could then be augmented from specialist prosopographical sources.

But the other elements are more problematic. I have written previously about the lack of standard identifiers for medieval manuscripts. There are numerous reference books and databases (such as Scriptorium’s index of manuscripts cited) which list and cross-reference institutional shelf-marks. What is missing is a location-independent identifier (URI) service to which these different sources can be mapped. It’s very encouraging to see that the German national programme for manuscript digitization includes a proposal for assigning unique identifiers to individual manuscripts. (2)
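To make the idea of a location-independent identifier service a little more concrete, here is a minimal Python sketch of its core job: mapping the many ways of citing a shelf-mark to a single stable URI. Everything in it is hypothetical and invented for illustration: the base URI `https://example.org/ms/`, the `ManuscriptRegistry` class, the crude normalisation rule, and the sample shelf-marks. It is not a description of any existing service.

```python
import re
from typing import Dict, Optional

# Hypothetical base for location-independent manuscript URIs (not a real service).
BASE_URI = "https://example.org/ms/"


def normalise(citation: str) -> str:
    """Reduce a shelf-mark citation to a comparison key:
    lower-case it, turn punctuation into spaces, collapse whitespace."""
    key = citation.lower()
    key = re.sub(r"[.,:;]", " ", key)
    return " ".join(key.split())


class ManuscriptRegistry:
    """Hypothetical registry mapping many citation forms to one stable URI."""

    def __init__(self) -> None:
        self._by_key: Dict[str, str] = {}
        self._next_id = 1

    def register(self, *citations: str) -> str:
        """Mint a new URI and record every given citation form as an alias."""
        uri = f"{BASE_URI}{self._next_id:06d}"
        self._next_id += 1
        for citation in citations:
            self._by_key[normalise(citation)] = uri
        return uri

    def resolve(self, citation: str) -> Optional[str]:
        """Return the URI for a known citation form, or None if unknown."""
        return self._by_key.get(normalise(citation))


reg = ManuscriptRegistry()
uri = reg.register("Anytown, City Library, MS 1", "Anytown CL MS. 1")
print(uri)                              # the newly minted identifier
print(reg.resolve("Anytown CL MS 1"))   # same URI, despite different punctuation
```

In practice the alias lists would presumably be harvested from the existing reference books and databases rather than entered by hand, and real shelf-mark matching would need far more than lower-casing and stripping punctuation.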

Problems with identifying and naming medieval texts are discussed by Richard Sharpe in his book Titulus: Identifying Medieval Latin Texts: an Evidence-Based Approach (Turnhout: Brepols, 2003). While titles of medieval works do occur in the Library of Congress Names service, for example, there are far more extensive and authoritative lists which could be expressed as Linked Data URIs. The German master plan also makes provision for identifiers for individual texts.

One of the underlying difficulties with developing Linked Data URIs for medieval entities is that many of the relevant source materials are not yet in a digital form which is suitable for reuse in the Linked Data world. Expressing specialist vocabularies and thesauri in SKOS (Simple Knowledge Organization System), the W3C standard for publishing thesauri as Linked Data, would be a worthwhile goal in itself. Other reference works are available only in print or as PDF files, which makes them harder still to reuse.
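As a rough illustration of what expressing a specialist vocabulary in SKOS involves, the Python sketch below serialises a couple of terms as SKOS concepts in the Turtle syntax, using only the standard library. The scheme URI `https://example.org/vocab/scripts/`, the `Term` class and the two script terms are hypothetical examples; the SKOS properties themselves (`skos:Concept`, `skos:prefLabel`, `skos:altLabel`, `skos:broader`, `skos:inScheme`) are the standard ones from the W3C SKOS vocabulary.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical namespace for a vocabulary of script types (not a real service).
SCHEME = "https://example.org/vocab/scripts/"


@dataclass
class Term:
    slug: str                                             # local part of the concept URI
    pref_label: str                                       # preferred name (skos:prefLabel)
    alt_labels: List[str] = field(default_factory=list)   # variant names (skos:altLabel)
    broader: Optional[str] = None                         # slug of the broader term, if any


def turtle_literal(text: str, lang: str = "en") -> str:
    """Quote a string as a Turtle literal with a language tag."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def to_turtle(terms: List[Term]) -> str:
    """Serialise the terms as SKOS concepts belonging to one concept scheme."""
    lines = [
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        "",
        f"<{SCHEME}> a skos:ConceptScheme .",
    ]
    for t in terms:
        lines.append("")
        lines.append(f"<{SCHEME}{t.slug}> a skos:Concept ;")
        lines.append(f"    skos:inScheme <{SCHEME}> ;")
        for alt in t.alt_labels:
            lines.append(f"    skos:altLabel {turtle_literal(alt)} ;")
        if t.broader:
            lines.append(f"    skos:broader <{SCHEME}{t.broader}> ;")
        # The last statement about each concept ends with a full stop.
        lines.append(f"    skos:prefLabel {turtle_literal(t.pref_label)} .")
    return "\n".join(lines)


print(to_turtle([
    Term("gothic", "Gothic script", ["Blackletter"]),
    Term("textualis", "Textualis", ["Textura"], broader="gothic"),
]))
```

Once a vocabulary is published like this, each term has its own URI, so a manuscript description elsewhere could point at the concept rather than repeating a free-text label.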

Even where the source materials are in a more easily reusable digital form, they may not be available for reuse for copyright reasons. This is notably the case with the various databases from Brill, Brepols and ProQuest (Chadwyck-Healey), which include dictionaries, directories, biographical information and texts. These contain large numbers of entries for specific medieval people, places, texts and manuscripts. Incorporating them into a “Graph of Medieval Data” without infringing the publishers’ rights would require detailed technical negotiations.

There is plenty of existing activity aimed at creating shareable digital materials derived from medieval manuscripts. This includes numerous initiatives for the transcription and encoding of texts, especially using the TEI (Text Encoding Initiative). There are also many libraries and projects creating digital images of medieval manuscripts, and there is a growing interest in enabling interoperability by sharing these images through the International Image Interoperability Framework (IIIF).

A “Graph of Medieval Data” would sit as a unifying layer above all these digital resources. It would provide a framework for cross-referencing and interlinking between existing services, and a basis for new annotation and navigation services across disparate digital resources.

This type of infrastructure currently appears to be a long way off. I would really like to see the international manuscript research community coming together to work towards a “Graph of Medieval Data” along these lines.

This approach appears to be our best hope of joining up the vast but disparate body of evidence relating to medieval manuscripts. It would be a huge boon for researchers in this field.


(1) Isaksen, Leif; Simon, Rainer; Barker, Elton T. E. and de Soto Cañamares, Pau (2014). “Pelagios and the emerging graph of ancient world data”, in: WebSci ’14: Proceedings of the 2014 ACM conference on Web science, ACM, pp. 197–201.

(2) Fabian, Claudia and Schreiber, Carolin (2014). “Piloting a national programme for the digitization of medieval manuscripts in Germany”, LIBER Quarterly 24 (1).