
Provenance data: recommendations for cultural heritage institutions

My European Union Marie Curie Fellowship project focused on the provenance histories of medieval and early modern manuscripts: who created them, who owned them, who bought and sold them, where and when these events took place, and where the manuscripts are now. I’ve written elsewhere about why provenance matters. Here I’d like to offer a few recommendations for cultural heritage institutions about making provenance data more usable and relevant to researchers and other interested people.

These suggestions apply specifically to data about unique cultural heritage objects: manuscripts, art works, museum objects, and so on. But they could also be applied to rare books, prints and other valuable or unusual items which were produced in multiple copies and are therefore not unique – though their provenance history is probably unique.[1]

  1. Acknowledge that provenance is important to researchers as well as to institutions

The acquisition history of a valuable or rare item is crucial information for institutions, especially for demonstrating the authenticity of that item and substantiating the institution’s right to ownership. But researchers are also vitally interested in the histories of collections and objects, as part of wider explorations of cultural history and social change – and also as evidence for the transmission of specific kinds of knowledge and understanding.

It follows that provenance data should not just be seen as internal collection management information. The data are of real value for researchers and users of institutional collections.

  2. Ensure that provenance is publicly documented

Provenance data should be made publicly available, as far as possible. Restricting access to information about the price paid and the former owner may sometimes be justifiable, but – especially for publicly funded institutions – accountability demands transparency, as a general rule.

Assembling and publishing provenance data should be seen as a key curatorial task. In practice, of course, smaller institutions are less likely to have the time or expertise to do this kind of work. In that case, collaboration with researchers should be a priority, and they should be encouraged to contribute their findings for inclusion in (or linking to) the institutional record.

  3. Present provenance data in a structured and consistent manner

There is no single agreed best practice for recording and presenting provenance data. Libraries take a different approach from museums, for example, and practices vary – even within the same sector. There are various possible models available, ranging from the inadequate to the complex.

  • The MARC record format: putting all the provenance history in a note field in narrative form is unhelpful, though it might be possible – with the use of text-mining tools – to extract the information into a more structured framework. Even the 561 note field (“Ownership and Custodial History”) in MARC 21 is only loosely structured and remains unindexed in services like WorldCat. Adding personal or corporate access points for former owners is crucial for identifying them in a systematic way but is often not done in library catalogues.
  • FRBR: despite its sophistication, FRBR does not actually offer much scope for structuring provenance data in a more granular way than MARC.
  • CIDOC-CRM: provenance can be modelled and expressed in this very extensive ontology, but it is more relevant as a framework for mapping harvested data to than as a native environment for institutions to create provenance records in.
  • Carnegie Museum of Art provenance standard: based on the AAM Guide to Provenance Research, the CMOA standard offers a good middle-ground for structuring provenance data for art works.

The data model used to record provenance must be sufficiently granular to enable computational processing (i.e., different elements in the data need to be machine-identifiable). It doesn’t necessarily need to be as elaborate as CIDOC-CRM, but it needs to be more structured than MARC. If a customized data model is used, it needs to be documented in sufficient detail for researchers to be able to re-use the data in a more sophisticated or specific setting.
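To illustrate the kind of granularity involved, here is a minimal Python sketch that splits a hypothetical 561-style narrative note into separate events and pulls out years with a simple pattern. The note text is invented for illustration, and real text-mining would need far more robust parsing than this.

```python
import re

# A hypothetical 561 ownership note in narrative form (illustrative only).
note = ("Sold by Sotheby's, London, 12 June 1899, lot 541; "
        "bought by Bernard Quaritch; "
        "acquired by the Free Library of Philadelphia, 1924.")

# Split the narrative into individual provenance events and extract
# whatever structure a simple pattern can find (here, a four-digit year).
events = []
for clause in [c.strip() for c in note.split(';')]:
    year = re.search(r'\b(1[5-9]\d{2}|20\d{2})\b', clause)
    events.append({
        'statement': clause.rstrip('.'),
        'year': int(year.group()) if year else None,
    })

for e in events:
    print(e['year'], '-', e['statement'])
```

Even this crude segmentation yields machine-identifiable elements (an event statement plus a date) where the original note offered only a block of prose.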

A good example of these more sophisticated settings is the Schoenberg Database of Manuscripts, which incorporates provenance data from a variety of sources and in a variety of formats into its own Data Model.[2]

  4. Make provenance data available for export and harvest

Libraries, museums and other cultural heritage institutions should make their database records available for download by researchers – including provenance data. The records should be in a reusable form like CSV or XML. Appropriate licensing conditions should be specified, to enable reuse of the data for research purposes.
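As a sketch of what a reusable export might look like, the following Python fragment serializes two invented provenance records to CSV using only the standard library. The field names are illustrative, loosely following event-based approaches like the CMOA standard, not any institution's actual schema.

```python
import csv
import io

# Hypothetical structured provenance records (field names are illustrative).
records = [
    {'object_id': 'MS 101', 'event': 'sale', 'agent': "Sotheby's",
     'place': 'London', 'date': '1899-06-12'},
    {'object_id': 'MS 101', 'event': 'acquisition',
     'agent': 'Free Library of Philadelphia', 'place': 'Philadelphia',
     'date': '1924'},
]

# Serialize to CSV so researchers can download and reuse the data.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=['object_id', 'event', 'agent',
                                         'place', 'date'])
writer.writeheader()
writer.writerows(records)
print(buf.getvalue())
```

A download in this shape, with a clearly stated licence, is immediately usable in a spreadsheet or a researcher's own database.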

Even though library catalogues are usually available on the Web, MARC records can be surprisingly difficult to harvest. Most libraries do not offer a service for downloading a specified subset of MARC records in a reusable form. The usual offering – at best – is the ability to email a number of selected records to yourself, either in plain text or in a referencing format like EndNote. These formats usually omit the provenance information in a 561 note field.

Museum and gallery databases are probably less likely to be available on the Web than library catalogues. Even when they are, their functionality is unlikely to include downloads of database records. The Powerhouse Museum in Sydney is a notable exception to this, offering a tab-separated spreadsheet download of its entire collections database. The Museum is also one of a small but growing number of institutions which provide access to their collections data via an API (Application Programming Interface).[3]
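A rough Python illustration of how a researcher might consume such an API: the JSON below is an invented response, not the Powerhouse Museum's actual format, and the field names are placeholders. The nested provenance events are flattened into rows ready for ingest elsewhere.

```python
import json

# A hypothetical JSON response from a collections API (invented schema).
response = '''{
  "objects": [
    {"id": "obj-1", "title": "Book of Hours",
     "provenance": [
        {"agent": "Jean, Duc de Berry", "role": "owner", "date": "ca. 1410"},
        {"agent": "Bibliotheque nationale de France", "role": "owner",
         "date": "1796"}
     ]}
  ]
}'''

data = json.loads(response)

# Flatten the nested provenance events into (object, agent, date) rows
# suitable for loading into a researcher's own database or toolchain.
rows = [(obj['id'], ev['agent'], ev['date'])
        for obj in data['objects']
        for ev in obj.get('provenance', [])]
print(rows)
```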

Considerable effort has recently been put into licensing and distributing digital images, especially with the spread of IIIF (the International Image Interoperability Framework). This is a valuable and important development. But descriptive data are also important for researchers, especially in areas like provenance. As Thomas Padilla points out, libraries and other cultural institutions need to re-think their whole approach to providing this kind of data.

Institutions don’t necessarily need to build their own visualizations and analyses of provenance – though the Carnegie Museum of Art has created an interactive public installation. In fact, researchers would generally prefer to harvest data from one or more institutional databases for ingest into their own software environment.

Mitch Fraas has documented his work on extracting provenance data from the text of 561 note fields in MARC records from the University of Pennsylvania Libraries, in order to create a network visualization of the results. CERL’s Material Evidence in Incunabula database combines bibliographical records from the Incunabula Short Title Catalogue with structured provenance records. The 15CBOOKTRADE project has built a visualization and analysis interface on top of these data.
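The network-building step behind this kind of visualization can be sketched in a few lines of Python. The owner data below are invented, and this is not Fraas's actual code: it simply shows how shared ownership histories become a weighted edge list that a network tool can consume.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical manuscript -> former-owner lists, as might be extracted
# from 561 note fields (illustrative data only).
owners = {
    'MS Codex 1': ['Bernard Quaritch', 'A. S. W. Rosenbach'],
    'MS Codex 2': ['Bernard Quaritch', 'Henry E. Huntington'],
}

# Two owners are linked whenever they appear in the same manuscript's
# history; the weight counts how many manuscripts they share.
edges = defaultdict(int)
for names in owners.values():
    for a, b in combinations(sorted(names), 2):
        edges[(a, b)] += 1

for (a, b), weight in sorted(edges.items()):
    print(a, '--', b, 'weight', weight)
```

An edge list in this form can be loaded directly into common network-visualization software.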

  5. Find ways of harvesting relevant data from researchers

As well as making provenance data available for harvesting in suitable formats and with appropriate licensing, institutions and researchers should be actively discussing how to close the feedback loop. How can researchers’ discoveries about provenance be fed back into institutional records?

Larger manuscript libraries, at least, have traditionally tried to maintain a bibliography of research publications relating to the individual manuscripts in their collection. Some have even added these references to their catalogue records.

Now, perhaps, we can start to investigate ways in which researchers can make their data available for harvesting by institutions, for incorporation into institutional records or for linking to in a Linked Data environment. There are questions for researchers too, about formats and methods for making their provenance data available for computational reuse. This will involve more than simply writing up the results in an article or blog.
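One possible shape for such researcher-published data is JSON-LD. The sketch below models a single acquisition event with placeholder URIs and a loose borrowing of CIDOC-CRM class names, purely as an illustration of data that an institution could harvest or link to; it is not a worked-out profile.

```python
import json

# A minimal JSON-LD sketch of one provenance event. The URIs are
# placeholders and the modelling borrows loosely from CIDOC-CRM.
event = {
    "@context": {
        "crm": "http://www.cidoc-crm.org/cidoc-crm/"
    },
    "@id": "https://example.org/provenance/event/1",
    "@type": "crm:E8_Acquisition",
    "object": "https://example.org/manuscripts/ms-101",
    "new_owner": "Free Library of Philadelphia",
    "date": "1924"
}

print(json.dumps(event, indent=2))
```

Publishing findings in a machine-readable form like this, rather than only as prose, is what would let institutions close the feedback loop computationally.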

Notes

[1] Provenance in this sense is different from provenance as defined by the PROV ontology: “Provenance is information about entities, activities, and people involved in producing a piece of data or thing, which can be used to form assessments about its quality, reliability or trustworthiness.”

[2] The Data Model for the Schoenberg Database is currently being re-developed with the help of an NEH grant.

[3] But note the reservations about reliance on APIs raised by Thomas Padilla.
