Provenance data: recommendations for cultural heritage institutions

My European Union Marie Curie Fellowship project focused on the provenance histories of medieval and early modern manuscripts: who created them, who owned them, who bought and sold them, where and when these events took place, and where the manuscripts are now. I’ve written elsewhere about why provenance matters. Here I’d like to offer a few recommendations for cultural heritage institutions about making provenance data more usable and relevant to researchers and other interested people.

These suggestions apply specifically to data about unique cultural heritage objects: manuscripts, art works, museum objects, and so on. But they could also be applied to rare books, prints and other valuable or unusual items which were produced in multiple copies and are therefore not unique – though each copy’s provenance history is probably unique.[1]

  1. Acknowledge that provenance is important to researchers as well as to institutions

The acquisition history of a valuable or rare item is crucial information for institutions, especially for demonstrating the authenticity of that item and substantiating the institution’s right to ownership. But researchers are also vitally interested in the histories of collections and objects, as part of wider explorations of cultural history and social change – and also as evidence for the transmission of specific kinds of knowledge and understanding.

It follows that provenance data should not just be seen as internal collection management information. The data are of real value for researchers and users of institutional collections.

  2. Ensure that provenance is publicly documented

Provenance data should be made publicly available, as far as possible. Restricting access to information about the price paid or the identity of the former owner may sometimes be justifiable, but as a general rule, and especially for publicly funded institutions, accountability demands transparency.

Assembling and publishing provenance data should be seen as a key curatorial task. In practice, of course, smaller institutions are less likely to have the time or expertise to do this kind of work. In that case, collaboration with researchers should be a priority, and they should be encouraged to contribute their findings for inclusion in (or linking to) the institutional record.

  3. Present provenance data in a structured and consistent manner

There is no single agreed best practice for recording and presenting provenance data. Libraries take a different approach from museums, for example, and practices vary – even within the same sector. There are various possible models available, ranging from the inadequate to the complex.

  • The MARC record format: putting all the provenance history into a note field in narrative form is unhelpful, though it might be possible to extract the information into a more structured framework with text-mining tools. Even the 561 note field (“Ownership and Custodial History”) in MARC 21 is only loosely structured, and it remains unindexed in services like WorldCat. Adding personal or corporate access points for former owners is crucial for identifying them systematically, but this is often not done in library catalogues.
  • FRBR: despite its sophistication, FRBR offers little scope for structuring provenance data more granularly than MARC.
  • CIDOC-CRM: provenance can be modelled and expressed in this very extensive ontology, but it is better suited as a framework to which harvested data can be mapped than as a native environment in which institutions create provenance records.
  • Carnegie Museum of Art provenance standard: based on the AAM Guide to Provenance Research, the CMOA standard offers a good middle ground for structuring provenance data for art works.

The data model used to record provenance must be sufficiently granular to enable computational processing (i.e., different elements in the data need to be machine-identifiable). It doesn’t necessarily need to be as elaborate as CIDOC-CRM, but it needs to be more structured than MARC. If a customized data model is used, it needs to be documented in sufficient detail for researchers to be able to re-use the data in a more sophisticated or specific setting.

A good example of these more sophisticated settings is the Schoenberg Database of Manuscripts, which incorporates provenance data from a variety of sources and in a variety of formats into its own Data Model.[2]
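To make “machine-identifiable” concrete, here is a minimal sketch of a granular record, assuming a hypothetical model in which each change of ownership is one event with its own fields. The class name, field names and the ISO 8601 date convention are illustrative assumptions, not the Schoenberg Data Model or any published standard. The sample values use the Phillipps-Beatty manuscript Ph. 1798, sold by Fenwick to Edith Beatty in November 1925 for £7,000 as a gift for her husband; the gift is left undated because no separate date is recorded for it.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ProvenanceEvent:
    """One change of ownership or custody, one field per element (hypothetical model)."""
    object_id: str                  # identifier of the manuscript or object
    sequence: int                   # position in the ownership chain, earliest first
    event_type: str                 # e.g. "sale", "gift", "exchange", "bequest"
    agent_from: Optional[str]       # previous owner or vendor, if known
    agent_to: Optional[str]         # new owner or buyer, if known
    date: Optional[str]             # ISO 8601 date or partial date, e.g. "1925-11"
    price: Optional[str] = None     # kept as text so the currency is preserved
    source: Optional[str] = None    # citation for the evidence


# Illustrative records only.
events = [
    ProvenanceEvent("Ph. 1798", 1, "sale", "Thomas Fitzroy Fenwick",
                    "Edith Beatty", "1925-11", price="GBP 7000"),
    ProvenanceEvent("Ph. 1798", 2, "gift", "Edith Beatty",
                    "Alfred Chester Beatty", None),
]

# Because each element sits in its own field, queries need no text parsing.
owners = [e.agent_to for e in sorted(events, key=lambda e: e.sequence)
          if e.object_id == "Ph. 1798"]
print(owners)                      # ['Edith Beatty', 'Alfred Chester Beatty']
print(asdict(events[0])["price"])  # GBP 7000
```

The same question (“who owned Ph. 1798, in order?”) would need text mining if the history were held only as a narrative note.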

  4. Make provenance data available for export and harvest

Libraries, museums and other cultural heritage institutions should make their database records available for download by researchers – including provenance data. The records should be in a reusable form like CSV or XML. Appropriate licensing conditions should be specified, to enable reuse of the data for research purposes.
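As a rough illustration, an institution could export one row per provenance event with Python’s standard `csv` module. The column layout is a hypothetical example, and the `licence` column holding an SPDX identifier (`CC0-1.0`) is an assumed way of letting reuse conditions travel with each row; it is not a requirement of any standard.

```python
import csv
import io

# Hypothetical export layout: one row per provenance event, plus a licence
# column so that the reuse conditions travel with every row.
FIELDS = ["object_id", "sequence", "event_type", "agent_from", "agent_to", "date", "licence"]


def export_csv(rows, fields=FIELDS):
    """Serialize a list of dicts to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = [
    {"object_id": "Ph. 1798", "sequence": 1, "event_type": "sale",
     "agent_from": "Thomas Fitzroy Fenwick", "agent_to": "Edith Beatty",
     "date": "1925-11", "licence": "CC0-1.0"},
]

text = export_csv(rows)
print(text)

# A researcher can read the file straight back in with the standard library.
parsed = list(csv.DictReader(io.StringIO(text)))
print(parsed[0]["agent_to"])  # Edith Beatty
```

Note that CSV values come back as strings (`"1"`, not `1`), which is one reason to document the data model alongside the export.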

Even though library catalogues are usually available on the Web, MARC records can be surprisingly difficult to harvest. Most libraries do not offer a service for downloading a specified subset of MARC records in a reusable form. The usual offering – at best – is the ability to email a number of selected records to yourself, either in plain text or in a referencing format like EndNote. These formats usually omit the provenance information in a 561 note field.
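Where a library does expose records as MARCXML, pulling out the 561 notes is straightforward. The sketch below uses Python’s `xml.etree.ElementTree` and the MARCXML namespace; the sample record is invented for illustration, and it assumes the record identifier sits in control field 001, which is common but not universal.

```python
import xml.etree.ElementTree as ET

# Namespace used by MARCXML (MARC 21 XML schema).
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Invented sample record for illustration; real catalogue exports will differ.
SAMPLE = """<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <controlfield tag="001">ms0001</controlfield>
    <datafield tag="561" ind1="1" ind2=" ">
      <subfield code="a">Formerly owned by Sir Thomas Phillipps.</subfield>
    </datafield>
  </record>
</collection>"""


def provenance_notes(xml_text):
    """Map each record's 001 identifier to the text of its 561 $a subfields."""
    root = ET.fromstring(xml_text)
    notes = {}
    for record in root.findall("marc:record", MARC_NS):
        id_field = record.find("marc:controlfield[@tag='001']", MARC_NS)
        record_id = id_field.text if id_field is not None else None
        notes[record_id] = [
            sf.text
            for sf in record.findall(
                "marc:datafield[@tag='561']/marc:subfield[@code='a']", MARC_NS)
        ]
    return notes


print(provenance_notes(SAMPLE))
```

This only recovers the free text of the note; turning it into structured events still requires the kind of parsing discussed below.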

Museum and gallery databases are probably less likely to be available on the Web than library catalogues. Even when they are, their functionality is unlikely to include downloads of database records. The Powerhouse Museum in Sydney is a notable exception to this, offering a tab-separated spreadsheet download of its entire collections database. The Museum is also one of a small but growing number of institutions which provide access to their collections data via an API (Application Programming Interface).[3]

Considerable effort has gone into licensing and distributing digital images in recent years, especially with the spread of IIIF (the International Image Interoperability Framework). This is a valuable and important development. But descriptive data are also important for researchers, especially in areas like provenance. As Thomas Padilla points out, libraries and other cultural institutions need to rethink their whole approach to providing this kind of data.

Institutions don’t necessarily need to build their own visualizations and analyses of provenance – though the Carnegie Museum of Art has created an interactive public installation. In fact, researchers would generally prefer to harvest data from one or more institutional databases for ingest into their own software environment.

Mitch Fraas has documented his work on extracting provenance data from the text of 561 note fields in MARC records from the University of Pennsylvania Libraries, in order to create a network visualization of the results. CERL’s Material Evidence in Incunabula database combines bibliographical records from the Incunabula Short Title Catalogue with structured provenance records. The 15CBOOKTRADE project has built a visualization and analysis interface on top of this data.
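Once ownership histories have been extracted, a network view needs little more than a weighted list of links between consecutive owners. The following is a minimal sketch, not a reconstruction of Fraas’s or the 15CBOOKTRADE code: it assumes each manuscript’s known owners are already in chronological order, and the manuscript labels are hypothetical.

```python
from collections import Counter

# Hypothetical input: known owners of each manuscript in chronological order,
# as they might look after free-text notes have been parsed.
chains = {
    "MS A": ["Thomas Phillipps", "Thomas Fitzroy Fenwick", "Alfred Chester Beatty"],
    "MS B": ["Thomas Phillipps", "Thomas Fitzroy Fenwick", "Edith Beatty",
             "Alfred Chester Beatty"],
}


def ownership_edges(chains):
    """Count transfers between consecutive owners across all manuscripts."""
    edges = Counter()
    for owners in chains.values():
        for src, dst in zip(owners, owners[1:]):
            edges[(src, dst)] += 1
    return edges


edges = ownership_edges(chains)
for (src, dst), n in sorted(edges.items()):
    print(f"{src} -> {dst}\t{n}")
```

The resulting edge list, with counts as weights, is the kind of table most network-visualization tools can import.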

  5. Find ways of harvesting relevant data from researchers

As well as making provenance data available for harvesting in suitable formats and with appropriate licensing, institutions and researchers should be actively discussing how to close the feedback loop. How can researchers’ discoveries about provenance be fed back into institutional records?

Larger manuscript libraries, at least, have traditionally tried to maintain a bibliography of research publications relating to the individual manuscripts in their collection. Some have even added these references to their catalogue records.

Now, perhaps, we can start to investigate ways in which researchers can make their data available for harvesting by institutions, for incorporation into institutional records or for linking to in a Linked Data environment. There are questions for researchers too, about formats and methods for making their provenance data available for computational reuse. This will involve more than simply writing up the results in an article or blog.


[1] Provenance in this sense is different from provenance as defined by the PROV ontology: “Provenance is information about entities, activities, and people involved in producing a piece of data or thing, which can be used to form assessments about its quality, reliability or trustworthiness.”

[2] The Data Model for the Schoenberg Database is currently being re-developed with the help of an NEH grant.

[3] But note the reservations about reliance on APIs raised by Thomas Padilla.

Thomas Phillipps and Alfred Chester Beatty

Sir Alfred Chester Beatty (1875-1968) was one of the great manuscript collectors of the 20th century. His collections now form the basis of the Chester Beatty Library in Dublin.

Almost inevitably, he played a part in the story of the dispersal of the huge manuscript collection of Sir Thomas Phillipps.

Beatty’s acquisitions of former Phillipps manuscripts were discussed by Christopher De Hamel in a Book Collector article in 1991. De Hamel identifies 51 such manuscripts; the actual number is likely to have been at least 60.


Most of the manuscripts acquired by Beatty were bought directly from Phillipps’s grandson, Thomas Fitzroy Fenwick. They were acquired in four groups between 1920 and 1925:

  • 27 were acquired from Fenwick in December 1920. This purchase included one manuscript (Ph. 14122) which was bought separately by Mrs Edith Beatty. The amount paid was £12,454 (including £500 for Mrs Beatty’s manuscript).
  • Eight were acquired from Fenwick in February 1923. The amount paid was £3,320.
  • Nine were acquired from Fenwick in August 1924. The amount paid was £5,105.
  • Eight were acquired from Fenwick by Edith Beatty in November 1925, as a gift for her husband. The amount paid was £21,800.

The fourth group is the most interesting. It included three of the top fifteen manuscripts listed in Beatty’s earlier notes on the “order in which Mr F places manuscripts” (in Beatty’s notebook on Phillipps manuscripts, now owned by Sotheby’s London):

2. Statius, Thebaid (Ph. 1798) – bought for £7,000 (earlier described by Beatty as “a beautiful book … of uniformly high grade – The book is not for sale except at a high price £3000 – £5000”)

6. Dictys Cretensis (Ph. 3502) – bought for £7,000 (Beatty’s earlier comment: “1st Visit talked about £5000”)

13. Ferdinand, Italy XV [i.e., Epistolae of Francesco Barbaro, once in the Aragonese Royal Library] (Ph. 6640) – bought for £3,000

Ph. 1798: Statius, Thebaid


In total, these 52 purchases cost the Beattys £42,679. Mrs Beatty paid much higher prices than her husband; his most expensive purchases (all in 1920) were £2,000 each for Ph. 4259 and Ph. 4769, and £1,500 for Ph. 2165. Subsequently, the most he spent on a single item was £880.
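The totals can be checked against the four groups listed above with a short tally. The figures are taken directly from the list; the average price per manuscript is an added illustration, and the 1920 average includes Mrs Beatty’s £500 manuscript.

```python
# Purchases from Fenwick as listed above: (month, number of manuscripts, amount in £)
purchases = [
    ("1920-12", 27, 12_454),
    ("1923-02", 8, 3_320),
    ("1924-08", 9, 5_105),
    ("1925-11", 8, 21_800),   # bought by Edith Beatty as a gift
]

total_mss = sum(n for _, n, _ in purchases)
total_cost = sum(cost for _, _, cost in purchases)
print(total_mss, total_cost)  # 52 42679

# Average price per manuscript in each group, rounded to the nearest pound.
for month, n, cost in purchases:
    print(month, round(cost / n))
```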

There were also a further eight manuscripts which were not acquired directly from Fenwick.

Three of these (Ph. 3734, 21163 and 21642) were bought from Quaritch in late 1912, and one (Ph. 2803) was bought at a Sotheby’s auction in July 1921. They had originally been sold by Fenwick at Sotheby’s auctions in 1896, 1898 and 1903.

For three other manuscripts (Ph. 345, 629 and 3726), the method and date of acquisition remain unknown. One (Ph. 345) was in Beatty’s hands by 1928 at the latest, and another (Ph. 3726) before 1933. 

Beatty’s final purchase was one of Phillipps’s great treasures: the Armenian Gospel Book (Ph. 15364), with which Phillipps was photographed in 1860. It was bought from the Robinson brothers in 1948.

Sir Thomas Phillipps, 1860



Beatty offered 24 of his Phillipps manuscripts for sale at Sotheby’s as part of his great auctions of 1932 and 1933. Ten were offered in the 1932 sale, of which two were bought in. A further fourteen were offered and sold in the 1933 auction.

Three manuscripts were exchanged with, or sold to, the collector A.S. Yahuda in the 1920s and 1930s (Ph. 345, 385 and 437).

Edith Beatty sold at least one of the manuscripts in 1952 to the Morgan Library (Ph. 2165).

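Taking the total as 60, and deducting the 22 sold at Sotheby’s in 1932 and 1933, the three that went to Yahuda and the one sold to the Morgan, 34 must have been in the Chester Beatty Library when it first opened to the public in Dublin in 1953.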

Seventeen of these were then offered for sale in the Sotheby’s auctions after Beatty’s death: eight in the 1968 sale and nine in the 1969 sale. The latter group included one of the manuscripts bought in 37 years earlier (Ph. 10190).

Twelve former Phillipps manuscripts appear on the list of Western manuscripts exhibited at the Chester Beatty Library in November 1967. These were the manuscripts which remained in the Library after Beatty’s death in accordance with his will – as set out in the typewritten list certified by Richard James Hayes dated 22 April 1968 (copy in Bodleian Library, R. Pal. 6. 6a).

Today, thirteen former Phillipps manuscripts are still in the Chester Beatty Library. They consist of the twelve Western manuscripts on the exhibition list and the Armenian Gospels bought in 1948. One of the manuscripts bought in at the 1932 sale (Ph. 132) is still in the Chester Beatty Library.

The time and method of disposal of four other manuscripts remain unknown (Ph. 629, 3734, 14122, and 21642).

Current locations

Thirteen of the manuscripts are in the Chester Beatty Library. Of these, six were bought from Fenwick in 1920, six from Fenwick in 1925 by Edith Beatty, and one from the Robinsons in 1948 (the Armenian Gospels).

The current locations of 34 of the other manuscripts are known: United States 18, Italy 6, United Kingdom 6, Switzerland 2, Germany 1, and Israel 1.

The known institutional owners are: Biblioteca nazionale centrale di Roma 6, British Library 4, Harvard University 4, Morgan Library 4, Walters Art Museum 3, Bodmer Collection 2, New York Public Library 2, Yale University 2, Getty Museum 1, Boston Public Library 1, Lincoln College Oxford 1, National Library of Israel 1, Princeton University 1, Sir Paul Getty Library 1, and Stuttgart Landesbibliothek 1.

One manuscript (Ph. 2506) is known to have been broken up after it was sold in 1969. At least 16 individual leaves from it have passed through the sale rooms in the last 45 years, and three of these have been bought back by the Chester Beatty Library.

The current location of twelve manuscripts remains unknown. They include one (Ph. 2251) which was exported to France after its sale in 1975.
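The figures in this post can be reconciled with one another in a few lines. This sketch simply restates the counts given above, taking 60 as the total number of Phillipps manuscripts that passed through the Beattys’ hands (the true figure may be slightly higher).

```python
# Figures as stated in this post, taking 60 as the total.
TOTAL = 60

# Disposals before the Library opened in 1953.
sold_1932_33 = (10 - 2) + 14      # two of the ten 1932 lots were bought in
to_yahuda, to_morgan = 3, 1
present_1953 = TOTAL - sold_1932_33 - to_yahuda - to_morgan

# What happened to those manuscripts afterwards.
sold_1968_69 = 8 + 9
still_in_cbl = 12 + 1             # twelve Western manuscripts plus the Armenian Gospels
disposal_unknown = present_1953 - sold_1968_69 - still_in_cbl

# Current locations.
by_country = {"United States": 18, "Italy": 6, "United Kingdom": 6,
              "Switzerland": 2, "Germany": 1, "Israel": 1}
broken_up, location_unknown = 1, 12
accounted = still_in_cbl + sum(by_country.values()) + broken_up + location_unknown

print(present_1953, disposal_unknown, accounted)  # 34 4 60
```

The four manuscripts left over in `disposal_unknown` match the four whose time and method of disposal remain unknown.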

Other well-known collectors who have owned these manuscripts at some stage after they were sold by Beatty have included: St John Hornby, Major J.R. Abbey, Philip Hofer, Eric Millar, William Scheide, Martin Schøyen, Peter Ludwig, Sir Paul Getty and A.S. Yahuda.


Here is a nodegoat visualisation of the provenance histories of 21 of the Phillipps-Beatty manuscripts:

nodegoat: provenance histories of 21 Phillipps-Beatty manuscripts




My thanks to Dr Laura Cleaver (Trinity College Dublin) for convening the recent workshop “Migrant Manuscripts: the Western Manuscripts of the Chester Beatty Collection and Twentieth-Century Provenance Studies”, to the staff of the Chester Beatty Library and to Dr Mara Hoffman of Sotheby’s.

Schoenberg Database of Manuscripts

The initial data for my project are coming from the Schoenberg Database of Manuscripts – a marvellous and unique source which should be of value to any researcher studying the history and provenance of medieval European manuscripts. The database contains more than 220,000 entries, derived mainly from sales catalogues. The full database is made available for download in Excel and CSV formats.

The Schoenberg database contains almost 20,000 records relating to Phillipps manuscripts. This is the single largest provenance group, leaving the Bibliothèque nationale de France (15,000) and the British Library (7,500) well behind.

I downloaded the entire Schoenberg dataset and filtered it for all the records relating to Phillipps manuscripts. I then ran it through the OpenRefine software to split out the individual elements in the “Provenance” and “Comments” fields. The next task is to use these to extract all the Phillipps numbers. I will then be able to identify which Phillipps manuscripts are not represented in the Schoenberg database and use this as the basis for linking in information from other sources.
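The splitting itself was done in OpenRefine. As an illustration of the next step, here is one possible way to extract Phillipps numbers and find gaps in Python; it is not the workflow described above. The regular expression assumes the numbers are written as “Phillipps 1798”, “Phillipps MS 1798” or “Ph. 1798”, which is a guess about the data. Only the “Provenance” and “Comments” column names come from the text; the sample rows and the reference list of numbers are hypothetical.

```python
import csv
import io
import re

# Assumed spellings of a Phillipps number in the free text; real entries may
# use other forms, so treat this pattern as a starting point only.
PH_NUMBER = re.compile(r"\b(?:Phillipps(?:\s+MS\.?)?|Ph\.)\s*(\d+)", re.IGNORECASE)


def phillipps_numbers(text):
    """Return the set of Phillipps numbers mentioned in a piece of text."""
    return {int(n) for n in PH_NUMBER.findall(text or "")}


# Hypothetical stand-in for a few rows of the filtered CSV.
SAMPLE_CSV = """id,Provenance,Comments
1,"Sir Thomas Phillipps; Fenwick",Phillipps MS 1798
2,Chester Beatty,formerly Ph. 3502
3,Bodmer,no Phillipps number recorded
"""

found = set()
for row in csv.DictReader(io.StringIO(SAMPLE_CSV)):
    found |= phillipps_numbers(row["Provenance"]) | phillipps_numbers(row["Comments"])

# Hypothetical reference list of Phillipps numbers to check coverage against.
reference = {1798, 3502, 6640}
missing = reference - found

print(sorted(found))    # [1798, 3502]
print(sorted(missing))  # [6640]
```

The set difference at the end is the step that would identify Phillipps manuscripts not represented in the database, ready for linking to other sources.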

Many thanks to Lynn Ransom and the Schoenberg team for making their data available in this way.