Provenance data: recommendations for cultural heritage institutions

My European Union Marie Curie Fellowship project focused on the provenance histories of medieval and early modern manuscripts: who created them, who owned them, who bought and sold them, where and when these events took place, and where the manuscripts are now. I’ve written elsewhere about why provenance matters. Here I’d like to offer a few recommendations for cultural heritage institutions about making provenance data more usable and relevant to researchers and other interested people.

These suggestions apply specifically to data about unique cultural heritage objects: manuscripts, art works, museum objects, and so on. But they could also be applied to rare books, prints and other valuable or unusual items which were produced in multiple copies and are therefore not unique – though their provenance history is probably unique.[1]

  1. Acknowledge that provenance is important to researchers as well as to institutions

The acquisition history of a valuable or rare item is crucial information for institutions, especially for demonstrating the authenticity of that item and substantiating the institution’s right to ownership. But researchers are also vitally interested in the histories of collections and objects, as part of wider explorations of cultural history and social change – and also as evidence for the transmission of specific kinds of knowledge and understanding.

It follows that provenance data should not just be seen as internal collection management information. The data are of real value for researchers and users of institutional collections.

  2. Ensure that provenance is publicly documented

Provenance data should be made publicly available, as far as possible. Restricting access to information about the price paid and the former owner may sometimes be justifiable, but – especially for publicly-funded institutions – accountability demands transparency, as a general rule.

Assembling and publishing provenance data should be seen as a key curatorial task. In practice, of course, smaller institutions are less likely to have the time or expertise to do this kind of work. In that case, collaboration with researchers should be a priority, and they should be encouraged to contribute their findings for inclusion in (or linking to) the institutional record.

  3. Present provenance data in a structured and consistent manner

There is no single agreed best practice for recording and presenting provenance data. Libraries take a different approach from museums, for example, and practices vary – even within the same sector. There are various possible models available, ranging from the inadequate to the complex.

  • The MARC record format: putting all the provenance history in a note field in narrative form is unhelpful, though it might be possible – with the use of text-mining tools – to extract the information into a more structured framework. Even the 561 note field (“Ownership and Custodial History”) in MARC 21 is only loosely structured and remains unindexed in services like WorldCat. Adding personal or corporate access points for former owners is crucial for identifying them in a systematic way but is often not done in library catalogues.
  • FRBR: despite its sophistication, FRBR does not actually offer much scope for structuring provenance data in a more granular way than MARC.
  • CIDOC-CRM: provenance can be modelled and expressed in this very extensive ontology, but it is more useful as a target framework for mapping harvested data than as a native environment in which institutions create provenance records.
  • Carnegie Museum of Art provenance standard: based on the AAM Guide to Provenance Research, the CMOA standard offers a good middle-ground for structuring provenance data for art works.

The data model used to record provenance must be sufficiently granular to enable computational processing (i.e., different elements in the data need to be machine-identifiable). It doesn’t necessarily need to be as elaborate as CIDOC-CRM, but it needs to be more structured than MARC. If a customized data model is used, it needs to be documented in sufficient detail for researchers to be able to re-use the data in a more sophisticated or specific setting.

A good example of these more sophisticated settings is the Schoenberg Database of Manuscripts, which incorporates provenance data from a variety of sources and in a variety of formats into its own Data Model.[2]
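As a rough illustration of what “machine-identifiable” elements might look like, here is a minimal Python sketch of a provenance event record. The field names, the class name `ProvenanceEvent` and the sample data are all my own assumptions for the sake of the example; they are not taken from the CMOA standard, CIDOC-CRM or the Schoenberg Database.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProvenanceEvent:
    """One step in an object's ownership history (illustrative field names)."""
    object_id: str                    # shelf-mark or accession number
    event_type: str                   # e.g. "purchase", "auction", "gift"
    date: Optional[str] = None        # year or ISO date; None if unknown
    from_agent: Optional[str] = None  # seller or previous owner
    to_agent: Optional[str] = None    # buyer or new owner
    place: Optional[str] = None       # where the transfer took place
    source: Optional[str] = None      # evidence for the statement


def owners_in_order(events: List[ProvenanceEvent], object_id: str) -> List[str]:
    """Return the known new owners of one object, earliest transfer first."""
    relevant = [e for e in events if e.object_id == object_id and e.to_agent]
    relevant.sort(key=lambda e: e.date or "")
    return [e.to_agent for e in relevant]


# Invented sample data, not drawn from any real collection.
events = [
    ProvenanceEvent("MS-0001", "auction", "1933", "Collector B", "Collector C"),
    ProvenanceEvent("MS-0001", "purchase", "1829", "Dealer A", "Collector B"),
    ProvenanceEvent("MS-0002", "gift", "1925", "Collector D", "Collector E"),
]

print(owners_in_order(events, "MS-0001"))
```

Because each element (who, when, what kind of transfer) sits in its own field, a question like “who owned this object, and in what order?” becomes a simple filter and sort rather than a text-mining exercise.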

  4. Make provenance data available for export and harvest

Libraries, museums and other cultural heritage institutions should make their database records available for download by researchers – including provenance data. The records should be in a reusable form like CSV or XML. Appropriate licensing conditions should be specified, to enable reuse of the data for research purposes.
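To give a sense of how little is needed for a reusable CSV download, here is a small sketch using Python's standard `csv` module. The column names and the sample row are assumptions made up for this example, not a proposed standard.

```python
import csv
import io

# Hypothetical column names for a provenance export.
FIELDS = ["object_id", "event_type", "date", "from_agent", "to_agent"]


def events_to_csv(rows):
    """Serialise a list of provenance-event dictionaries as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # missing keys are written as empty cells
    return buffer.getvalue()


# Invented sample data.
rows = [
    {"object_id": "MS-0001", "event_type": "purchase", "date": "1829",
     "from_agent": "Dealer A", "to_agent": "Collector B"},
    {"object_id": "MS-0001", "event_type": "auction", "date": "1933",
     "from_agent": "Collector B"},
]

csv_text = events_to_csv(rows)
print(csv_text)
```

A download like this still needs a licence statement and some documentation of what each column means, but it can be loaded directly into a spreadsheet or a researcher's own database.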

Even though library catalogues are usually available on the Web, MARC records can be surprisingly difficult to harvest. Most libraries do not offer a service for downloading a specified subset of MARC records in a reusable form. The usual offering – at best – is the ability to email a number of selected records to yourself, either in plain text or in a referencing format like EndNote. These formats usually omit the provenance information in a 561 note field.

Museum and gallery databases are probably less likely to be available on the Web than library catalogues. Even when they are, their functionality is unlikely to include downloads of database records. The Powerhouse Museum in Sydney is a notable exception to this, offering a tab-separated spreadsheet download of its entire collections database. The Museum is also one of a small but growing number of institutions which provide access to their collections data via an API (Application Programming Interface).[3]

Considerable recent effort has been put into licensing and distributing digital images, especially with the recent spread of IIIF. This is a valuable and important development. But descriptive data are also important for researchers, especially in areas like provenance. As Thomas Padilla points out, libraries and other cultural institutions need to re-think their whole approach to providing this kind of data.

Institutions don’t necessarily need to build their own visualizations and analyses of provenance – though the Carnegie Museum of Art has created an interactive public installation. In fact, researchers would generally prefer to harvest data from one or more institutional databases for ingest into their own software environment.

Mitch Fraas has documented his work on extracting provenance data from the text of 561 note fields in MARC records from the University of Pennsylvania Libraries, in order to create a network visualization of the results. CERL’s Material Evidence in Incunabula database combines bibliographical records from the Incunabula Short Title Catalogue with structured provenance records. The 15CBOOKTRADE project has built a visualization and analysis interface on top of this data.
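A very crude first pass at this kind of extraction can be sketched in a few lines of Python. This is not the method Mitch Fraas used. It assumes, purely for illustration, that a narrative note separates ownership steps with semicolons, which many real 561 notes will not do, and the sample note is invented.

```python
import re

# Four-digit years from 1000 to 2099.
YEAR = re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b")


def split_note(note):
    """Split a narrative provenance note into segments, with any years found."""
    segments = [s.strip() for s in note.split(";") if s.strip()]
    return [
        {"text": s, "years": [int(y) for y in YEAR.findall(s)]}
        for s in segments
    ]


# Invented example of a narrative note.
note = "Owned by Collector A; sold at auction, 1896; bought by Collector B, 1912."
steps = split_note(note)

for step in steps:
    print(step["years"], step["text"])
```

Identifying the names of owners reliably is much harder than finding dates, which is one reason why structured access points for former owners are so valuable.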

  5. Find ways of harvesting relevant data from researchers

As well as making provenance data available for harvesting in suitable formats and with appropriate licensing, institutions and researchers should be actively discussing how to close the feedback loop. How can researchers’ discoveries about provenance be fed back into institutional records?

Larger manuscript libraries, at least, have traditionally tried to maintain a bibliography of research publications relating to the individual manuscripts in their collection. Some have even added these references to their catalogue records.

Now, perhaps, we can start to investigate ways in which researchers can make their data available for harvesting by institutions, for incorporation into institutional records or for linking to in a Linked Data environment. There are questions for researchers too, about formats and methods for making their provenance data available for computational reuse. This will involve more than simply writing up the results in an article or blog.


[1] Provenance in this sense is different from provenance as defined by the PROV ontology: “Provenance is information about entities, activities, and people involved in producing a piece of data or thing, which can be used to form assessments about its quality, reliability or trustworthiness.”

[2] The Data Model for the Schoenberg Database is currently being re-developed with the help of an NEH grant.

[3] But note the reservations about reliance on APIs raised by Thomas Padilla.

Building the Graph of Medieval Data

Researchers in Classics and Ancient History have achieved a great deal in the Linked Data world, through services like Pelagios, Pleiades, Perseus, Arachne and CLAROS.

Two of their major initiatives have been to create and publish Uniform Resource Identifiers (URIs) for specific entities, and to reuse these URIs across different services. Most of the interlinking, at this stage, centres on the geographical names recorded in the Pelagios gazetteer, via the Pleiades API and graph explorer.

The result has been an increasingly integrated framework for linking across multiple datasets, under the general rubric of the Graph of Ancient World Data. (1)

Graph of Ancient World Data

There is nothing equivalent to this for medieval studies. This makes it very difficult for researchers (like me) who are interested in joining up diverse sources of evidence relating to medieval manuscripts and analyzing the aggregated information.

Here is an initial list of the major elements which will be needed to create a similar kind of Linked Data infrastructure for medieval manuscript research:

  • Identifiers for medieval people, places, and organizations
  • Identifiers for individual manuscripts – mapped to varying ways of citing institutional shelf-marks
  • Identifiers for the texts carried by manuscripts
  • Linkable versions of specialist vocabularies for describing scripts, decoration and illumination, bindings, coats of arms and bookplates

Medieval people are represented, at least to some extent, in existing Linked Data services – especially Wikidata, VIAF, and Library of Congress Names. Peter Abelard, for example, has a Wikidata record with at least twenty other identifiers cross-linked. Extracting these records could form the basis for a “medieval people” service, which could then be augmented from specialist prosopographical sources.

But the other elements are more problematic. I have written previously about the lack of standard identifiers for medieval manuscripts. There are numerous reference books and databases (such as Scriptorium’s index of manuscripts cited) which list and cross-reference institutional shelf-marks. But they need a location-independent identifier (URI) service, to which the different data can be mapped. It’s very encouraging to see that the German national programme for manuscript digitization includes a proposal for assigning unique identifiers to individual manuscripts. (2)
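The idea of mapping different citation forms to a single location-independent identifier can be sketched roughly as follows. The base URI, the identifier `ms-000001`, the library name and the shelf-mark variants are all hypothetical; no existing service is implied.

```python
import re

# Hypothetical base URI for a manuscript identifier service.
BASE_URI = "https://example.org/manuscript/"

# Hypothetical identifier mapped to invented variant citations.
VARIANTS = {
    "ms-000001": [
        "Example Library, MS Add. 1234",
        "Example Lib., Add. MS 1234",
        "ExLib Additional 1234",
    ],
}


def normalise(citation):
    """Reduce a citation to lower-case letters and digits for comparison."""
    return re.sub(r"[^a-z0-9]+", "", citation.lower())


# Lookup table from normalised citation to URI.
INDEX = {
    normalise(variant): BASE_URI + identifier
    for identifier, variants in VARIANTS.items()
    for variant in variants
}


def resolve(citation):
    """Return the URI for a known citation form, or None if unrecognised."""
    return INDEX.get(normalise(citation))


print(resolve("example library ms. add 1234"))
print(resolve("Unknown Library MS 1"))
```

Normalising punctuation and case only catches trivial differences. Genuinely different citation forms still have to be listed explicitly, which is exactly the work already done in printed concordances and indexes of shelf-marks.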

Problems with identifying and naming medieval texts are discussed by Richard Sharpe in his book Titulus: Identifying Medieval Latin Texts: an Evidence-Based Approach (Turnhout: Brepols, 2003). While titles of medieval works do occur in the Library of Congress Names service, for example, there are far more extensive and authoritative lists which could be expressed as Linked Data URIs. The German master plan also makes provision for identifiers for individual texts.

One of the underlying difficulties with developing Linked Data URIs for medieval entities is that many of the relevant source materials are not yet in a digital form which is suitable for reuse in the Linked Data world. Expressing specialist vocabularies and thesauri in the SKOS format, for instance, would be a worthwhile goal. Other reference works are available only in print or as PDF files.
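As a small illustration of what expressing a vocabulary in SKOS involves, the following Python sketch writes two concepts as Turtle. The `skos:` terms are from the W3C SKOS vocabulary, but the scheme URI, the slugs and the choice of script terms are assumptions made up for this example, and labels containing quotation marks would need escaping.

```python
SKOS_PREFIX = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> ."


def skos_concept(scheme_uri, slug, label, broader=None):
    """Return one SKOS concept as a Turtle statement (no escaping of labels)."""
    lines = [
        f"<{scheme_uri}/{slug}> a skos:Concept ;",
        f'    skos:prefLabel "{label}"@en ;',
    ]
    if broader:
        lines.append(f"    skos:broader <{scheme_uri}/{broader}> ;")
    lines.append(f"    skos:inScheme <{scheme_uri}> .")
    return "\n".join(lines)


# Hypothetical vocabulary of script types.
SCHEME = "https://example.org/vocab/scripts"

turtle = "\n\n".join([
    SKOS_PREFIX,
    skos_concept(SCHEME, "gothic", "Gothic script"),
    skos_concept(SCHEME, "textualis", "Textualis", broader="gothic"),
])

print(turtle)
```

Once terms have URIs of this kind, a manuscript description can point to them directly rather than repeating a free-text label.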

Even where the source materials are in a more easily reusable digital form, they may not be available for copyright reasons. This is notably the case with the various databases from Brill, Brepols and ProQuest (Chadwyck-Healey) – dictionaries, directories, biographical information, texts and so on. These contain large numbers of entries for specific medieval people, places, texts, manuscripts and so on. Their incorporation into a “Graph of Medieval Data”, without infringing the publishers’ rights, would require detailed technical negotiations.

There is plenty of existing activity aimed at creating shareable digital materials derived from medieval manuscripts. This includes numerous initiatives for the transcription and encoding of texts, especially using the TEI (Text Encoding Initiative). There are also many libraries and projects creating digital images of medieval manuscripts, and there is a growing interest in enabling interoperability by sharing these images through the International Image Interoperability Framework (IIIF).

A “Graph of Medieval Data” would sit as a unifying layer above all these digital resources. It would provide a framework for cross-referencing and interlinking between existing services, and a basis for new annotation and navigation services across disparate digital resources.

This type of infrastructure currently appears to be a long way off. I would really like to see the international manuscript research community coming together to work towards a “Graph of Medieval Data” along these lines.

This approach appears to be our best hope of joining up the vast but disparate body of evidence relating to medieval manuscripts. It would be a huge boon for researchers in this field.


(1) Isaksen, Leif; Simon, Rainer; Barker, Elton T. E. and de Soto Cañamares, Pau (2014). “Pelagios and the emerging graph of ancient world data”, in: WebSci ’14: Proceedings of the 2014 ACM conference on Web science, ACM, pp. 197–201.

(2) Fabian, Claudia; Schreiber, Carolin (2014). “Piloting a national programme for the digitization of medieval manuscripts in Germany”, Liber Quarterly 24 (1)

Thomas Phillipps and Alfred Chester Beatty

Sir Alfred Chester Beatty (1875-1968) was one of the great manuscript collectors of the 20th century. His collections now form the basis of the Chester Beatty Library in Dublin.

Almost inevitably, he played a part in the story of the dispersal of the huge manuscript collection of Sir Thomas Phillipps.

Beatty’s acquisitions of former Phillipps manuscripts were discussed by Christopher De Hamel in a Book Collector article in 1991. De Hamel identifies 51 such manuscripts; the actual number is likely to have been at least 60.


Most of the manuscripts acquired by Beatty were bought directly from Phillipps’s grandson, Thomas Fitzroy Fenwick. They were acquired in four groups between 1920 and 1925:

  • 27 were acquired from Fenwick in December 1920. This purchase included one manuscript (Ph. 14122), which was bought by Mrs Edith Beatty separately. The amount paid was £12,454 (including £500 for Mrs Beatty’s manuscript)
  • Eight were acquired from Fenwick in February 1923. The amount paid was £3,320.
  • Nine were acquired from Fenwick in August 1924. The amount paid was £5,105.
  • Eight were acquired from Fenwick by Edith Beatty in November 1925, as a gift for her husband. The amount paid was £21,800.

The fourth group is the most interesting. It included three of the top fifteen manuscripts listed in Beatty’s earlier notes on the “order in which Mr F places manuscripts” (in Beatty’s notebook on Phillipps manuscripts, now owned by Sotheby’s London):

2. Statius, Thebaid (Ph. 1798) – bought for £7,000 (earlier described by Beatty as “a beautiful book … of uniformly high grade – The book is not for sale except at a high price £3000 – £5000”)

6. Dictys Cretensis (Ph. 3502) – bought for £7,000 (Beatty’s earlier comment: “1st Visit talked about £5000”)

13. Ferdinand, Italy XV [i.e., Epistolae of Francesco Barbaro, once in the Aragonese Royal Library] (Ph. 6640) – bought for £3,000

Ph. 1798: Statius, Thebaid

In total, these 52 purchases cost the Beattys £42,679. Mrs Beatty paid much higher prices than her husband; his most expensive purchases (all in 1920) were £2,000 each for Ph. 4259 and Ph. 4769, and £1,500 for Ph. 2165. Subsequently, the most he spent on a single item was £880.
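The totals can be checked from the four groups listed above. This short Python sketch simply adds up the figures given in this post; the per-manuscript averages are a rough guide only, since prices within each group varied widely and the 1920 group includes Mrs Beatty’s £500 manuscript.

```python
# (date, number of manuscripts, amount paid in pounds), as listed above.
purchases = [
    ("December 1920", 27, 12454),
    ("February 1923", 8, 3320),
    ("August 1924", 9, 5105),
    ("November 1925", 8, 21800),
]

total_manuscripts = sum(count for _, count, _ in purchases)
total_paid = sum(amount for _, _, amount in purchases)

print(total_manuscripts)  # 52
print(total_paid)         # 42679

# Average price per manuscript in each group, in whole pounds.
for date, count, amount in purchases:
    print(date, round(amount / count))
```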

There were also a further eight manuscripts which were not acquired directly from Fenwick.

Three of these (Ph. 3734, 21163 and 21642) were bought from Quaritch in late 1912, and one (Ph. 2803) was bought at a Sotheby’s auction in July 1921. They had originally been sold by Fenwick at Sotheby’s auctions in 1896, 1898 and 1903.

For three other manuscripts (Ph. 345, 629 and 3726), the method and date of acquisition remain unknown. One (Ph. 345) was in Beatty’s hands by 1928 at the latest, and another (Ph. 3726) before 1933. 

Beatty’s final purchase was one of Phillipps’ great treasures: the Armenian Gospel Book (Ph. 15364) with which he was photographed in 1860. It was bought from the Robinson brothers in 1948.

Sir Thomas Phillipps, 1860


Beatty offered 24 of his Phillipps manuscripts for sale at Sotheby’s as part of his great auctions of 1932 and 1933. Ten were offered in the 1932 sale; two of these were bought-in. A further fourteen were offered and sold in the 1933 auction.

Three manuscripts were exchanged with, or sold to, the collector A.S. Yahuda in the 1920s and 1930s (Ph. 345, 385 and 437).

Edith Beatty sold at least one of the manuscripts in 1952 to the Morgan Library (Ph. 2165).

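Therefore, 34 of the 60 must have been in the Chester Beatty Library when it first opened to the public in Dublin in 1953: 60 less the 22 sold in 1932 and 1933, the three that went to Yahuda, and the one sold to the Morgan Library.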

Seventeen of these were then offered for sale in the Sotheby’s auctions after Beatty’s death. Eight were in the 1968 sale, and nine in the 1969 sale. The latter group included one of the manuscripts bought-in 37 years earlier (Ph. 10190).

Twelve former Phillipps manuscripts appear on the list of Western manuscripts exhibited at the Chester Beatty Library in November 1967. These were the manuscripts which remained in the Library after Beatty’s death in accordance with his will – as set out in the typewritten list certified by Richard James Hayes dated 22 April 1968 (copy in Bodleian Library, R. Pal. 6. 6a).

Today, thirteen former Phillipps manuscripts are still in the Chester Beatty Library. They consist of the twelve Western manuscripts on the exhibition list, and the Armenian Gospels bought in 1948. One of the manuscripts bought-in at the 1932 sale is still in the Chester Beatty Library (Ph. 132).

The time and method of disposal of four other manuscripts remain unknown (Ph. 629, 3734, 14122, and 21642).

Current locations

Thirteen of the manuscripts are in the Chester Beatty Library. Of these, six were bought from Fenwick in 1920, six from Fenwick in 1925 by Edith Beatty, and one from the Robinsons in 1948 (the Armenian Gospels).

The current locations of 34 of the other manuscripts are known: United States 18, Italy 6, United Kingdom 6, Switzerland 2, Germany 1, and Israel 1.

The known institutional owners are: Biblioteca nazionale centrale di Roma 6, British Library 4, Harvard University 4, Morgan Library 4, Walters Art Museum 3, Bodmer Collection 2, New York Public Library 2, Yale University 2, Getty Museum 1, Boston Public Library 1, Lincoln College Oxford 1, National Library of Israel 1, Princeton University 1, Sir Paul Getty Library 1, and Stuttgart Landesbibliothek 1.

One manuscript (Ph. 2506) is known to have been broken up after it was sold in 1969. At least 16 individual leaves from it have passed through the sale rooms in the last 45 years, and three of these have been bought back by the Chester Beatty Library.

The current location of twelve manuscripts remains unknown. They include one (Ph. 2251) which was exported to France after its sale in 1975.
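These figures can be reconciled against the overall total of 60 manuscripts (52 bought from Fenwick plus eight others). The sketch below uses only the numbers given in this post, and assumes that the broken-up manuscript (Ph. 2506) is counted separately from both the 34 known locations and the twelve unknown ones.

```python
# Current status of the 60 Phillipps-Beatty manuscripts, as given above.
status = {
    "Chester Beatty Library": 13,
    "other known locations": 34,
    "broken up (Ph. 2506)": 1,
    "location unknown": 12,
}

by_country = {
    "United States": 18, "Italy": 6, "United Kingdom": 6,
    "Switzerland": 2, "Germany": 1, "Israel": 1,
}

by_institution = {
    "Biblioteca nazionale centrale di Roma": 6, "British Library": 4,
    "Harvard University": 4, "Morgan Library": 4, "Walters Art Museum": 3,
    "Bodmer Collection": 2, "New York Public Library": 2,
    "Yale University": 2, "Getty Museum": 1, "Boston Public Library": 1,
    "Lincoln College Oxford": 1, "National Library of Israel": 1,
    "Princeton University": 1, "Sir Paul Getty Library": 1,
    "Stuttgart Landesbibliothek": 1,
}

print(sum(status.values()))          # 60
print(sum(by_country.values()))      # 34
print(sum(by_institution.values()))  # 34
```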

Other well-known collectors who have owned these manuscripts at some stage after they were sold by Beatty have included: St John Hornby, Major J.R. Abbey, Philip Hofer, Eric Millar, William Scheide, Martin Schøyen, Peter Ludwig, Sir Paul Getty and A.S. Yahuda.


Here is a nodegoat visualisation of the provenance histories of 21 of the Phillipps-Beatty manuscripts:

nodegoat: provenance histories of 21 Phillipps-Beatty manuscripts



My thanks to Dr Laura Cleaver (Trinity College Dublin) for convening the recent workshop “Migrant Manuscripts: the Western Manuscripts of the Chester Beatty Collection and Twentieth-Century Provenance Studies”, to the staff of the Chester Beatty Library and to Dr Mara Hoffman of Sotheby’s.

The Phillipps Babylonian Cylinder: MS 3902 (Tales of the Phillipps Manuscripts #4)

Among the 23,837 entries in Sir Thomas Phillipps’s printed catalogue of his manuscripts are two unusual items:

3902   A Babylonian Cylinder, with arrow-head inscriptions

3903   Fragment of a Babylonian Inscription

They appear in a section of the catalogue entitled “Captain Mignan, Oriental MSS. &c.”, which also includes 35 Arabic and Persian manuscripts (Phillipps MSS 3904-3938) (1). This group of items was bought by Phillipps in 1829 from Captain Robert Mignan of the East India Company, for a total of £300 (2).

Mignan’s hand-written list of these items still survives among Phillipps’ papers (3). The cylinder is described as follows:

A cylinder composed of the finest furnace baked clay, with the cuneiform writing engraved upon its surface; executed with great delicacy, and beauty. This valuable piece of antiquity is the largest of two, only known to exist in this, or any country. It was discovered in a winding souterrain beneath the ruin at Babylon called by the natives of the country “Kasr” – or The “Palace”, which occupies the supposed site of the great western palace, and hanging gardens. (Length nine inches – circumference sixteen inches)

This account is very similar to a story told by Mignan in his book Travels in Chaldaea, published in 1829. Here he recounts how he discovered a cylinder in late 1827, in the Babylonian ruins known as El Hamir, eight miles from the town of Hillah, “in one of the innumerable unexplored passages, at the eastern side of that remarkable ruin the Kasr, or great castellated palace” (4). Mignan describes this cylinder as being nine inches long and fifteen inches in circumference, and his published drawing shows that it contained three columns of cuneiform writing.


Mignan, Robert, Travels in Chaldaea (1829)

The cylinder remained in the Phillipps collection until 1945, when it was acquired by Lionel and Philip Robinson as part of the unsold residue. They advertised it for sale in 1948, describing it as “one of the most famous relics of Babylonian antiquity” (5). The asking price was £2,500. The accompanying photograph shows that the cylinder has three columns of cuneiform text. Its dimensions are given as 8¾ inches (22.2 cm) in length and 5⅞ inches (14.9 cm) in maximum diameter. The sale catalogue does not quote the Phillipps number and makes no reference to Mignan.

The catalogue entry contains a lengthy account of the history and contents of the cylinder, based on a memorandum by “a well-known Assyriologist”. The text of the cylinder begins by recounting Nebuchadnezzar’s architectural achievements: the building of the East Wall of the City of Babylon, the restoration of the temple tower, and the rebuilding of various other temples: Nebo’s temple, Ezida, at Borsippa, Ebarra at Sippar, Eanna at Erech, and Egishshirgal at Ur. The cylinder’s specific purpose was to commemorate Nebuchadnezzar’s reconstruction and enlargement of the palace of his father Nabopolassar.

The catalogue entry acknowledges that little is known about the history of the cylinder before Phillipps acquired it. The author speculates that it might have been dug out of the ruins of Babylon by Arabs excavating there for the Abbé Beauchamps in 1784. He notes that the cylinder was copied in 1818 by Carl Bellino, who was working for Claudius James Rich, the East India Company’s resident in Baghdad and a keen collector of cuneiform antiquities. When Bellino made his copy, the cylinder was in the possession of the Catholic-Armenian Vicar-General of Ispahan, who lived in Baghdad. This information is derived from later accounts published by G.F. Grotefend, the German scholar who received a number of facsimile transcriptions from Bellino at that time, and gradually published them over the next four decades. Bellino himself died in 1820.

This account is at odds with that given by Captain Mignan. The Phillipps Cylinder could not have been copied by Bellino in 1818 and subsequently discovered in the ruins of Babylon by Mignan in 1827. What are the possible explanations for this discrepancy?

It is possible that two different cylinders have become confused. If this is the case, the first cylinder would have been transcribed by Bellino in Baghdad in 1818, and the transcription sent to Grotefend. A second cylinder could then have been found by Mignan in 1827 and never seen by Bellino. Which of these then became the Phillipps Cylinder? On the basis of Mignan’s descriptions, it was almost certainly the second one. But, in that case, the Bellino transcription could not have been made from the Phillipps Cylinder.

During the course of the nineteenth century, however, the Phillipps Cylinder did become identified as the source of the Bellino transcription. This transcription served as the basis for facsimiles published by Grotefend in 1850 (6) and by Henry Rawlinson in 1861 (7). Grotefend did not connect his transcription with the Phillipps Cylinder. Rawlinson, on the other hand, described his facsimile as being “From a Clay Cylinder found at Babylon and now in the possession of Sir Thomas Phillipps Bart of Middle Hill”. He seems to have been the first to connect the Bellino transcription and the Phillipps Cylinder in this way. But Rawlinson refused to see the cylinder for himself, according to Phillipps (8):

I offered to shew it to Sir Henry Rawlinson, but he did not chuse to come to me, therefore it is at the service of some other Babylonian Interpreter, who will not eat up his own words so often.

Phillipps also claimed that Rawlinson had described it as the “Bellino Cylinder” (9). This may simply have meant that Bellino had already transcribed this cylinder (which might explain Rawlinson’s lack of interest in seeing it). The object which is now known as the Bellino Cylinder is completely different; it has only a single column of 64 lines, dates from the reign of Sennacherib, and has been in the British Museum since 1825 (10).

A year after Rawlinson’s facsimile appeared, Henry Fox Talbot published a translation of the text (11). Talbot and Phillipps had exchanged several letters about the cylinder in the preceding five years, and Phillipps had sent at least one photograph of it to Talbot, in December 1856 (12):

I have had several trials to obtain a good impression of the Babylonian Inscriptions, but I have not succeeded yet. I forward to you the best yet made, with the hope that you may be able to decipher some of it.

It appears to me that none of the Lenses have been large enough. I have two Collodions on Glass for you, but they will require to be carefully packed in a small box, so as not to rub, & I intend to get it made.

The fragment enclosed is part of the other half; the man broke the glass before he cd take the Talbotype, so I made him take the largest fragment of it.

Phillipps may be referring to photographs of the cylinder taken by Mrs Amelia Guppy in 1853, which are preserved in a photograph album now in the Houghton Library at Harvard University (13). The album contains a selection of prints and negatives showing the cylinder (described by Phillipps as a “Babylonian Urn”), either on its own or in combination with what appears to be Mignan’s “Fragment of a Babylonian Inscription”. But none of these photographs would have been suitable for Talbot to use in making his transcription.

Nowhere does Talbot say that he has actually seen the Phillipps Cylinder. Nor does he mention using Phillipps’ photograph for his translation. His only references are to Grotefend’s 1850 engraving and Rawlinson’s 1861 lithograph, both of which he owned and used:

This inscription is from a clay cylinder, found at Babylon and now in the possession of Sir T. Phillipps Bart., of Middle Hill. The cuneiform text was admirably copied in facsimile by Bellino, many years ago; and the engraving of this on a copper plate, by the care of Grotefend, is equally excellent. More recently, it has been lithographed in larger and plainer characters in Pl. 65 of the New Volume of Inscriptions, published by the Trustees of the British Museum, under the skillful direction of Sir H. Rawlinson.

Talbot’s translation was followed by various other translations in the later nineteenth and early twentieth centuries. The text was eventually incorporated into the standard compilations of Neo-Babylonian royal inscriptions: Langdon (1912), where it is identified as Nr. 9; Berger (1973), where it is identified as Nbk Zyl III,4; and Da Riva (2008), where it is given the identifier C34 (14). There is no evidence that Langdon ever saw the cylinder; he appears to have worked from the published facsimiles. Both Berger and Da Riva describe the cylinder as formerly in the Phillipps collection at Middle Hill, with its present whereabouts unknown.


The Grotefend 1850 engraving, based on Bellino’s transcription

Did any of the people associated with the Phillipps cylinder ever cross-check it against the transcriptions? Grotefend was never in a position to do so; Rawlinson does not appear to have done so; and there is no evidence that Talbot did. The Robinsons’ sale catalogue does not say whether their expert Assyriologist had checked the Phillipps Cylinder against the Grotefend or Rawlinson facsimiles. The photograph provided in the sale catalogue is too small to be used for this kind of checking.

The only evidence of cross-checking is provided by Phillipps himself. According to him, Rawlinson “sent me the Inscription which Grotefend had had printed from it, & it tallies correctly with it” (15). This appears to mean that Phillipps had checked the Grotefend facsimile against his cylinder and found that they matched. If this is correct, it supports the view that the Bellino/Grotefend facsimile was indeed made from the Phillipps Cylinder. The alternative explanation is that there were two cylinders with identical texts – one of which disappeared some time after Bellino’s transcription was made, allowing his facsimile to become attributed to a different cylinder with the same text. But this seems improbable, especially since there is no other known copy of this text today.

If Bellino’s transcription was indeed made from the Phillipps Cylinder, then Mignan’s story of his discovery is probably untrue. The alternative – that the cylinder described in Mignan’s book is not the one he sold to Phillipps two years later – is very unlikely. The descriptions in Mignan’s book and in his handwritten list are very similar, strongly suggesting that they refer to the same cylinder. Perhaps Mignan simply took the cylinder from the Catholic-Armenian Vicar-General of Ispahan, with or without his permission, and brought it back to England to sell to Phillipps.

Their subsequent dealings were fraught with difficulty. Mignan seems to have returned to Babylon in 1830, at Phillipps’ request, with the aim of finding more objects of a similar type. But Phillipps was dissatisfied with the results, rejecting a set of cuneiform bricks as too damaged and worn, and a set of small cylinders and cornelians as not what he wanted, though valuable. He refused to pay, and a box of artefacts remained in storage at the British Museum, in the care of Sir Frederic Madden. In 1838, Mignan brought a writ against Phillipps for failing to pay both the costs of his trip and the price of the artefacts (16). It is hard to judge whether Phillipps was simply being difficult and contrary, or whether Mignan was a rather unscrupulous opportunist, trying to unload inferior or unwanted pieces on to a wealthy collector.

If the origins of the Phillipps Cylinder remain something of a mystery, its present whereabouts are not well documented either. A systematic search through the CDLI (Cuneiform Digital Library Initiative) database reveals no known Neo-Babylonian royal cylinders which match the Phillipps Cylinder for number of columns, dimensions, text and provenance (17). A three-column cylinder now in the Israel Museum in Jerusalem (CDLI no. P429992), as the result of a donation in the early 1970s, is similar in size to the Phillipps Cylinder but has a different text (18). A three-column cylinder now in the Walters Art Museum in Baltimore (WAM 48.1800) is also similar in size to the Phillipps Cylinder, but has fewer lines (138 as opposed to 171) and was bought by the Museum from Mrs Henry Walters in 1941. Another three-column cylinder of Nebuchadnezzar, privately owned, is poorly documented, but the CDLI record (Anonymous 480739) does include a photograph. The shape of this cylinder is clearly different from the shape of the Phillipps Cylinder. None of the other surviving cylinders listed in CDLI resembles the Phillipps Cylinder.

According to Munby, it was bought from the Robinsons by Dr Martin Bodmer, the well-known Swiss collector, through the bookseller Heinrich Eisemann (19). Munby does not give his source for this statement. Only one Babylonian cylinder from the Bodmer Collection is listed in CDLI (FMB 000; CDLI no. P427638). It has two columns and therefore cannot be the Phillipps Cylinder (20).

Direct contact with staff at the Fondation Martin Bodmer has confirmed that the Phillipps Cylinder is indeed still part of the Bodmer collection and is displayed as one of the first items in the permanent exhibition of the Fondation’s public museum in Cologny, Switzerland. There are no images or documentation relating to the cylinder on the Web site of the Fondation. Modern catalogues of Neo-Babylonian inscriptions appear to be unaware of its present location.

The Phillipps Cylinder “ranks high among the great records of Babylonian antiquity” (21). It would be fitting for a scholarly description and images from its present custodians to be made available on the Web.

It is still possible to buy cylinders of this kind. A cuneiform cylinder of Nebuchadnezzar II was sold at auction as recently as April 2014, fetching $605,000. This measured 8¼ inches in length, with two columns of text (22). Another cylinder was advertised for sale in 2015 on AbeBooks for $1.75 million, by an antiquarian bookstore in Portsmouth, New Hampshire (23). This was described as “8¼ inches high” and containing the “Royal Proclamation of his re-building-to-perfection efforts of the Temple E-barra/E-ulla at Sippar (in ancient country of Babylonia)”.

  1. Phillipps, Sir Thomas, Catalogus Librorum Manuscriptorum in Bibliotheca Phillippica, facsim. ed. (S.l.: Orskey-Johnson, 2001), p. 54
  2. Munby, A.N.L., Phillipps Studies 3 (Cambridge: University Press, 1954), p. 56
  3. Oxford, Bodleian Library, Phillipps-Robinson Manuscripts, d.291, f. 12
  4. Mignan, Robert, Travels in Chaldaea (London: Colburn and Bentley, 1829), pp. 228-9.
  5. William H. Robinson Ltd, Catalogue 77: A Selection of Extremely Rare and Important Printed Books and Ancient Manuscripts (London, 1948), lot 127, pp. 132-4
  6. Grotefend, G.F., “Die Erbauer der Paläste in Khorsabad und Kujjundshik: Zweiter Nachtrag zu den Bemerkungen über ein ninivitisches Thongefäss”, Abhandlungen der Historisch-Philologischen Classe der Königlichen Gesellschaft der Wissenschaften in Göttingen 4 (1850) 201-206
  7. Rawlinson, H.C., The Cuneiform Inscriptions of Western Asia, Vol. I: A Selection from the Historical Inscriptions of Chaldea, Assyria, & Babylonia (London, 1861), plates 65 and 66.
  8. London, British Library – Fox Talbot Collection, Acc 20590 (Phillipps to Talbot, 31 July 1856)
  9. Oxford, Bodleian Library, MS. Phillipps-Robinson e. 389 f. 12-14 (Phillipps to Talbot, 24 October 1856)
  11. Talbot, H.F. “Translation of an inscription of Nebuchadnezzar”, Transactions of the Royal Society of Literature, series 2 vol. 7 (1862) 341-375
  12. Oxford, Bodleian Library, MS. Phillipps-Robinson e. 389 f. 24v-25 (Phillipps to Talbot, 8 December 1856)
  13. Harvard University, Houghton Library, Mrs. Guppy’s photographs of charters, seals, & antiquities at Middle Hill, Phillipps MS 20976 (Houghton, Horblit TypPh Album 30)
  14. Langdon, Stephen, Die neubabylonischen Königsinschriften (Leipzig: J. C. Hinrichs, 1912), Nr. 9, pp. 19-20, 88-95 (transcription and German translation); Berger, Paul-Richard, Die neubabylonischen Königsinschriften: Königsinschriften des ausgehenden babylonischen Reiches, 626-539 a. Chr. (Kevelaer: Verlag Butzon & Bercker, 1973), pp. 287-8 (Nbk Zyl. III, 4); Da Riva, Rocío, The Neo-Babylonian Royal Inscriptions: an Introduction (Münster: Ugarit-Verlag, 2008), p. 121 (Text C34)
  15. Oxford, Bodleian Library, MS. Phillipps-Robinson e. 389 f. 12-14 (Phillipps to Talbot, 24 October 1856)
  16. Oxford, Bodleian Library, MS. Phillipps-Robinson d.285, f. 95-107
  17. Cuneiform Digital Library Initiative:
  18. Artzi, Pinhas, “A Barrel Cylinder of Nebuchadnezzar II, King of Babylon”, Israel Museum News 10 (1975), 49-51
  19. Munby, A.N.L., Phillipps Studies 5 (Cambridge: University Press, 1960), p. 108
  20. Da Riva, Rocío, “Nebuchadnezzar II’s Prism (EŞ 7834): a new edition,” Zeitschrift für Assyriologie und Vorderasiatische Archäologie 103 (2013), 196-229 (at 221)
  21. William H. Robinson Ltd, Catalogue 77: A Selection of Extremely Rare and Important Printed Books and Ancient Manuscripts (London, 1948), lot 127, p. 132

Towards Unique Identifiers for Medieval and Renaissance Manuscripts

At the recent Schoenberg Symposium, I suggested that we need a unique identifying system for medieval and Renaissance manuscripts. We need this for two main reasons: to overcome the difficulties inherent in current identification methods, and to ensure that manuscript information can be incorporated into the world of Linked Data.

Current scholarly practice is to cite manuscripts by their present location, institution and shelf-mark. So the Beowulf manuscript should be cited as London, British Library, Cotton Vitellius A XV and the Codex Sinaiticus as London, British Library, Add. 43725. This approach underlies the manuscript indexes of the journal Scriptorium.

As several people at the Schoenberg Symposium were quick to point out, this approach is full of difficulties:

  • Shelf-marks, even at the same institution, change over time. So, for example, the manuscript now referred to as “BnF Latin 9” was previously “Regius 3570”.
  • The names of institutions change over time. The British Library used to be the British Museum; the Pierpont Morgan Library is now the Morgan Library and Museum.
  • Some institutions do not give their manuscripts unique, citable shelf-marks. Alternatives might include a Dewey Decimal classification number, or a generic shelf location.
  • Manuscripts move between different institutions, even today. A move of this kind renders previous citations obsolete.
  • The format of these kinds of shelf-marks is vulnerable to mis-spellings and to numerous variations and inconsistencies. Is it BL or British Library? Add. or Additional?
  • Even if the shelf-marks are unique and consistent, they may not have stable URL equivalents. The State Library of Victoria’s manuscripts, for example, have “handle” URLs for their digitized versions, but not for their catalogue records.
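The spelling variations, at least, can be reduced by normalizing citations before comparing them. The Python sketch below is only an illustration of that idea: the abbreviation table and the function name normalize_citation are hypothetical, and a real table would need to be far larger and curated by hand.

```python
import re

# Hypothetical abbreviation table. These few entries are illustrative
# assumptions, not an agreed standard.
ABBREVIATIONS = {
    "bl": "british library",
    "add.": "additional",
    "add": "additional",
}


def normalize_citation(citation: str) -> str:
    """Return a crude comparison key for a manuscript citation."""
    # Lower-case the citation and split it on commas and whitespace.
    tokens = re.split(r"[,\s]+", citation.strip().lower())
    # Expand any known abbreviation; leave every other token unchanged.
    expanded = [ABBREVIATIONS.get(token, token) for token in tokens if token]
    return " ".join(expanded)


short_form = normalize_citation("London, BL, Add. 43725")
long_form = normalize_citation("London, British Library, Additional 43725")
print(short_form == long_form)
```

Normalization of this kind only addresses inconsistent spelling. It does nothing for shelf-marks that have changed, institutions that have been renamed, or manuscripts that have moved.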

In the Phillipps project, I am fortunate that the manuscripts have their own system of identifiers, which is not tied to their current institutional location. Sir Thomas Phillipps gave his manuscripts individual numbers, which are widely quoted in library catalogue records and in booksellers’ and dealers’ catalogues. The numbers were usually marked on the manuscripts themselves, and have survived the various changes of ownership since the dispersal of the Phillipps Collection.

For my purposes, the Phillipps numbers appear to be sufficiently unique to serve as identifiers. But even these numbers have their problems:

  • A single manuscript may have more than one Phillipps number. The University of Western Australia’s copy of Virgil’s Aeneid was recorded twice in Phillipps’ catalogue (in error), and therefore has the numbers 988 and 2878.
  • The same Phillipps number may have been assigned to more than one manuscript. This is evident in the hand-written supplementary list of manuscripts 23,838 to 26,365, held in the Grolier Club’s Library, where many titles have been crossed out and the numbers re-used for different manuscripts.
  • The Phillipps number may have been recorded incorrectly in subsequent indexes and catalogues. The British Library’s card index to the provenance of Phillipps manuscripts, for example, ends with manuscript number 74,539, which is a simple transcription error for 24,539.
  • There are numerous Phillipps manuscripts which never received a Phillipps number. His printed catalogue finishes at 23,837; Edward Bond’s handwritten supplementary list finishes at 26,179 in one version and 26,365 in another. Thomas Fitzroy Fenwick continued the numbering up to 38,628, though his list has not survived. Munby estimated up to 60,000 manuscripts in all. Unnumbered Phillipps manuscripts are still advertised for sale through sites like AbeBooks, even today.

My proposal is for a unique identifier which conforms to the Uniform Resource Identifier (URI) model used in the world of Linked Data.

Best practice for minting and structuring these URIs is described in the document “Cool URIs for the Semantic Web”, produced by the World Wide Web Consortium (W3C). An example of their implementation is given by Linked Data Finland. Some background information can also be found in Phil Archer’s “Study on Persistent URIs”, prepared for the European Commission in 2012.

  • This kind of identifier does not need to conform to (or incorporate) any current or past shelf-marks.
  • Individual codices would have their own URIs.
  • Multi-volume codices could be given a single URI, with subsidiary URIs for each volume.
  • Fragments which were formerly part of a codex could be treated like this: if an item can be (or has been) catalogued individually by the current institution, then it should have its own URI.
  • Individual documents would have their own URI.

Current catalogue records could be used as a starting-point. Each current catalogue record for a manuscript could be regarded as an entity which needs a URI. A basic initial approach might be as follows:

  • Create a URI for each individual manuscript codex currently held and catalogued in a public collection.
  • Create a URI for each document individually catalogued in a public collection.
  • Map current and past shelf-marks to the URI.
  • Map current and past catalogue records to the URI.
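The initial approach above could be sketched in Python roughly as follows. This is a minimal illustration and not a design for the service itself: the base address https://example.org/manuscript/ is a placeholder, the names ManuscriptRecord and ManuscriptRegistry are hypothetical, and the use of random UUIDs is simply one way to make an identifier that carries no shelf-mark.

```python
import uuid
from dataclasses import dataclass, field

# Placeholder namespace; no such service exists.
BASE_URI = "https://example.org/manuscript/"


@dataclass
class ManuscriptRecord:
    uri: str
    # Current and past (institution, shelf_mark) pairs.
    shelf_marks: list = field(default_factory=list)
    # Links to current and past catalogue records.
    catalogue_records: list = field(default_factory=list)


class ManuscriptRegistry:
    """Hypothetical in-memory registry of manuscript URIs."""

    def __init__(self):
        self._records = {}        # uri -> ManuscriptRecord
        self._by_shelf_mark = {}  # (institution, shelf_mark) -> uri

    def mint(self, institution, shelf_mark):
        # The URI is opaque: it does not encode the shelf-mark.
        uri = BASE_URI + uuid.uuid4().hex
        self._records[uri] = ManuscriptRecord(uri)
        self.add_shelf_mark(uri, institution, shelf_mark)
        return uri

    def add_shelf_mark(self, uri, institution, shelf_mark):
        key = (institution, shelf_mark)
        self._records[uri].shelf_marks.append(key)
        self._by_shelf_mark[key] = uri

    def add_catalogue_record(self, uri, record_link):
        self._records[uri].catalogue_records.append(record_link)

    def resolve(self, institution, shelf_mark):
        # Returns None when the shelf-mark is unknown.
        return self._by_shelf_mark.get((institution, shelf_mark))


registry = ManuscriptRegistry()
uri = registry.mint("BnF", "Latin 9")
registry.add_shelf_mark(uri, "BnF", "Regius 3570")  # earlier shelf-mark
print(registry.resolve("BnF", "Regius 3570") == uri)
```

Because the current and the former shelf-mark resolve to the same URI, a citation using either form leads to the same manuscript.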

Subsequent use cases would include the following:

  • Manuscripts which are now dispersed or fragmented could be virtually re-united by creating an additional URI for the original manuscript and creating relationships between this URI and the URIs for each current fragment.
  • Previously separate manuscripts which are now combined into a single volume could be virtually dis-bound by creating additional URIs for each former manuscript and creating relationships between these URIs and the URI for the current codex.
  • Information from different sources about the same manuscript could be linked by matching disparate data to the same URI.
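The first of these use cases, virtual reunification, amounts to recording relationships between URIs. The Python sketch below shows one possible way to do this with simple subject-predicate-object triples. The manuscript URIs are invented placeholders, and the choice of the Dublin Core term isPartOf as the relationship is an assumption made for illustration rather than a recommendation.

```python
# Dublin Core "isPartOf" term, used here as an assumed relationship type.
IS_PART_OF = "http://purl.org/dc/terms/isPartOf"

# Hypothetical URIs: one for the original manuscript, now dispersed,
# and one for each fragment as currently catalogued.
ORIGINAL = "https://example.org/manuscript/original-1"
FRAGMENT_A = "https://example.org/manuscript/fragment-a"
FRAGMENT_B = "https://example.org/manuscript/fragment-b"

# Each relationship is a (subject, predicate, object) triple.
triples = {
    (FRAGMENT_A, IS_PART_OF, ORIGINAL),
    (FRAGMENT_B, IS_PART_OF, ORIGINAL),
}


def parts_of(whole, triples):
    """List the URIs recorded as being part of the given URI."""
    return sorted(
        subject
        for (subject, predicate, obj) in triples
        if predicate == IS_PART_OF and obj == whole
    )


for fragment in parts_of(ORIGINAL, triples):
    print(fragment)
```

The same structure could plausibly serve the second use case as well, with each former manuscript recorded as part of the current bound codex.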

I am not proposing a unified central catalogue of manuscripts, in which full descriptions would be normalized to an agreed metadata schema. Instead, an identifier service would provide a crucial structural element which could be used as the basis for future aggregations of data relating to manuscripts. The service would need to incorporate some minimal descriptive information about the manuscript referred to by each URI: a shelf-mark and institution, at the very least, preferably accompanied by a title (conventional or bibliographical).

The technical aspects of this proposal are one issue. Even more crucial, though, are the politics and funding involved in setting up a service to mint, manage and distribute such URIs. In the book world, much of the impetus for ISBNs and ISSNs (and their predecessors) came from the book trade, which could see a clear commercial advantage in unique numbering systems. In the wider world of Linked Data, various URI services for personal names (like VIAF, ISNI and ORCID) have been developed by consortia and co-operatives in the world of libraries and publishing.

A manuscript identifier service, in contrast, has less commercial value. It will take a combination of libraries and researchers – and possibly publishers – to develop, implement and fund such a service. Some of the key benefits and justifications will be:

  • A framework like this is necessary for any global or international integrated system related to manuscripts.
  • It can overcome the fragmentary nature of the many manuscript databases now in existence, and help to link the proliferating collections of digitized manuscripts.
  • There are huge benefits for researchers in being able to find manuscripts – and information about them – much more quickly and reliably, as well as being able to cite manuscripts more effectively and unambiguously in their own research.
  • There are significant benefits for libraries in promoting their manuscripts, building links to scholarship based on their manuscripts, and connecting their manuscripts to other manuscripts held elsewhere.

There are several existing initiatives working towards unique identifiers for manuscripts. [1]

These identifiers have also been adopted by Diktyon, the “digital network for Greek manuscripts”:

This identifier does not necessarily equate to a single manuscript codex (or even one manuscript in multiple volumes). The URL represents three manuscripts owned by the Library Company of Philadelphia, which also have individual catalogue records and identifiers.

The Trismegistos number (TM_id) maps between (1) publication identifiers (especially sigla), (2) collection inventory numbers (i.e. equivalent of shelf-marks) and (3) conventional names like “the Rosetta Stone”.

These numbers are used solely within the context of the Trismegistos database. They are not expressed as Linked Data URIs, though they do have stable URLs.

  • The Europeana digital library aggregates metadata about digitized objects from many European cultural institutions:

It includes URIs for each object, created and structured in accordance with the framework of the W3C.

A version of the Europeana Data Model specifically for hand-written manuscripts has been developed by the DM2E (Digitized Manuscripts to Europeana) project (2012-2015).

While Europeana contains records for a significant number of medieval and early modern manuscripts, it is impossible to estimate how many. Its scope is European, not global, and it excludes manuscripts which have not been digitized.

Developing and hosting a manuscript identifier service will require a partnership between interested organizations in Europe and North America. These will need to include library consortia and researchers’ associations. Some possibilities might include CERL, LIBER, the Medieval Academy, the Renaissance Society of America and CARMEN. Specialist publishers like Brepols and Brill could also be involved.

Funding will also have to be raised. Some possible sources might include infrastructure funding programmes like the European Union’s Horizon 2020, and foundations like the Mellon Foundation.

Without such a service, medieval and Renaissance manuscripts are likely to miss out on the benefits to be gained from the world of Linked Data. Databases will remain dispersed and fragmented, digital resources will be difficult to locate, and citations will continue to be inconsistent and confusing. A unique identifier service is the key to linking and joining up all these resources. It will dramatically increase the efficiency, richness and interconnectedness of the manuscript digital ecosystem, to the benefit of researchers and cultural heritage institutions alike.

[1] My thanks to Cillian O’Hogan, Carrie Schroeder and Matthieu Cassin for these suggestions (via Twitter).

Saving the Spanish Armada from the grocers: Phillipps MS 25342 (Tales of the Phillipps Manuscripts #3)

What drove Sir Thomas Phillipps in his pursuit of the biggest private collection of manuscripts ever assembled? Like many fanatical collectors, he seems to have found it hard to explain his obsession. But one important motive was undoubtedly to save ancient manuscripts from destruction.

In an unpublished draft (c.1828) for the preface to his catalogue of his collection, he wrote:

I was instigated by reading various accounts of the destruction of valuable MSS… My chief desire for preserving Vellum MSS. arose from witnessing the unceasing destruction of them by Goldbeaters; My search for charters or deeds by their destruction in the shops of Glue-makers and Taylors. [1]

A fascinating example of this is a set of Spanish naval documents now in the National Maritime Museum at Greenwich. They form part of four large collections of naval papers sold to the Museum in 1946 by the Robinson brothers, who had recently purchased the “residue” of the Phillipps manuscripts from the Trustees. The Maritime Museum paid a total of £22,000 for this remarkable set of documents, which include papers relating to Samuel Pepys, Sir Robert Cotton, Admiral Benbow and Lord Nelson, amongst others. [2]

The Spanish documents are described as “a large vellum-bound volume of Spanish diplomatic papers, mainly dating between 1603 and 1672, but with a section dealing with the Armada, 1587 to 1588”. They were probably once owned by the Irish antiquarian and collector Lord Kingsborough (1795-1837). His great work, The Antiquities of Mexico, contained facsimiles of various Mesoamerican codices, and was intended to demonstrate that the indigenous peoples of Mexico were descended from one of the Lost Tribes of Israel. Kingsborough’s manuscripts were offered for sale in Dublin on 1 November 1842, by the bookseller Charles Sharpe. While Phillipps himself did not buy at this sale, he subsequently acquired a number of the Kingsborough manuscripts from other sources, including booksellers like Thomas Rodd.

Phillipps may also have acquired some manuscripts from the London bookseller Obadiah Rich (1783-1850). Rich was the American Consul at Port Mahon in Menorca, and supplied various Spanish manuscripts to Kingsborough. In a letter to Phillipps, dated 20 November 1843, Rich gives a vivid description of his experiences in acquiring old documents in Madrid:

“More MSS. are destroyed by ignorant people, than by civil wars. – I once found a bookseller at Madrid occupied in taking off the parchment covers from a large pile of old folios and throwing the inside into his cellar to sell by weight to the grocers: I opened one, and immediately bought the whole (120 volumes) at about 2s. per vol: you will hardly believe that among them was one of the most precious volumes in your collection relating to England of the time of Philip the second!”[3]

He is almost certainly referring to the volume of Spanish documents now in the National Maritime Museum (formerly Phillipps 25342). This volume actually includes the instructions given by Philip II of Spain to the Duke of Medina Sidonia, the commander of the Spanish Armada – in the King’s own handwriting.

It’s sobering to think that these documents nearly ended their days as wrapping for someone’s groceries in 19th-century Madrid! Instead, through the persistence of collectors like Kingsborough and Phillipps (and their agents like Rich), these unique papers have survived to bear witness to the events of their time.

[1] Phillipps, Sir Thomas, The Phillipps Manuscripts: Catalogus Librorum Manuscriptorum in Bibliotheca D. Thomae Phillipps, Bt., facsim. ed. (S.l.: Orskey-Johnson, 2001), quoted in the Introduction by A.N.L. Munby, p. [2]


[3] A.N.L. Munby, Phillipps Studies (Cambridge, 1951-60), vol. IV, pp. 13-14

Metadata, data, content: there’s no real difference in the humanities

“Metadata” is the subject of an important public debate in Australia at the moment. Much of the argument is around the relationship between metadata and content, or metadata and data. Several bloggers have observed that the distinction commonly made between metadata and content is a misleading one.

In the library world, the term “metadata” became popular as a way of making stodgy older terms like “catalogue entries” or “catalogue records” sound more relevant and up-to-date. It sounds so much more sophisticated to be teaching “metadata creation” in library school, rather than “cataloguing”.

The justification is that these catalogue entries are data about data, where the underlying content is the books or other library resources being described in the record. Catalogues of digital objects in services like Europeana are conceptualized in the same way. The descriptive information is regarded as metadata, containing pointers to the object described.

In the world of scientific and medical research, the same model is now ubiquitous. Instruments, equipment and observations produce “data” which take the form of digital files of various kinds. These files are given structured descriptions, which are generally called “metadata”.

A typical example is the marine and climate data collected by the Integrated Marine Observing System (IMOS) and made available through the Australian Ocean Data Network (AODN). AODN is built around a standardized “metadata catalogue” which gives access to the data.

This model may reflect the current consensus of a scientific community about ways of describing and interpreting the phenomena being studied. But it does not work well for the humanities. Critical analysis of the words we use when describing something – and of the characteristics we select as being significant – is at the heart of humanities discourse. “Data” and “metadata” cannot be easily separated when description and interpretation become contested areas. 

In the HuNI (Humanities Networked Infrastructure) project, we have tried to collapse this distinction. The primary content in HuNI consists of entities (People, Places, Organisations, Events, Works and Concepts). They are derived from thirty contributing datasets, and can be combined, annotated and linked according to a researcher’s interests. The entity records in HuNI are data, metadata and content – all at the same time.

In the Phillipps project, my focus is on provenance events relating to the manuscripts which passed through the Phillipps collection. These events are described in the sales catalogues of book dealers like Sotheby’s and in the printed catalogues of institutional or personal collections, especially Phillipps’ own printed catalogue. They also appear as records in a range of general library databases, and in specialist bibliographical databases like the Schoenberg Database of Manuscripts.

From a conventional point of view, these catalogue entries and records can be described as “metadata”. After all, they describe the manuscripts and their contents. But for me the descriptions are the content. There is no distinction between “metadata” and “data” here.

These catalogue entries are evidence of specific events and activities over hundreds of years. But they are much more than that. They also provide evidence of the different understandings of the nature of these manuscripts over time. The catalogue entries themselves are historically conditioned interpretations, reflecting the perceptions of the people who compiled them.

To structure and organize these entries, I’ve created a data model – or should that be a metadata schema? The distinction between the two seems increasingly arcane and irrelevant. The entries are being reformatted as information on spreadsheets and in graph databases, where they can be analysed and visualized. It’s all content, and it’s all data – and it’s all metadata too!

The OED’s definition of “metadata” is a circular one: “a set of data that describes and gives information about other data”. So metadata are data, and both provide content. In the humanities – and in the sciences too – we need to be wary of simplistic models and lazy terminology in this area. The current Australian controversy is an important reminder of this.