Monthly Archives: August 2014

Metadata, data, content: there’s no real difference in the humanities

“Metadata” is the subject of an important public debate in Australia at the moment. Much of the argument is around the relationship between metadata and content, or metadata and data. Several bloggers have observed that the distinction commonly made between metadata and content is a misleading one.

In the library world, the term “metadata” became popular as a way of making stodgy older terms like “catalogue entries” or “catalogue records” sound more relevant and up-to-date. It sounds so much more sophisticated to be teaching “metadata creation” in library school, rather than “cataloguing”.

The justification is that these catalogue entries are data about data, where the underlying content is the books or other library resources being described in the record. Catalogues of digital objects in services like Europeana are conceptualized in the same way. The descriptive information is regarded as metadata, containing pointers to the object described.

In the world of scientific and medical research, the same model is now ubiquitous. Instruments, equipment and observations produce “data” which take the form of digital files of various kinds. These files are given structured descriptions, which are generally called “metadata”.

A typical example is the marine and climate data collected by the Integrated Marine Observing System (IMOS) and made available through the Australian Ocean Data Network (AODN). AODN is built around a standardized “metadata catalogue” which gives access to the data.

This model may reflect the current consensus of a scientific community about ways of describing and interpreting the phenomena being studied. But it does not work well for the humanities. Critical analysis of the words we use when describing something – and of the characteristics we select as being significant – is at the heart of humanities discourse. “Data” and “metadata” cannot be easily separated when description and interpretation become contested areas. 

In the HuNI (Humanities Networked Infrastructure) project, we have tried to collapse this distinction. The primary content in HuNI consists of entities (People, Places, Organisations, Events, Works and Concepts). They are derived from thirty contributing datasets, and can be combined, annotated and linked according to a researcher’s interests. The entity records in HuNI are data, metadata and content – all at the same time.

In the Phillipps project, my focus is on provenance events relating to the manuscripts which passed through the Phillipps collection. These events are described in the sales catalogues of auction houses and book dealers like Sotheby’s, and in the printed catalogues of institutional or personal collections, especially Phillipps’ own printed catalogue. They also appear as records in a range of general library databases, and in specialist bibliographical databases like the Schoenberg Database of Manuscripts.

From a conventional point of view, these catalogue entries and records can be described as “metadata”. After all, they describe the manuscripts and their contents. But for me the descriptions are the content. There is no distinction between “metadata” and “data” here.

These catalogue entries are evidence of specific events and activities over hundreds of years. But they are much more than that. They also provide evidence of the different understandings of the nature of these manuscripts over time. The catalogue entries themselves are historically conditioned interpretations, reflecting the perceptions of the people who compiled them.

To structure and organize these entries, I’ve created a data model – or should that be a metadata schema? The distinction between the two seems increasingly arcane and irrelevant. The entries are being reformatted as information on spreadsheets and in graph databases, where they can be analysed and visualized. It’s all content, and it’s all data – and it’s all metadata too!
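To make the point concrete, here is a minimal sketch in Python of how a single catalogue entry might sit in a small graph. It is not the actual Phillipps data model: the `Node` and `Graph` classes, the relation names (`describes`, `is_evidence_for`, `compiled_by`) and all the labels are hypothetical placeholders invented for illustration. The entry describes a manuscript (the "metadata" role), but it is also a node in its own right, with evidence and authorship attached to it (the "data" or "content" role).

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    """One node in a toy property graph. All kinds and labels are placeholders."""
    id: str
    kind: str   # e.g. "Manuscript", "CatalogueEntry", "Event", "Person"
    label: str


@dataclass
class Graph:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)  # (source_id, relation, target_id)

    def add(self, node_id, kind, label):
        node = Node(node_id, kind, label)
        self.nodes[node_id] = node
        return node

    def link(self, source, relation, target):
        self.edges.append((source.id, relation, target.id))

    def targets(self, source, relation):
        """Nodes that `source` points to via `relation`."""
        return [self.nodes[t] for s, r, t in self.edges
                if s == source.id and r == relation]

    def sources(self, relation, target):
        """Nodes that point to `target` via `relation`."""
        return [self.nodes[s] for s, r, t in self.edges
                if t == target.id and r == relation]


g = Graph()
manuscript = g.add("ms1", "Manuscript", "Placeholder manuscript")
entry = g.add("entry1", "CatalogueEntry", "Placeholder entry in a sale catalogue")
sale = g.add("event1", "Event", "Placeholder sale of the manuscript")
compiler = g.add("person1", "Person", "Placeholder cataloguer")

# The entry as "metadata": it describes the manuscript.
g.link(entry, "describes", manuscript)
# The entry as "data": it is evidence for an event and has a compiler of its own.
g.link(entry, "is_evidence_for", sale)
g.link(entry, "compiled_by", compiler)

described = g.targets(entry, "describes")
evidence = g.targets(entry, "is_evidence_for")
about_entry = g.targets(entry, "compiled_by")
print([n.label for n in described + evidence + about_entry])
```

Nothing in the structure marks the entry as belonging to a different layer from the manuscript or the sale. Whether it counts as "metadata" depends only on which edge you happen to be following.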

The OED’s definition of “metadata” is a circular one: “a set of data that describes and gives information about other data”. So metadata are data, and both provide content. In the humanities – and in the sciences too – we need to be wary of simplistic models and lazy terminology in this area. The current Australian controversy is an important reminder of this.
