About Me

I am Professor of Digital Humanities at the University of Glasgow and Theme Leader Fellow for the 'Digital Transformations' strategic theme of the Arts and Humanities Research Council. I tweet as @ajprescott.

This blog is a riff on digital humanities. A riff is a repeated phrase in music, used by analogy to describe an improvisation or commentary. In the 16th century, the word 'riff' meant a rift; Speed describes riffs in the earth shooting out flames. The poet Jeffrey Robinson points out that riff perhaps derives from riffle, to make rough.

Maybe we need to explore these other meanings of riff in thinking about digital humanities, and seek out rough and broken ground in the digital terrain.

13 January 2013

The Deceptions of Data

At Friday's conference organised by the estimable Orietta da Rold at the University of Leicester to mark the launch of the interesting Manuscripts Online project, I was telling a story of a nineteenth-century facsimile of a small late fourteenth-century manuscript in the British Library. The facsimile is a beautiful piece of craftsmanship, even down to the meticulous reproduction of the manuscript's binding. It is so beautifully done that it at first seems that somebody has stolen the original volume from the British Library. However, when you examine the facsimile more closely, you realise that, despite all the care lavished on ensuring that it reproduces the original manuscript as closely as possible, the facsimile has a major flaw. There are two very scrappy flyleaves in the manuscript, dirty and stained with old glue from an earlier binding. They initially appear completely insignificant, and the makers of the facsimile decided they could be left out. However, these scrappy flyleaves include some signatures which provide important information about early ownership of the manuscript.

This carefully constructed facsimile is deceptive. Editorial decisions were made in its construction which mean that it represents a subjective view of the manuscript. In the late nineteenth century, the great manuscript scholar Edward Maunde Thompson, in introducing a photographic facsimile of a manuscript, declared that photographs offered a depiction of the manuscript unaffected by human intervention. Sadly, this is never the case. Every reproduction of a manuscript involves a mass of human decisions about (for example) what lighting method is used, how flyleaves, blank leaves, etc., are included, how the binding is represented, and so on, which means that each reproduction of the manuscript represents an interpretation, frequently drawing on a mixture of curatorial, photographic and academic expertise. This applies just as much (perhaps even more) with digital imaging as with conventional photographic imaging.

During the Manuscripts Online event on Friday, we heard a great deal about data, to the point where data seemed to assume a life of its own, an energetic effervescent life force that needed to be freed so that medieval and other forms of scholarship could be transformed into new 'cool' forms by its remarkable qualities. It seemed that, for the participants in the conference, we are no longer curators or scholars but makers and consumers of data. In this perspective, data is presented as in some way offering a more objective, less problematic view of historic cultures and societies than the archives or manuscripts from which it is drawn.

There are many well-known projects and demonstrations which illustrate the way in which data is being manipulated to transform our views of historic and cultural trends and developments. Mapping has become ubiquitous, as if cultural geographers had never taught us about the way in which maps are difficult and challenging cultural constructs. It is intended that Manuscripts Online will include a mapping component, and mapping has become almost a standard requirement of such products nowadays. A good example of a characteristic approach is the visualisation by Ben Schmidt of seasonal movements of shipping during the late eighteenth century. This uses information from the log books of ships assembled for the climatological database. It all looks very entrancing and convincing, and we are reassured that the visualisation is based on 'hundreds of log books' so we assume it gives a good sense of major trade connections from 1750-1850. James Cheshire has also used the same data to produce a map of British trade routes from 1750 to 1800.

However, for a scholar interested in Britain, one thing immediately catches the eye. There are virtually no traces from the east coast of Britain, and very little trade between England and Scandinavia - how could great ports like Hull not be on trade routes? Moreover, no trade is shown as emanating from Liverpool or Glasgow - two of Britain's greatest slaving ports in the eighteenth century. How can this not be shown? The answer is that the climatological database was based on log books from Royal Naval ships and ships of the East India Company. The Royal Navy mainly operated from the south of England and in any case didn't engage directly in trade, while the East India Company of course did not trade much with northern Europe. This visualisation shows some trade routes in the eighteenth century, but by no means all. Indeed, some of these may not be trade routes at all, since the Royal Navy did not necessarily confine itself to trade routes. In removing the data sets from their original context, this visualisation runs the risk of creating a seriously distorted impression. For the original purpose of analysing eighteenth-century temperatures, the database was well constructed - it didn't particularly matter whether the vessels were naval or not, the weather was still the same. Using naval data to represent trade is another matter, however.

One of the problems confronting data enthusiasts in the humanities is that we feel a need to convince our more old-fashioned colleagues about what can be done. But our role as advocates of data shouldn't mean that we lose our critical sense as scholars. In their presentations to the Manuscripts Online conference, Michael Pidd and Kathy Rogers from the Humanities Research Institute at Sheffield stressed the need for detailed and careful examination of datasets in making them available, but there is a risk that we look more carefully at the technical components of the datasets than the historical context of the information that they represent. This is a danger of which Ben Schmidt was aware in making the shipping visualisation - he observes that one of the many uses of the visualisation is that it shows the status and coverage of the climatological database. It may be that this proves to be one of the main uses of data visualisation, namely giving us an innovative way of analysing the structure of historical and other texts, as Tim Hitchcock and Bill Turkel have shown in the way in which they use visualisations to explore the structure of the Old Bailey Proceedings as a historical source.

It comes back to the notorious comment made in a New York Times article on the Digital Humanities a couple of years ago, that data provides a way by which humanities scholars can escape from the '-isms' of cultural theory. There appears to be a sense that data can somehow be cut free from its historical moorings to enjoy an autonomous existence. I think that's very dangerous. Data doesn't mean we become less critical; it demands that our critical faculties are sharper than ever, as the distortions and deceptions of data can be so deeply embedded that they are difficult to ferret out. As data becomes more promiscuous and greater cross-connections are made, our critical faculties need to be sharper than ever.


  • Anonymous says:
    15 January 2013 at 01:51

    Thanks for this thoughtful piece. Its warnings and caveats are timely indeed, and I hope it's widely read.


  • Anonymous says:
    8 February 2013 at 10:23

    Excellent article. Objects are never just objects, despite our tendencies to objectify.
