About Me

I am Professor of Digital Humanities at the University of Glasgow and Theme Leader Fellow for the 'Digital Transformations' strategic theme of the Arts and Humanities Research Council. I tweet as @ajprescott.

This blog is a riff on digital humanities. A riff is a repeated phrase in music, used by analogy to describe an improvisation or commentary. In the 16th century, the word 'riff' meant a rift; Speed describes riffs in the earth shooting out flames. The poet Jeffrey Robinson points out that riff perhaps derives from riffle, to make rough.

Maybe we need to explore these other meanings of riff in thinking about digital humanities, and seek out rough and broken ground in the digital terrain.

13 January 2013

The Deceptions of Data



At Friday's conference organised by the estimable Orietta da Rold at the University of Leicester to mark the launch of the interesting Manuscripts Online project, I was telling a story of a nineteenth-century facsimile of a small late fourteenth-century manuscript in the British Library. The facsimile is a beautiful piece of craftsmanship, even down to the meticulous reproduction of the manuscript's binding. It is so beautifully done that it at first seems that somebody has stolen the original volume from the British Library. However, when you examine the facsimile more closely, you realise that, despite all the care lavished on ensuring that it reproduces the original manuscript as closely as possible, the facsimile has a major flaw. There are two very scrappy flyleaves in the manuscript, dirty and stained with old glue from an earlier binding. They initially appear completely insignificant, and the makers of the facsimile decided they could be left out. However, these scrappy flyleaves include some signatures which provide important information about early ownership of the manuscript.

This carefully constructed facsimile is deceptive. Editorial decisions were made in its construction which mean that it represents a subjective view of the manuscript. In the late nineteenth century, the great manuscript scholar Edward Maunde Thompson, in introducing a photographic facsimile of a manuscript, declared that photographs offered a depiction of the manuscript unaffected by human intervention. Sadly, this is never the case. Every reproduction of a manuscript involves a mass of human decisions about (for example) what lighting method is used, how flyleaves, blank leaves, etc., are included, how the binding is represented, and so on, which means that each reproduction of the manuscript represents an interpretation, frequently drawing on a mixture of curatorial, photographic and academic expertise. This applies just as much (perhaps even more) with digital imaging as with conventional photographic imaging.

During the Manuscripts Online event on Friday, we heard a great deal about data, to the point where data seemed to assume a life of its own, an energetic effervescent life force that needed to be freed so that medieval and other forms of scholarship could be transformed into new 'cool' forms by its remarkable qualities. It seemed that, for the participants in the conference, we are no longer curators or scholars but makers and consumers of data. In this perspective, data is presented as in some way offering a more objective, less problematic view of historic cultures and societies than the archives or manuscripts from which it is drawn.

There are many well-known projects and demonstrations which illustrate the way in which data is being manipulated to transform our views of historic and cultural trends and developments. Mapping has become ubiquitous, as if cultural geographers had never taught us about the way in which maps are difficult and challenging cultural constructs.  It is intended that Medieval Manuscripts Online will include a mapping component, and mapping has become almost a standard requirement of such products nowadays. A good example of a characteristic approach is the visualisation by Ben Schmidt of seasonal movements of shipping during the late eighteenth century. This uses information from the log books of ships assembled for the climatological database. It all looks very entrancing and convincing, and we are reassured that the visualisation is based on 'hundreds of log books' so we assume it gives a good sense of major trade connections from 1750-1850. James Cheshire has also used the same data to produce a map of British trade routes from 1750 to 1800.

However, for a scholar interested in Britain, one thing immediately catches the eye. There are virtually no traces from the east coast of Britain, and very little trade between England and Scandinavia - how could great ports like Hull not be on trade routes? Moreover, no trade is shown as emanating from Liverpool or Glasgow - two of Britain's greatest slaving ports in the eighteenth century. How can this not be shown? The answer is that the climatological database was based on log books from Royal Naval ships and ships of the East India Company. The Royal Navy mainly operated from the south of England and in any case didn't engage directly in trade, while the East India Company of course did not trade much with northern Europe. This visualisation shows some trade routes in the eighteenth century, but by no means all. Indeed, some of these may not be trade routes at all, since the Royal Navy did not necessarily confine itself to trade routes. In removing the data sets from their original context, this visualisation runs the risk of creating a seriously distorted impression. For the original purpose of analysing eighteenth-century temperatures, the database was well constructed - it didn't particularly matter whether the vessels were naval or not, the weather was still the same. Using naval data to represent trade is another matter, however.

One of the problems confronting data enthusiasts in the humanities is that we feel a need to convince our more old-fashioned colleagues about what can be done. But our role as advocates of data shouldn't mean that we lose our critical sense as scholars. In their presentations to the Manuscripts Online conference, Michael Pidd and Kathy Rogers from the Humanities Research Institute at Sheffield stressed the need for detailed and careful examination of datasets in making them available, but there is a risk that we look more carefully at the technical components of the datasets than the historical context of the information that they represent. This is a danger of which Ben Schmidt was aware in making the shipping visualisation - he observes that one of the many uses of the visualisation is that it shows the status and coverage of the climatological database. It may be that this proves to be one of the main uses of data visualisation, namely giving us an innovative way of analysing the structure of historical and other texts, as Tim Hitchcock and Bill Turkel have shown in the way in which they use visualisations to explore the structure of the Old Bailey Proceedings as a historical source.

It comes back to the notorious comment made in a New York Times article on the Digital Humanities a couple of years ago, that data provides a way by which humanities scholars can escape from the '-isms' of cultural theory. There appears to be a sense that data can somehow be cut free from its historical moorings to enjoy an autonomous existence. I think that's very dangerous. Data doesn't mean we become less critical; it demands that our critical faculties are sharper than ever, as the distortions and deceptions of data can be so deeply embedded that they are difficult to ferret out. As data becomes more promiscuous and greater cross-connections are made, our critical faculties need to be sharper than ever.       
                        


11 January 2013

The Function, Structure and Future of Catalogues



This is the text of a keynote lecture to the conference at Leicester University on 11 January 2013 marking the launch of Manuscripts Online: www.manuscriptsonline.org.



The story of the British Library is full of remarkable personalities. One of the most striking of these was Donald Urquhart, who established in 1961 the National Lending Library for Science and Technology at Boston Spa in Yorkshire, which afterwards became the northern outpost of the British Library. Urquhart was described by his successor Maurice Line as ‘one of the greatest innovators, practitioners, thinkers and personalities the library profession has ever had’. Urquhart was a scientist whose wartime experience made him aware of the inability of staid literary libraries such as the British Museum to satisfy the increasing need of scientific researchers for prompt, easy and cheap access to the burgeoning range of publications reporting the latest technical and scientific research. At Boston Spa, Urquhart designed and built a remarkable mail order facility for information which would ensure that scientists could receive the articles they needed in their laboratory within twenty-four hours. In creating this facility, Urquhart questioned, and frequently rejected, many of the accepted principles of librarianship. His best known innovation was to jettison the idea of a catalogue. When he asked a librarian what the purpose of a catalogue was, he was unimpressed by the reply he received: ‘for completeness’. Urquhart argued that, if books were arranged on the shelf by author and title order, a catalogue was unnecessary. If the book was there, the lending request could be met straight away off the shelf; if the book was not there, then it would be necessary in any case to contact other institutions to see if they had a copy.

Urquhart’s questioning of the principle of a library catalogue may seem to be gaining a new relevance as we see Google and other search engines becoming the primary means by which researchers seek out information. Recent studies, for example, show that students, in seeking electronic resources, do not turn to the catalogues of e-resources laboriously compiled by libraries, but simply Google the resource. Library catalogues have been criticized as dowdy and lacking in interaction by comparison with (for example) Amazon. The highly structured and meticulously prepared information in a catalogue looks redundant by comparison with the speed and simplicity of Google. The catalogue is starting to look in many ways exactly like what Urquhart suggested – a comfort blanket for librarians and curators. It seems that some librarians themselves are also coming to such a view. Deanna Marcum of the Library of Congress commented in 2006 that: ‘the detailed attention that we have been paying to descriptive cataloging may no longer be justified ... retooled catalogers could give more time to authority control, subject analysis, [and] resource identification and evaluation’. Likewise, Karen Calhoun, in a report commissioned by the Library of Congress, expressed a concern that ‘The existing local catalog's market position has eroded to the point where there is real concern for its ability to weather the competition for information seekers' attention’.

Yet the humble catalogue also underpins many aspects of the new digital services by which it seems threatened. Two of the major library digitization projects of recent years, Early English Books Online and Eighteenth Century Collections Online, stem directly from the largest cataloguing project of recent times, the English Short Title Catalogue, and the primacy of EEBO and ECCO as digitisation projects reflects the visionary insistence of those who established the English Short Title Catalogue in the 1970s that it should be in machine readable form. While Amazon may have given a lead in promoting a more interactive approach to identifying and using books, the comprehensiveness of Amazon's database is due to the fact that it incorporates the historic catalogues of major libraries such as the Library of Congress and the British Library. Anyone who feels that Google can do the job performed by library catalogues should attempt to locate specific volumes of periodicals in Google Books. It is an extraordinarily time-consuming task, and sometimes downright impossible, which explains why digital libraries such as HathiTrust and Open Library offer conventional online catalogue access to their collections.

Library, archive and museum catalogues offer some of the largest and most highly structured datasets which humanities researchers are likely to encounter. These bibliographical datasets are increasingly being made available as open data. The British Museum’s collection database is now available in this form and the British Library has also made the British National Bibliography available as linked open data. The highly structured data in library catalogues has great potential to support innovative visualisations showing aspects of bibliographic and intellectual history, as can be seen from this project at St Andrews, the Bohemian Bookshelf. While these possibilities have led to an increased interest in the potential of using catalogue data in new ways, this renaissance of interest in the catalogue comes at a time when the catalogue itself is fundamentally changing because the services it has traditionally supported are also being transformed. As Lorcan Dempsey has commented, ‘the catalog is being reconfigured in ways which may result in its disappearance as an individually identifiable component of library service. It is being subsumed within larger library discovery environments and catalog data is flowing into other systems and services’.

The catalogue is one of the oldest and most important means by which humans have sought to control information. The library of clay tablets collected by King Ashurbanipal of Assyria in the 7th century BC had an author and title catalogue and probably a class catalogue as well. We will all be familiar with the Corpus of British Medieval Library Catalogues, which has been in the process of publication by the British Academy under the general editorship of Richard Sharpe and which lists thousands of texts in circulation in medieval Britain. The production of library catalogues was one of the first fields in which automation was used to expedite the management of information. One of the earliest applications of automated duplicating devices was in the production of the British Museum’s library catalogue. The card index may nowadays seem like a very humdrum instrument of information technology, but it was revolutionary in the way in which the use of standardized cards allowed the sharing of information. The Library of Congress in the early part of the twentieth century operated a bibliographic service which offered pre-printed catalogue cards for books to local libraries. The automation of these card indexes was one of the first computing technologies to impact on humanities research.

The scholarly literature on cataloguing is considerable, and the changes in the position of the catalogue mean that discussion as to its purpose, value and future remains vigorous. This extensive scholarly and professional debate has helped encourage the establishment and continued development of new cataloguing standards. Not surprisingly, the discussion of cataloguing is most sophisticated for such conventional library materials as the printed book and the periodical publication. As early as the seventeenth century, Thomas Bodley debated with his librarian Thomas James how the books he purchased should be described. The Keeper of Printed Books at the British Museum, Anthony Panizzi, established the first modern set of rules for cataloguing books in 1841. The ninety-one rules promulgated by the British Museum reflected the collective wisdom of Panizzi and his assistants, their debates about points of cataloguing practice often extending far into the night. The British Museum’s example encouraged American librarians to produce their own rules, culminating in Charles Ammi Cutter’s Rules for a Dictionary Catalog of 1876. The formation of professional Library Associations in Britain and America encouraged further collaboration, resulting in the compilation of an Anglo-American Code in 1908 and finally the issue of the Anglo-American Cataloguing Rules (AACR) in 1967, followed by a second edition (AACR2) in 1978. The experience of the Library of Congress in producing catalogue cards for use by other libraries encouraged early experiments with distributing library catalogue records in machine readable forms. The Library of Congress developed a service to produce and distribute on tape Machine Readable Catalogue entries as early as 1966. This international co-operation of course extends beyond the English-speaking world.
The International Federation of Library Associations has been very important here, for example, in enunciating the International Principles for Bibliographic Description in 1961. This framework has provided a strong basis for addressing new challenges. The new version of the Principles for Bibliographic Description, which you can see here, attempts to reconceptualise the role of bibliographic description in new information environments, and reflects the sort of thinking which has underpinned the development of the new Resource Description and Access (RDA) standard, which has been implemented by the Library of Congress and is in the process of being adopted by the British Library. Since one of the advantages of RDA is that it is meant to provide a more flexible framework than AACR2 for dealing with archives, manuscripts and other non-book materials, RDA is likely to loom larger for manuscript scholars than AACR2 has done.

While the use of ICT in printed book cataloguing has a long history, for archives the development has been much more recent, but very dramatic. Archive processing differs fundamentally from printed book processing because of its concern to preserve and represent the hierarchies and administrative inter-relationships of individual documents. An archival callmark such as this example (National Archives, KB 145/3/5/1) tells me everything I need to know about the document. At the fonds or collection level, it forms part of the records of the law court known as the King’s Bench. At the series level, the number 145 tells me it is part of the series of King’s Bench Recorda files. The sub-series number, 3, indicates that this is from the reign of Richard II. The item number, 5, indicates that this file is from the 5th regnal year of Richard II and the file number 1 shows that it is the first of two parts surviving for that year. The concern of archival descriptions is chiefly to preserve and document these hierarchies, as the record entry for this file in the National Archives catalogue illustrates. The kind of codicological and palaeographical information such as the number of membranes or the number of scribes which might be discussed in a literary or liturgical manuscript of the same period is not analysed or recorded here. As you can see, the physical information provided for description of a twelfth-century archival document such as this pipe roll is minimal. The international standard which governs archival processing and description is ISAD(G): the General International Standard Archival Description. In contrast to MARC and printed books, the fonds structures of ISAD(G) cannot easily be represented in a relational database.
The hyperlinks of the World Wide Web closely map archival structures, so that very quickly after the web appeared, an XML schema known as EAD (Encoded Archival Description) was produced which enabled archive descriptions to be readily made available for web access. The vast catalogues of the National Archives in London, which had remained until the 1990s in typewritten form and were only made available remotely through the energetic photocopying programme of the List and Index  Society, were rapidly made available online. This was rapidly followed by the Access to Archives programme which converted and put on the web catalogue records from many local and specialist repositories.
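
The way a callmark such as KB 145/3/5/1 carries its own place in the hierarchy can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the level names simply follow the reading of this one reference given above, the names `Callmark` and `parse_callmark` are invented for the example, and nothing here reflects how the National Archives catalogue is actually implemented.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Level names follow the reading of KB 145/3/5/1 given in the text;
# they are illustrative labels, not an official National Archives schema.
LEVELS = ("series", "sub_series", "item", "file")


@dataclass(frozen=True)
class Callmark:
    """Hypothetical container for one archival reference."""
    fonds: str              # e.g. "KB" for King's Bench
    path: Tuple[str, ...]   # e.g. ("145", "3", "5", "1")

    def levels(self) -> Dict[str, str]:
        """Pair each component of the reference with its hierarchical level."""
        named = {"fonds": self.fonds}
        named.update(zip(LEVELS, self.path))
        return named

    def ancestors(self) -> List[str]:
        """Return every enclosing reference, from the fonds downwards."""
        refs = [self.fonds]
        for depth in range(1, len(self.path)):
            refs.append(f"{self.fonds} {'/'.join(self.path[:depth])}")
        return refs


def parse_callmark(reference: str) -> Callmark:
    """Split a reference such as 'KB 145/3/5/1' into fonds code and path."""
    fonds, _, rest = reference.strip().partition(" ")
    if not fonds or not rest:
        raise ValueError(f"not a recognisable callmark: {reference!r}")
    return Callmark(fonds=fonds, path=tuple(rest.split("/")))


mark = parse_callmark("KB 145/3/5/1")
print(mark.levels())
print(mark.ancestors())
```

Each record is defined by the chain of records that enclose it, and that chain varies in depth from one fonds to another. A tree of this kind maps more naturally onto nested markup than onto a single flat table.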

There isn’t time here today to go into the interesting development of online cataloguing and inventory methods in museums, but the need for museum documentation to embrace such a wide range of materials led to the emergence of a more semantically-based standard of the CIDOC Conceptual Reference Model, which is I suspect likely to have a very major impact on the way in which we document and analyse cultural heritage materials over the next few years. But what is striking here is the way in which the sort of material in which we are interested – the type of medieval literary, liturgical, legal and other library manuscripts which are the glory of collections such as the British Library, the Bodleian Library and the libraries of the Oxford and Cambridge colleges – has been ignored by developments in cataloguing. The needs of these manuscripts – or indeed of early modern and modern manuscripts which do not fall easily into the fonds structures of ISAD(G) - have barely figured in discussions of the nature and future of the catalogue in new information environments. This is surprising, since the cataloguing of manuscript libraries was one of the earliest forms of library cataloguing. Among the earliest published library catalogues in England were Thomas Smith’s 1696 catalogue of the Cotton Library and David Casley’s 1734 catalogue of the old Royal collection of manuscripts, while Humfrey Wanley set a formidable standard for specialized catalogues with his catalogue of Anglo-Saxon manuscripts published by Hickes in 1705. Edward Bernard’s 1697 Catalogue of manuscript books in England and Ireland was one of the first attempts at a union catalogue. These seventeenth- and eighteenth-century pioneers established a tradition which has without doubt been one of the glories of English medieval scholarship. The catalogues of manuscript collections compiled by scholars such as M. R. 
James, Neil Ker, Malcolm Parkes, Tilly de la Mare, and Andrew Watson are remarkable achievements and this tradition continues today, and some of its most distinguished practitioners are with us today. Moreover, the various in-house catalogues of manuscript collections compiled by institutions such as the British Library, the Bodleian Library and the John Rylands Library in Manchester incorporate some of the finest work of such manuscript scholars as Edward Maunde Thompson, Sir George Warner, Francis Wormald, Julian Brown, Falconer Madan, Richard Hunt and (in Manchester) Frank Taylor.

The catalogues of British manuscript collections represent a formidable scholarly achievement, but, unlike printed books or archives, this remarkable body of work has failed to generate any reflective or theoretical literature. Manuscript cataloguers have been too deeply steeped in the uncial to consider how the catalogues they produce fit into the wider range of library and archive catalogue provision or to consider how their catalogues can be better suited to their function and purpose. A contrast has been drawn with France, where Leopold Delisle’s influence was responsible for the early development of a very integrated and consistent approach to manuscript cataloguing. It has been suggested that the failure of English manuscript libraries and scholars to develop a similar approach was due to a more pragmatic tradition in England – that English scholars were more concerned with studying the manuscripts than with the way in which the catalogues were structured. I fear this is a rather self-serving piece of justification. I suspect that the failure to develop any theory of manuscript cataloguing in Britain has more to do with the way in which the study of manuscripts has been so intimately connected with connoisseurship and collecting. Falconer Madan’s discussion of the cataloguing of manuscripts in his 1899 volume Books in Manuscript – amazingly, still one of the best introductions to the subject when I started work in the Department of Manuscripts at the British Library in 1979, but now of course supplanted by more up-to-date treatments by scholars such as Michelle Brown and Christopher de Hamel – makes this concern with the creation of informed connoisseurs clear when he explains that his discussion of cataloguing is aimed at the ‘private collector [who] has purchased a manuscript at a sale, that it has just reached him, and that he is inexperienced in the treatment of such volumes’.

This tradition rooted in collecting and connoisseurship goes back deep into the history of manuscript scholarship in Britain – one thinks of Wanley’s work on the Harley collection. I suggest that it had a profound effect on the intellectual programme of scholars such as James or Ker. Richard Pfaff has suggested that the aim of M. R. James in compiling his catalogues was to create in his mind a kind of imaginary library which would assist him in dating and placing texts, and a similar sense is also evident in the approach of Neil Ker. This means that for these scholars, the catalogue was a method which gave them a structure for the systematic exploration of manuscript libraries and also became a means of recording and delivering a scholarly judgment on the dating and localization of a particular manuscript. But frequently the relationship of these scholarly catalogues to the libraries they described was not clear – as is apparent from the problems created by James using his own systems for the numbering of manuscripts. While the documentary scholars at the Public Record Office codified their professional practice to create a new archive profession, with training offered at new schools in centres like University College London and Liverpool, there was no comparable move to create a similar professional basis for manuscript librarianship. Indeed, in creating the archives profession in Britain, Sir Hilary Jenkinson explicitly excluded Departments of Manuscripts like that at the British Museum, arguing that they used museum procedures which caused damage to the fonds. Rather than seeking to create a parallel professional structure to that being established by the archivists, manuscript scholars such as Edward Maunde Thompson, Francis Wormald and Julian Brown concentrated instead on formalizing and developing the academic study of palaeography and codicology.
While scholars from the Department of Manuscripts such as Thompson and Frederick Kenyon served as Directors of the British Museum and played a major part in museum administration, they had little impact on the development of the new archives profession – something which perhaps confirmed Jenkinson’s argument that the approach of manuscript libraries was too often based on the selective connoisseurship of the museum.

The result of this is that, while the emergence of cataloguing standards for books and archives was underpinned in Britain by a substantial scholarly literature discussing the function and structure of archives, there is no comparable literature on the theory and practice of manuscript cataloguing. Our essential handbooks, such as the works of Michelle Brown and Christopher de Hamel that I have already mentioned, discuss palaeography, codicology and terminology. They do not discuss the cataloguing requirements of manuscripts. The British literature on this subject is embarrassingly meagre. The best historical overview is A. J. Piper’s article on ‘Cataloguing British Collections of Medieval Western Manuscripts’ in Lynda Dennison’s collection on the legacy of M. R. James. An important but largely forgotten contribution is an article by the remarkable palaeographer Dorothy Coveney, who produced a groundbreaking catalogue of the manuscripts at University College London in 1935. Coveney’s article on ‘The Cataloguing of Literary Manuscripts’ – literary manuscripts here being adopted as a technical term to distinguish library manuscripts from archives – published in The Journal of Documentation in 1950 argued for much fuller and more systematic palaeographical treatment of manuscripts, making trenchant criticisms of the mannered descriptions of hands in James’s catalogues. Of course, there are descriptions of the methods adopted in the prefaces of catalogues by scholars such as James and Ker and in some library catalogues, such as that of the Bodleian Library which sought to introduce some of Delisle’s principles, but otherwise that is all we have. While the Public Record Office in London was at the heart of generating a new literature on the processing and documentation of archives, the Department of Manuscripts at the British Library produced nothing beyond two short handbooks itemizing the various manuscript catalogues, a Guide to Manuscript Indexing by J. P.
Hudson, which is an impenetrable description of the typographical house rules used in the indexes of the Catalogue of Additions to the Manuscripts, and a short guide to the methods used initially to automate the catalogues of manuscripts.

As we have seen, the emergence of such standards as AACR2, MARC and now RDA for printed books or ISAD(G) and EAD for archives was closely related to both theoretical discussions and the development of international associations such as IFLA and the International Council on Archives. There has been no such process with manuscripts, so that the picture internationally remains fragmented. In America, there was an earlier recognition of the distinct needs of manuscripts and an enthusiasm for a closer connection with mainstream library developments and the promotion of a more integrated approach to manuscripts, such as the proposal of the controversial librarian of Princeton, Ernest Richardson, for the creation of a Union World Catalog of Manuscript Books. This willingness to accept that manuscripts were part of libraries perhaps accounts for the way in which American practice has been more willing to accept that manuscript books can be catalogued in much the same way as printed books. Gregory Pass’s Descriptive Cataloging of Ancient, Medieval, Renaissance, and Early Modern Manuscripts is a supplement to AACR2 which provides guidelines for cataloguing manuscripts according to AACR2 principles. This approach is widely favoured in the United States, but its drawback is that it cannot cope with the collection hierarchies which are required as soon as one encounters archival materials, and this is one reason why manuscript librarians have been reluctant to go down the simple route of cataloguing their manuscripts in AACR2. However, while EAD and ISAD(G) preserve information about the collection hierarchies, they are very poor at representing the kind of bibliographical and codicological information expected in a description of a medieval manuscript. The Liber Horn, for example, is held by the London Metropolitan Archives which naturally uses ISAD(G) and EAD.
This is the description for the Liber Horn in the London Metropolitan Archives, and you can see the problems: whether it is helpful to describe the Liber Horn as a file I am not sure, and the kind of structural information we would normally expect in a description of a medieval manuscript is simply not there.  ISAD(G) is geared to large quantities of corporate records, produced by institutions; a volume of uncertain official status produced by a chamberlain of the city is not easily accommodated by a standard designed to cope with the city’s financial records.

There is, then, simply no accepted standard for manuscript cataloguing. This would not matter very much if it were not for automation. The creation of large aggregated catalogues such as OCLC’s WorldCat, or the type of federated searching which is possible through services such as CatCymru, which searches the catalogue of every public library in Wales, is only made possible by the standardization grounded in the use of guidelines such as AACR2. Without such standardization, it is impossible to develop such services for manuscripts in the same way. A brave attempt to initiate such a standard was the MASTER project, which sought to develop a TEI document type definition for use in manuscript cataloguing. An immense amount of work has gone into developing MASTER and it has been used in modified forms in cataloguing collections in Oxford, London, Copenhagen and elsewhere. TEI P5 now includes provision for manuscript description, but use of TEI P5 has tended to be restricted to academic researchers rather than curators, and it has suffered from lack of take-up by major libraries. However, the Bodleian Library, which used EAD to prepare a summary catalogue of its manuscript holdings, will be using TEI P5 to provide more detailed descriptions of its medieval manuscripts. Nevertheless, the risks and problems of fragmentation remain, which can be seen by looking at the rather sorry tale of the British Library’s manuscript catalogue.

The British Library’s historic printed manuscript catalogues, such as the long run of Catalogues of Additions to the Manuscripts, were converted to machine readable form in the 1990s and made available online via an Access database, which reproduced the split between description and index in the printed catalogues and offered separate searches for description and index, as well as easy access to information by manuscript number. The catalogues of some of the oldest collections in the Library were by this time very out of date and a separate project was initiated to identify by means of a shelf survey all the illuminated and pre-1200 manuscripts and then recatalogue them. This resulted in a separate digital catalogue of illuminated manuscripts, where the manuscript descriptions were also made available via an Access database. The manuscript catalogues were separate from the Library’s main catalogue systems, and it was clearly desirable that they should be incorporated in some way. In 1982, the India Office Records were transferred to the British Library. The India Office Records are very much archives and in many ways it would have been preferable to transfer them to the National Archives. For any library manager, it would clearly make sense to try and provide integrated access to the manuscripts collection and a major archive like the India Office Records. This is where the problem with cataloguing standards kicks in. For the India Office material, ISAD(G) and EAD are the available and recommended standards. For medieval manuscripts, there is no recommended standard, so in creating an integrated British Library archive and manuscript catalogue an ill-advised attempt has been made to shoehorn the western manuscript catalogue records into ISAD(G) and EAD in a form that I fear many manuscript scholars will find cumbersome at best and baffling at worst. But it is difficult to suggest an alternative approach if there isn’t a clear-cut manuscript standard available.

It’s perhaps worth lingering a moment to take a closer look at why the new British Library ‘Search our Catalogue Archives and Manuscripts’ is so problematic. Here’s what happens if you search on Thomas Hoccleve. The first indication that there is a problem is actually on the left-hand side, where entities from the manuscript descriptions, such as the language of the manuscript or names of previous owners, are displayed. You will notice that there is some uncertainty as to whether these records are at fonds, item or file level; my suggestion is that they should all be at item level, but the difficulty of thinking about ‘file’ in the case of these manuscripts shows the inappropriateness of the approach. However, more to the point is the display of information about the manuscript. Here is the description of Harley MS 116 in the Catalogue of Illuminated Manuscripts, and in my view it is exemplary in the clarity of its distinction between the different aspects of the manuscript. EAD doesn’t allow for any of this, so this is what we get if we go to ‘Details’ for this manuscript in the new catalogue. The first point to notice is that this is a very different description from the one in the Catalogue of Illuminated Manuscripts. Unfortunately, no information is given as to why this new more detailed description was compiled and by whom. It is a very fine description but I think you can see how awkwardly it fits into the ISAD(G) template. Moreover, some elements of the information will be difficult to search – there is no reason why we couldn’t easily generate listings of manuscripts pricked in different ways, given the level of detail here, but the inappropriate use of the EAD schema makes that much more difficult.
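
The point about searchability can be made concrete with a small sketch in Python. Everything here is an assumption for the sake of illustration: the `ManuscriptRecord` type, the `pricking` field and its terms, and the three sample records are invented and do not describe real manuscripts or any real catalogue schema. The sketch simply shows that a listing by pricking pattern is trivial when the detail is held in a structured field, and that a record which carries the same detail only in free text drops out of the listing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ManuscriptRecord:
    shelfmark: str                  # manuscript number (placeholder values below)
    description: str                # free-text physical description, as in a note field
    pricking: Optional[str] = None  # hypothetical controlled term; None if never recorded as a field


def list_by_pricking(records, term):
    """Return the shelfmarks whose structured pricking field equals `term`."""
    return sorted(r.shelfmark for r in records if r.pricking == term)


# Invented sample records; they do not describe real manuscripts.
RECORDS = [
    ManuscriptRecord("Example MS 1", "Pricked in the outer margin only.", "outer margin"),
    ManuscriptRecord("Example MS 2", "Pricking visible in both margins.", "both margins"),
    # The detail is present here, but only as prose, so the listing misses it.
    ManuscriptRecord("Example MS 3", "Prickings survive in the outer margin."),
]

print(list_by_pricking(RECORDS, "outer margin"))  # prints ['Example MS 1']
```

The third record is the interesting one: a scholar reading it would recognise the pricking pattern at once, but a query against the structured field cannot.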

An even bigger problem is apparent if we look at the description of Sloane MS 1825. In this case, a description of the manuscript compiled in the 1840s has simply been scanned in without further amendment. Physical information is given briefly in Latin, and the date is in the description but hasn’t been registered as the date of creation. Again, there is no indication of the status or origin of the description. All this provides is simply a keyword searchable version of a very old description – useful, since this wasn’t previously accessible, but otherwise not of much value. It is very difficult within the new British Library catalogue to access descriptions by manuscript number. The manuscript number here is a reference code. This is what you get if you search for the reference code Nero D IV, the reference for the Lindisfarne Gospels. In this record, more care has been taken to try and make the discussion of the physical structure of the manuscript and the bibliography fit into an archival framework, but the way in which the component texts of the manuscript are treated (as if they were papers in a box set) is very disconcerting. Moreover, the listing promises details that we don’t get – it’s surprising, for example, that the colophon is listed as a separate textual component, but no details are given anywhere of what the colophon says.

There are many other problems with the new British Library manuscripts catalogue. The facility to add your own subject tags is potentially useful, and a similar facility has been included in the new Discover the National Archives catalogue, but the relevance of reviews for the Lindisfarne Gospels seems doubtful (would we put something like ‘a manuscript that offers a great deal but when you see it close fails to deliver?’). But the important point is that the problem is not the way in which the British Library catalogue has been implemented here, but rather the difficulty caused by the lack of any agreed standard for manuscript cataloguing, which is in itself a symptom of a deeper lack of intellectual consensus as to the most appropriate methods for processing and documenting manuscript collections which are not formal archives. The temptation of course is to leap in and propose what such a standard might look like. The need to develop a more standardized approach is apparent from the outcome of a conference of manuscript librarians from Oxford, New York, the British Library, Harvard, Yale and elsewhere held at the Bodleian Library in 2007, where it was suggested that a good first step might be to look at better handling of name authority. But I’m doubtful whether such tinkering around the edges is adequate. Archival standards are not simply cataloguing conventions but a statement of a whole philosophy as to how archival documents should be processed, stored and made available. Cataloguing standards such as RDA likewise reflect a holistic view of how categories of information are managed.  Likewise, we need to think about what manuscript libraries are and how they should be managed. In thinking about the future of manuscript catalogues, we need to rethink the nature and function of the manuscript catalogue, from first principles.

I think that the linking of data, and thus the tentative first proof of concept that we have been given in Manuscripts Online, has a role here, but we need to start at the beginning and think about what the manuscript collections in the British Library or the Bodleian Library are. The first, and most important point, is one that Otto Mazal stresses in his little handbook, The Keeper of Manuscripts, which is perhaps the nearest thing to a philosophy of manuscript librarianship that we have. While medievalists may naturally assume that the most important things in manuscript collections are the volumes in which they are interested, manuscript holdings are extremely diverse. The Additional Manuscripts in the British Library embrace not only the Luttrell Psalter or Sherborne Missal but also the Codex Sinaiticus, Samuel Taylor Coleridge’s Notebooks, Charles Babbage’s correspondence, the archives of many British Prime Ministers, the notebooks of scientists and engineers like Fleming and Whittle and even a choreographic diagram by Nijinsky. Any processing and cataloguing method to deal with collections like these needs to embrace all these varied types of material – this is one reason why the use of the TEI guidelines for manuscript description fails to address the problems of manuscript cataloguing. It isn’t satisfactory to contemplate classifying the manuscripts, since individual collections will themselves often be very diverse: Sir Robert Cotton’s library included not only illuminated manuscripts but also a large portion of the personal papers of Thomas Cromwell. Likewise, the manuscripts of the more modern collector Eric Millar included both medieval material and the diaries of the Edwardian writer F. Anstey. If we tried to split this material up into subject types, we would potentially destroy a lot of evidence about the activities of these collectors.

The way in which most manuscript libraries address this problem of diversity is to use acquisition and accessioning as the means of organizing the collections.  This is one of the reasons why the manuscript number is the key that draws together all our thinking about manuscripts. The manuscript number provides our equivalent of title, author and much other bibliographic information for modern printed books, and needs to be at the heart of our thinking about manuscript cataloguing. It is this physicality of manuscripts and other rare materials that creates a distinction with the kind of discovery resources represented by, say, Explore the British Library. It could be argued that coping with such physicality is more critical to the future of the library catalogue than the discovery of wider ranges of resource. Karen Calhoun, in her report to the Library of Congress, argued that since libraries are unlikely to be able to compete with commercial search services, they should perhaps focus on giving greater attention to providing information about rare and unique materials in their collections. However, if libraries are to give greater priority to the catalogues as a means of accessing manuscripts and other special collections, they will need to accept that this requires a different philosophy to that which is evident in the Explore type approach.

Lorcan Dempsey declared that: ‘The catalog emerged at a time when information resources were scarce and attention was abundant. Scarce because there were relatively few sources for particular documents or research materials: they were distributed in print, collected in libraries and were locally available. If you wanted to consult books or journal or research reports or maps or government documents you went to the library’. Dempsey points out that nowadays the situation is reversed: ‘information resources are abundant and attention is scarce. The network user has many information resources available to him or her on the network. Research and learning materials may be available through many services, and there is no need for physical proximity’. However, of course, the dynamics described by Dempsey do not apply to manuscripts. In the case of manuscripts, our problem is not so much that we have become less focused and are looking at the manuscript in a more distant fashion, but instead, we are looking at manuscripts under closer and closer microscopes, as we seek to extract every nugget of information that we can from them. The interest of manuscript scholars in the potential of new information technologies is completely the reverse of what Dempsey describes – we want to view the manuscript under finer and finer views and to garner as much information about it as we can.

Again, this means that the focus is on the physical volume, on the individual manuscript, rather than a multiplicity of resources. Linked data is definitely one of the topics of the day in humanities scholarship and elsewhere, but I think there is a tendency to think that if we link a random group of resources together, somehow the magic of linked data will instantly give us new perspectives and new understandings for a particular place or period. I fear that this rather naïve hope is evident in the Manuscripts Online resource in its first version, particularly in the selection of resources that have been linked. Sadly, scholarship is much harder than this. Linking of data can be a very useful scholarly technique, but we need to be clear about why we are linking data, what sort of data we are linking, and our aim in doing so. In the case of manuscript catalogues, linking of data has the potential to deal with many of the processing issues which govern the structure of manuscript catalogues, if we approach the linking in the right way.

Dorothy Coveney, one of the few commentators to discuss the philosophy of manuscript cataloguing, said that the primary purpose of a manuscript catalogue is to ensure that the manuscript is securely stored and can be easily located. This security aspect of a catalogue is easily forgotten but in the case of medieval volumes worth millions of pounds remains of fundamental importance. The potential problem of a catalogue which ignores this requirement is illustrated by Samuel Ayscough’s catalogue of the Sloane Manuscripts. Ayscough’s catalogue was organized by author and it meant that the numeration of the manuscripts became rather confused, because Ayscough’s catalogue was not an accurate guide to what should be on the shelf. As a result, the Sloane manuscript containing William Harvey’s lectures on the circulation of the blood was accidentally discarded. When the Harvey volume was found, it was put in the place of a fifteenth-century astrological manuscript, which has now in turn disappeared. The confusion created by Ayscough was only sorted out when a shelflist recording all the numbers of the manuscripts on the shelf was compiled.
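
The role of the shelflist in this story can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `reconcile`, the author labels and the manuscript numbers are all invented, and nothing here reproduces Ayscough’s catalogue or any real shelflist. The idea is that a catalogue organized by author can only be trusted as a guide to the shelf once it has been checked against a plain list of numbers.

```python
def reconcile(shelflist, catalogue_entries):
    """Compare a shelflist with a catalogue organized by author.

    shelflist: iterable of manuscript numbers that should be on the shelf.
    catalogue_entries: iterable of (author, manuscript_number) pairs.
    """
    on_shelf = set(shelflist)
    catalogued = {number for _author, number in catalogue_entries}
    return {
        # Volumes on the shelf that the catalogue never mentions.
        "uncatalogued": sorted(on_shelf - catalogued),
        # Numbers the catalogue cites that the shelflist does not know.
        "not_on_shelflist": sorted(catalogued - on_shelf),
    }


# Invented numbers and authors, purely for illustration.
shelflist = [1, 2, 3, 4, 5]
catalogue = [("Author A", 1), ("Author B", 2), ("Author A", 4), ("Author C", 7)]

report = reconcile(shelflist, catalogue)
print(report)  # prints {'uncatalogued': [3, 5], 'not_on_shelflist': [7]}
```

Either kind of discrepancy is a warning that a volume could go astray without anybody noticing.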

In most manuscript libraries, these shelf lists, containing the definitive listing of the manuscript numbers, provide the fundamental statement of what the library holds, and are the spinal column which links everything together. This is an example of the handlist for some of the Cotton manuscripts in the British Library. This is really the fundamental catalogue for these manuscripts, since it is the only definitive statement of the holdings of this section of the library. Obviously, it would not be much use simply to provide readers with a list of numbers, so initially a listing is prepared which provides an initial view of the manuscript. But the important point is that this is only an initial view – what Edward Maunde Thompson says about a manuscript in the Catalogue of Additions is simply the starting point to a scholarly discussion which will then last centuries. To my mind, ideally a catalogue provides us with access to a complete view of that scholarly discussion in a structured way. Our vision of a catalogue has historically been of a single volume that will provide us with an authoritative statement on a particular manuscript. We expect a Ker or a Kathleen Scott or an Andrew Watson to provide us with an ex cathedra view of what we need to know about a manuscript. This is a view very much driven by the assumption that a catalogue will be a single printed volume. Yet information about manuscripts is scattered through dozens upon dozens of different sources, some in digital form, very many not. Ideally what we want is synoptic access to all those different sources of information. I heard a gripping account recently by Arnold Hunt of the British Library of how linked access to catalogue information can be used to show that a dinosaur tooth in the Natural History Museum came from Sir Hans Sloane’s collection. Not all the information we need to follow this linked chain of evidence is in digital form.

My vision of the future manuscript catalogue then is very much one which is of linked information which enables us to accrue more and more detail about a manuscript. This doesn’t mean of course that we are limited to one single direction in exploring the links, but I see the physical manuscript as remaining our inevitable and necessary starting point. There is an enormous task in assembling the information which would enable us to create such a catalogue, particularly since many of the key sources are not yet available in digital form. Take this example, Additional MS 18196, folio 1, a leaf of a Hymnal, containing part of the hymns to Agnes and Anthony, acquired by the British Museum when Sir Frederic Madden was Keeper of Manuscripts. Among the basic contemporary resources you would need to link from this manuscript number in order to get a good overview of the manuscript are the British Library’s shelflist database, the Catalogue of Additions, Madden’s acquisition reports, Madden’s three series of diaries which contain a great deal of information on manuscripts acquired by him, Madden’s binding records, the huge archives of various annotated sale catalogues held by the British Library, and the indexes of Sir Thomas Phillipps’ manuscripts and catalogues – and that’s all just for starters. Subsequent scholarship on the manuscript is recorded in a huge range of different resources, starting with the Manuscripts Classed Catalogue in the British Library and going right up to works by Paul Binski and Jonathan Alexander. One of the biggest problems faced by manuscript librarians is keeping track of the scholarly bibliography of their subjects. One of the most comprehensive schemes historically was that of the British Library, which systematically collected and indexed offprints of articles relating to manuscripts in the library’s collections, but this pamphlet collection stopped being systematically maintained in the 1960s. We now of course have an excellent opportunity to revive it on a larger scale in the context of something like Manuscripts Online. A search of JSTOR quickly reveals nearly 100 references to the manuscript. The British Library’s own blog reports that this manuscript is indeed currently on loan to the Getty Museum, where one of the curators describes it as the most spectacular Florentine manuscript commission of the first half of the fourteenth century. Just for this single leaf, there is an enormous amount of information to link together.
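
A rough sketch in Python may help to show what anchoring the links on the manuscript number could mean in practice. It is a toy model and not a description of Manuscripts Online or of any British Library system: the `LinkedCatalogue` class, the `Link` type and the two `kind` labels are my own assumptions, and only the resource names are taken from the example above. The one design decision it embodies is that a link can only be attached to a number that appears in the shelflist.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    source: str  # name of the linked resource
    kind: str    # assumed labels: "contemporary" record or later "scholarship"


class LinkedCatalogue:
    """Toy catalogue in which every link hangs off a manuscript number."""

    def __init__(self, shelflist):
        self._shelflist = set(shelflist)  # the definitive list of numbers
        self._links = defaultdict(list)

    def add_link(self, ms_number, link):
        # Refuse links to numbers the shelflist does not recognise.
        if ms_number not in self._shelflist:
            raise KeyError(f"{ms_number} is not in the shelflist")
        self._links[ms_number].append(link)

    def overview(self, ms_number):
        """Group the linked sources for one manuscript by kind."""
        grouped = defaultdict(list)
        for link in self._links.get(ms_number, []):
            grouped[link.kind].append(link.source)
        return dict(grouped)


catalogue = LinkedCatalogue(["Additional MS 18196"])
for source, kind in [
    ("Catalogue of Additions", "contemporary"),
    ("Madden's acquisition reports", "contemporary"),
    ("Madden's diaries", "contemporary"),
    ("Manuscripts Classed Catalogue", "scholarship"),
]:
    catalogue.add_link("Additional MS 18196", Link(source, kind))

print(catalogue.overview("Additional MS 18196"))
```

Nothing in the model dictates the direction in which the links are explored; it only insists that the physical manuscript, identified by its number, is the point from which they all start.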

My vision then of a future manuscript catalogue would be of something that links together a wide range of resources in this way, anchored by the record of the physical manuscript itself. This is why I particularly welcome the vision of Manuscripts Online, which represents a small and tentative step – almost a Fisher Price version – of what I hope the manuscript catalogue might ultimately become.
