Big Data: Some Historical Perspectives

This was a contribution to a plenary panel at the European Policy on Intellectual Property conference organised by CREATe at the University of Glasgow in September 2015. As a short contribution to a panel on a wider theme, it barely scratches the surface of the possibilities implied in the title, but here it is for the record.

It is already very evident from our discussions here that a distinctive feature of research into intellectual property is the emphasis on historical understanding. Petra Moser’s keynote yesterday was a wonderful illustration of how intellectual property researchers find historical data which has a wider cultural significance and is more than simply a lab for exploring different models of access. It may seem that, since big data is meant to present issues of scale and potential that we haven’t encountered before, historical perspectives won’t be particularly helpful. What I want to suggest here is that we perhaps need to widen our historical terms of reference, and not restrict ourselves to precise historical precedents and analogies.

Sometimes, big data requires big history (although we should also be aware of the caveat of Tim Hitchcock about the dangers of thinking exclusively on a large scale — we need both microscope and macroscope).

Let’s go back a long way, to 1086, when William the Conqueror, who had won the English crown at the Battle of Hastings twenty years previously, gave orders that a detailed survey should be undertaken of his English dominions. The Anglo-Saxon Chronicle described how William sent his men to every county to enquire who held what land. The chronicler was horrified by the amount of information William collected: ‘So very thoroughly did he have the inquiry carried out that there was not a single hide, not one virgate of land, not even (it is shameful to record it, but it did not seem shameful for him to do) not even one ox, nor one cow, nor one pig which escaped notice in his survey’. Collating all this information required William’s clerks to develop innovative data processing techniques, as they prepared a series of summaries of the data, eventually reducing it to two stout volumes. The motives of William in collecting this information are still debated by historians, but the data was immediately put to use in royal courts and tax collection. Within a short period of time, this eleventh-century experiment in big data had become known as Domesday Book: the book of the day of judgement, from which there is no appeal, just as there can today be no appeal from the algorithms that might be used to set our insurance premiums or credit ratings.

Domesday Book was the first English public record and is a forcible reminder that anxieties about government data collection are nothing new. 2015 marks the 800th anniversary of the grant of another celebrated English public document, Magna Carta. King John is remembered as the tyrant forced to grant Magna Carta at Runnymede, but his reign was also important because many of the major series of records of government business began at this time. John’s reign saw an upsurge in the use of technologies of writing and mathematics in the business of government. One important thread in the Magna Carta story is that it was both a reaction to, and at the same time an expression of, this growth in new technologies of government. It’s intriguing that Tim Berners-Lee and others have called for a Magna Carta for the World Wide Web to address issues of privacy and openness. There are two problems with this. One is, of course, that Magna Carta is linked to a common law system: it hasn’t even been adopted by the whole of Britain, as Scotland, with its Roman law tradition, has always had a semi-detached relationship with Magna Carta. The other is that the embedding of Magna Carta in English political life was a complex process, spread over several centuries and involving two civil wars.

In considering the issues of governance, ethics and identity posed by big data, this kind of longue durée approach can be very helpful. Jon Agar’s wonderful book, The Government Machine: A Revolutionary History of the Computer, describes how the conception of the modern computer was influenced by the administrative processes developed by nineteenth-century government bureaucracies, which sought to distinguish between high-level analytical policy work and routine mechanical clerical labour. Charles Babbage’s work was a sophisticated expression of this nineteenth-century urge to identify and mechanise the routine. Closely linked to this urge to mechanise government was a concern, in the wake of the industrial revolution and the growth of population, to gather as much statistical information as possible about the enormous changes taking place. In a way, data can be seen as an expression of modernity. Another key big data moment was the 1890 United States census, when the huge quantity of data necessitated the use of automatically sorted punched cards to analyse the information. Jon Agar vividly describes the achievements of this analogue computing and the rise of IBM. His account of the debates surrounding the national registration schemes introduced in wartime, and of the anxieties about linking these to, for example, employment or health records, illustrates how our current concerns have long antecedents.

However, I think looking at big data concerns in this way does more than simply remind us that there is nothing new under the sun. It is also helpful in clarifying what is distinctive about recent developments and in identifying areas which should be policy priorities. First is the ubiquity of data.

For governments from the eleventh to the twentieth century, data was something gathered with enormous clerical and administrative effort, which had to be carefully curated and safeguarded. Data like that recorded in Domesday Book or in records of land grants was one of the primary assets of pre-modern governments. Only large organisations such as governments or railroad companies had the resources to process this precious data; indeed, one of the most evident changes is the shift in processing power, and perhaps we should be talking more about big processing than about big data. Data was used in order to govern and was integral to the political compact. Now that data is ubiquitous and comparatively cheap to acquire and process, this framework of trust no longer applies. Moreover, the types of organisations deploying data have changed. In particular, it is noticeable that the driving forces behind the development of big data methods have frequently been commercial and retail organisations: not only Google and Amazon, but also large insurance, financial and healthcare corporations. This contrasts with earlier developments, both analogue and digital, in which governments were prominent and private sector involvement was more limited.

The Oxford English Dictionary draws a distinction between the term big data as applied to the size of datasets and big data referring to particular computational methods, most notably predictive analytics. Predictive analytics poses very powerful social and cultural challenges, especially as more and more personal data, such as whole genome sequences, becomes cheaper and more widely available. How far can your body be covered by existing concepts of privacy? And is the likely future path of your health, career and life a matter of purely personal concern? In many ways, it is this idea of prediction which most forcibly challenges many of our most cherished social and cultural assumptions. Predictive policing, in which the police make early contact with people considered likely to commit crimes, is already being tested in some American cities. Predictivity almost dissolves privacy because it shifts the way in which we look at freedom of choice: it matters less what my reading or music choices are if they can be readily predicted from publicly available data. How we cope with a society in which many of our actions can be predicted is one of the chief challenges posed by big data. As my colleague Barry Smith, from the AHRC’s Science in Culture theme, has emphasised, the neuroscience surrounding predictivity (the way in which the brain copes with this predictivity) will become a fundamental area of research. As predictive analytics shades into machine learning, these questions will become even more complex, since we will start to see the distinction described by Agar between analytic work and routine labour breaking down in large organisations, posing major social and cultural challenges.

Finally, it is worth noting that the most important large datasets (censuses, tax records) have generally been about people, but increasingly big data will be about things. For example, machine tools frequently have sensors attached to them which enable the manufacturer to monitor the state of the tools remotely. This might encourage manufacturers to monitor their clients’ use of their products in ways that could have commercial implications. The monitoring of medical implants will raise even more complex issues. A hint of the kind of complications that these developments might raise was given in Justice Alito’s concurrence in the US Supreme Court judgment in United States v. Jones (2012), which concerned the use of GPS tracking devices by police. Struggling to imagine how the framers of the US Constitution would have viewed such devices, Alito invoked the analogy of ‘a case in which a constable secreted himself somewhere in a coach and remained there for a period of time in order to monitor the movements of the coach’s owner’.

For the Anglo-Saxon chronicler complaining about Domesday Book, the objects of the king’s greed were evident: land and animals. Our future anxieties may be very different, because they may centre on objects linked to us in much more distant and complex ways.

6 September 2015