Editorial: Visualisation Tools for Understanding Big Data

I recently co-wrote an editorial (download the full version here) with Mike Batty (UCL CASA) in which we explored some of the current issues surrounding the visualisation of large urban datasets. We were inspired to write it following the CASA Smart Cities conference, and we included a couple of visualisations I have blogged here. Much of the day was devoted to demonstrating the potential of data visualisation to help us better understand our cities. Such visualisations would not have been possible on desktop computers a few years ago; their production has since ballooned as a result of recent technological (and, in the case of OpenData, political) advances.

In the editorial we argue that the many new visualisations, such as the map of London bus trips above, share much in common with the work of early geographers and explorers, whose interest lay in describing often-unknown processes. In this context, the unknown has been how to produce a large-scale impression of the dynamics of London’s bus network. The pace of exploration is largely determined by technological advancement, and handling big data is no different. However, unlike early geographic research, mere description is no longer a sufficient benchmark for advanced scientific enquiry into the complexities of urban life. This point, perhaps, marks a distinguishing feature between the science of cities and the thousands of rapidly produced big data visualisations and infographics designed for online consumption. We are now in a position to apply the analytical methods developed since geography’s quantitative revolution, which began half a century ago, to large datasets in order to garner insights into the processes at work. Many of these methods, however, have yet to be harnessed for the latest datasets, owing to the rapidity and frequency of data releases and the technological limitations that remain in place (especially in the context of network visualisation). That said, the path from description to analysis is clearly marked and, within this framework, visualisation plays an important role in the conceptualisation of the system(s) of interest, thus offering a route into more sophisticated kinds of analysis.

The trips visualised on London’s network provide a basis for perceiving the extent of congestion at the road system’s key junctions. When this information is combined with traffic flow data, it provides a real-time basis for exploring how patterns of congestion and routing change and evolve during the working day and over longer time periods. In one sense, this kind of data has been available as crude snapshots in time and at coarser spatial scales for many years. But the fact that we are now able to collect it routinely, in some instances almost in real time, and to begin to visualise it on the same time cycles, provides us with extremely powerful tools to examine problems that were previously beyond our ability even to articulate, never mind explore. We are currently adding the smart card data on trips made across all types of public transport in Greater London to the timetable data, providing a picture of flows in terms of both vehicle and person movements. In this form the data can be animated to provide the first working models, or rather representations, of how these flows evolve over many different time scales. As each trip is available with a unique identifier, space–time profiles can be assembled for many millions of travellers and their behaviour visualised. With some seven million passengers (trips) in the system on a typical day, we can generate countless aggregations of the dataset we are currently working with, which covers a six-month period.
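As a rough illustration of how trips carrying a unique identifier can be assembled into space–time profiles, here is a minimal Python sketch. It is not the code behind our work: the `Tap` record, its fields and the `space_time_profiles` function are hypothetical, and real smart card data would first need cleaning and the pairing of entry and exit taps.

```python
from collections import defaultdict
from datetime import datetime
from typing import NamedTuple


class Tap(NamedTuple):
    """A hypothetical smart card record: which card, when, and where."""
    card_id: str
    time: datetime
    location: str


def space_time_profiles(taps):
    """Group taps by card identifier and order each card's taps in time.

    Returns a dict mapping card_id -> chronologically sorted list of
    (time, location) pairs, i.e. one space-time path per traveller.
    """
    profiles = defaultdict(list)
    for tap in taps:
        profiles[tap.card_id].append((tap.time, tap.location))
    # Records rarely arrive in order, so sort each traveller's sequence.
    return {card: sorted(seq) for card, seq in profiles.items()}


# Usage with made-up records (listed out of order on purpose).
taps = [
    Tap("A", datetime(2012, 3, 5, 8, 40), "Oxford Circus"),
    Tap("B", datetime(2012, 3, 5, 9, 5), "Stratford"),
    Tap("A", datetime(2012, 3, 5, 8, 10), "Brixton"),
]
for card, path in space_time_profiles(taps).items():
    print(card, [(t.strftime("%H:%M"), place) for t, place in path])
```

Grouping by identifier first and sorting afterwards keeps memory proportional to the number of records, and each sorted path can then be aggregated or animated in whatever way the question demands.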

Jon Reades has produced an excellent visualisation (above) that charts flows over seven days, starting at 4am on a typical Sunday and advancing in ten-minute intervals through the week. With such visualisations, patterns at many spatial and temporal scales can be inferred: clearly the usual peaks during the working week, but also entertainment events and the like, as well as the influence of school holidays. From such data it is even possible to examine the behaviour of those with free passes (the elderly and the young) in contrast to more typical travellers of working age.
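The ten-minute aggregation behind a week-long animation like this can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions rather than how the visualisation above was produced: the function name `ten_minute_counts` is hypothetical, each trip is reduced to a single timestamp, and the week is assumed to begin at whatever Sunday 4am start time is passed in.

```python
from datetime import datetime, timedelta

BIN = timedelta(minutes=10)
WEEK = timedelta(days=7)


def ten_minute_counts(trip_times, start):
    """Count trips in each ten-minute interval of the week beginning at `start`.

    Trips outside [start, start + 7 days) are ignored.
    """
    n_bins = WEEK // BIN  # 7 days * 24 hours * 6 intervals = 1008 bins
    counts = [0] * n_bins
    for t in trip_times:
        offset = t - start
        if timedelta(0) <= offset < WEEK:
            counts[offset // BIN] += 1  # integer index of the interval
    return counts


# Usage: an arbitrary Sunday at 4am as the start of the week.
start = datetime(2012, 3, 4, 4, 0)
trips = [start + timedelta(minutes=m) for m in (0, 5, 12, 25, 61)]
counts = ten_minute_counts(trips, start)
print(len(counts), counts[:7])
```

Each of the 1,008 counts could then drive one frame of an animation; in practice you would probably count per station or link rather than for the network as a whole.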

Finally, there needs to be much more sophisticated visualisation of these kinds of results with respect to error and uncertainty in the data (whether due to data quality, as noted above, or to model assumptions). Uncertainty is a frequent and important omission from many of the headline visualisations associated with big data, and this therefore offers a further area in which the research community can contribute. The ubiquity of data visualisations of social phenomena should be embraced and their popularity harnessed to increase the impact of our work. We do, however, need to see description as the starting point rather than the end point of research on big data, and to work towards analytical insights by applying well-trusted methods developed during the ‘small data’ years.

Mike and I conclude by saying that we now stand at a threshold which has major implications for our science and for the way we plan. Many of our tools in planning and design were constructed to examine urban problems of a much less immediate nature than those captured by the data now literally pouring out of instrumented systems in the city. A sea change in our focus is taking place: over the last ten years, formal tools to examine much finer spatial scales have been evolving, particularly those dealing with local movement, such as pedestrian modelling. But now the focus has changed again, for big data is not spatial data per se; like big science, its data relate to temporal sequences. No longer is the snapshot in time the norm. Data that pertain to real time, geocoded to the finest space–time resolution, are becoming the new norm, and our tools and models need to adapt. Moreover, in time, our quest to understand the long-term evolution of cities will be reinvigorated by data from the short term, as we begin to look at data not over the minute or the hour but over longer temporal cycles, eventually joining up with traditional gold-standard censuses such as those that take place every decade. In fact, it is likely that these longer-term snapshots will themselves change as digital data from the short term come to complement and restructure how we look at data in the long term. That, however, is another story for a later editorial, but one that is equally important in our quest to provide an understanding of big data.

For the full editorial, click here. For more on data visualisations and cities, Fran Castillo has drawn together many of these ideas in an excellent post at complexitys.com.