A Chinese Commentary on Smart Cities


The Beijing City Lab now contains nearly 50 papers on how the technologies that form a science of cities are being applied in China. There is an interesting paper by Wang Jingyuan on smart cities that is worth looking at. It is in Chinese, and in preparation for the first lectures on our own Masters course in Smart Cities, which has several Chinese students, you can get the paper by clicking on the links here. There is also an interview with myself and one with the late Professor Sir Peter Hall. All good stuff, and a valuable resource.

About the Lab: taken from their web site: “The Beijing City Lab (BCL) is a virtual research community, dedicated to studying, but not limited to, China’s capital Beijing. The Lab focuses on employing interdisciplinary methods to quantify urban dynamics, generating new insights for urban planning and governance, and ultimately producing the science of cities required for sustainable urban development. The lab’s current mix of planners, architects, geographers, economists, and policy analysts lends unique research strength.”



ModelTube is the name of the project to integrate agent based modelling into MapTube. Back at the end of last year I did a blog post on how to integrate AgentScript with Google Maps: http://www.geotalisman.org/2013/12/18/bugs-on-a-map/. Following on from this, I used the AgentScript agent based modelling framework, which is based on NetLogo, to visualise the tube network in real time: http://www.geotalisman.org/2014/04/29/another-day-another-tube-strike/. Visualisation of real-time data using agent based modelling techniques makes a lot of sense, especially when the geospatial data is already available on MapTube anyway.

The following image shows a variation of the “AgentOverlay” class added as a new map type in MapTube:


This is the same simple example of agents moving randomly around a map as in the “Bugs on a Map” example, except that this time it’s integrated into MapTube as a “code” layer. A “programmable map” layer that you can experiment with is an interesting idea. By allowing programmable map layers, users can upload all kinds of interesting geospatial visualisations and are not just limited to choropleths.
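Conceptually, the random-walk behaviour of those agents is very simple. A minimal sketch in plain JavaScript (illustrative only – not MapTube’s or AgentScript’s actual code) might look like this:

```javascript
// Minimal sketch of agents taking a random walk, as in "Bugs on a Map".
// Positions are in arbitrary map/pixel units; maxStep bounds each move.
function stepAgents(agents, maxStep) {
  return agents.map(function (a) {
    return {
      x: a.x + (Math.random() * 2 - 1) * maxStep,
      y: a.y + (Math.random() * 2 - 1) * maxStep
    };
  });
}

// Seed two agents and advance them one tick.
var agents = [{ x: 0, y: 0 }, { x: 10, y: 10 }];
agents = stepAgents(agents, 5);
```

In the real overlay, a function like this would run on a timer and the agents would be redrawn on the map canvas each tick.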

It’s going to be a while before this new map type goes live though, as there are still a number of problems to be overcome. The datadrop needs to be modified to handle the code upload, but the biggest problem is with the scripts themselves. The “bugs” example above is written in CoffeeScript, so I’m having to compile the CoffeeScript into JavaScript at the point where it’s included in the page. I liked the idea of something simpler than JavaScript, so I could give users a command-line interface which they could use to create a map. Sort of like a JSFiddle for maps (MapFiddle?).

MapTube is actually quite clever in how it handles map layers. What it does behind the scenes is to take an XML description of the map layers and transform this into the HTML page that you see in the browser. This makes it very easy to implement a lot of complex functionality, as it’s all just a transformation of XML to XHTML plus a big chunk of library code. At the moment, though, I’m stuck on how to separate the code in different map layers, while still being able to call each layer’s initialisation and get the overlay back. Then there are the cross-site scripting problems, which are making it difficult to load the tube network and real-time positions. Hopefully, a real-time tube demonstration on the live server isn’t far away.
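One possible way to separate layer code – a sketch only, not the implemented solution – is to have each “code” layer register a factory function under its layer id, so MapTube can call the factory to initialise the layer and get its overlay back without the layers sharing globals:

```javascript
// Sketch: a registry keyed by layer id. Each layer's uploaded code
// registers a factory; initLayer calls it and returns the overlay.
var layerRegistry = {};

function registerLayer(id, factory) {
  layerRegistry[id] = factory; // factory: function(map) -> overlay
}

function initLayer(id, map) {
  var factory = layerRegistry[id];
  if (!factory) throw new Error("Unknown layer: " + id);
  return factory(map); // overlay object owned by this layer only
}

// A hypothetical "bugs" layer registers itself in its own closure,
// without touching any other layer's state.
registerLayer("bugs", function (map) {
  return { name: "AgentOverlay", map: map, agents: [] };
});

var overlay = initLayer("bugs", "fakeMapObject");
```

The XML-to-XHTML transform would then only need to emit one `registerLayer` call per code layer.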


International Encyclopedia of Geography – Quality Assurance of VGI

The Association of American Geographers is coordinating an effort to create an International Encyclopedia of Geography. Plans started in 2010, with an aim to see the 15-volume project published in 2015 or 2016. Interestingly, this shows that publishers and scholars still see the value in creating subject-specific encyclopedias. On the other hand, the odd decision by Wikipedians that Geographic Information Science doesn’t exist outside GIS shows that geographers need a place to define their practice by themselves. You can find more information about the AAG International Encyclopedia project in an interview with Doug Richardson from 2012.

As part of this effort, I was asked to write an entry on ‘Volunteered Geographic Information, Quality Assurance’ as a short piece of about 3000 words. To do this, I looked around for mechanisms that are used in VGI and in citizen science. These are covered in OpenStreetMap studies and similar work in GIScience, and in the area of citizen science there are reviews, such as the one by Andrea Wiggins and colleagues, of mechanisms to ensure data quality in citizen science projects, which clearly demonstrated that projects use multiple methods to ensure data quality.

Below you’ll find an abridged version of the entry (but still long). The citation for this entry will be:

Haklay, M., Forthcoming. Volunteered geographic information, quality assurance. in D. Richardson, N. Castree, M. Goodchild, W. Liu, A. Kobayashi, & R. Marston (Eds.) The International Encyclopedia of Geography: People, the Earth, Environment, and Technology. Hoboken, NJ: Wiley/AAG

In the entry, I have identified 6 types of mechanisms that are used to ensure quality assurance when the data has a geographical component, either VGI or citizen science. If I have missed a type of quality assurance mechanism, please let me know!

Here is the entry:

Volunteered geographic information, quality assurance

Volunteered Geographic Information (VGI) originates outside the realm of professional data collection by scientists, surveyors and geographers. Quality assurance of such information is important for people who want to use it, as they need to identify if it is fit-for-purpose. Goodchild and Li (2012) identified three approaches for VGI quality assurance: a ‘crowdsourcing’ approach that relies on the number of people who have edited the information, a ‘social’ approach that is based on gatekeepers and moderators, and a ‘geographic’ approach which uses broader geographic knowledge to verify that the information fits into existing understanding of the natural world. In addition to the approaches that Goodchild and Li identified, there are also a ‘domain’ approach that relates to the understanding of the knowledge domain of the information, an ‘instrumental observation’ approach that relies on technology, and a ‘process oriented’ approach that brings VGI closer to industrialised procedures. First we need to understand the nature of VGI and the source of concern with quality assurance.

While the term volunteered geographic information (VGI) is relatively new (Goodchild 2007), the activities that this term describes are not. Another relatively recent term, citizen science (Bonney 1996), which describes the participation of volunteers in collecting, analysing and sharing scientific information, provides the historical context. While the term is relatively new, the collection of accurate information by non-professional participants turns out to have been an integral part of scientific activity since the 17th century and likely before (Bonney et al 2013). Therefore, when approaching the question of quality assurance of VGI, it is critical to see it within the wider context of scientific data collection, and not to fall into the trap of novelty and assume that it is without precedent.

Yet, this integration needs to take into account the insights that have emerged within geographic information science (GIScience) research over the past decades. Within GIScience, it is the body of research on spatial data quality that provides the framing for VGI quality assurance. Van Oort’s (2006) comprehensive synthesis of various quality standards identifies the following elements of spatial data quality discussions:

  • Lineage – a description of the history of the dataset.
  • Positional accuracy – how well the coordinate value of an object in the database relates to the reality on the ground.
  • Attribute accuracy – how accurate the additional attributes are, as objects in a geographical database are represented not only by their geometrical shape but also by additional attributes.
  • Logical consistency – the internal consistency of the dataset.
  • Completeness – how many objects expected to be found in the database are missing, as well as an assessment of excess data that should not be included.
  • Usage, purpose and constraints – a fitness-for-purpose declaration that should help potential users decide how the data should be used.
  • Temporal quality – a measure of the validity of changes in the database in relation to real-world changes, and also the rate of updates.
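As a concrete example of one of these elements, positional accuracy is often summarised as the root-mean-square error (RMSE) between recorded coordinates and matched reference measurements. A small sketch (with made-up planar coordinates, not real survey data):

```javascript
// Sketch: positional accuracy as RMSE between VGI points and matched
// reference points. Coordinates are illustrative planar units.
function rmse(vgiPoints, refPoints) {
  var sum = 0;
  for (var i = 0; i < vgiPoints.length; i++) {
    var dx = vgiPoints[i].x - refPoints[i].x;
    var dy = vgiPoints[i].y - refPoints[i].y;
    sum += dx * dx + dy * dy;
  }
  return Math.sqrt(sum / vgiPoints.length);
}

var vgi = [{ x: 0, y: 3 }, { x: 4, y: 0 }];
var ref = [{ x: 0, y: 0 }, { x: 0, y: 0 }];
// distances are 3 and 4, so RMSE = sqrt((9 + 16) / 2)
var error = rmse(vgi, ref);
```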

While some of these quality elements might seem independent of a specific application, in reality they can only be evaluated within a specific context of use. For example, when carrying out an analysis of street-lighting in a specific part of town, the question of completeness becomes specific to the recording of all street-light objects within the bounds of the area of interest; whether the dataset omits features elsewhere, or is complete for another part of the settlement, is irrelevant for the task at hand. The scrutiny of information quality within a specific application, to ensure that it is good enough for the needs at hand, is termed ‘fitness for purpose’. As we shall see, fitness-for-purpose is a central issue with respect to VGI.

To understand the reason that geographers are concerned with quality assurance of VGI, we need to recall the historical development of geographic information, and especially the historical context of geographic information systems (GIS) and GIScience development since the 1960s. For most of the 20th century, geographic information production became professionalised and institutionalised. The creation, organisation and distribution of geographic information was done by official bodies, such as national mapping agencies or national geological bodies, which were funded by the state. As a result, the production of geographic information became an industrial scientific process in which the aim is to produce a standardised product – commonly a map. Due to financial, skills and process limitations, products were engineered carefully so they could be used for multiple purposes. Thus, a topographic map can be used for navigation, but also for urban planning and many other purposes. Because the products were standardised, detailed specifications could be drawn up, against which the quality elements could be tested and quality assurance procedures developed. This was the backdrop to the development of GIS, and to the conceptualisation of spatial data quality.

The practices of centralised, scientific and industrialised geographic information production lend themselves to quality assurance procedures that are deployed through organisational or professional structures, and this explains the perceived challenges with VGI. Centralised practices also supported employing people with a focus on quality assurance, for example going into the field with a map and testing that it complies with the specification that was used to create it. In contrast, most of the collection of VGI is done outside organisational frameworks. The people who contribute the data are not employees and seemingly cannot be put through training programmes, asked to follow quality assurance procedures, or expected to use standardised equipment that can be calibrated. The lack of coordination and of top-down forms of production raises questions about ensuring the quality of the information that emerges from VGI.

Considering quality assurance within VGI requires an understanding of some underlying principles that are common to VGI practices and differentiate it from organised and industrialised geographic information creation. For example, some VGI is collected under conditions of scarcity or abundance, in terms of data sources, number of observations or the amount of data that is being used. As noted, the conceptualisation of geographic data collection before the emergence of VGI was one of scarcity, where data is expensive and complex to collect. In contrast, in many applications of VGI the situation is one of abundance. For example, in applications that are based on micro-volunteering, where the participant invests very little time in a fairly simple task, it is possible to give the same mapping task to several participants and statistically compare their independent outcomes as a way to ensure the quality of the data. Another way in which abundance acts as a framework is in the development of software for data collection. While in previous eras there would typically be a single application that was used for data capture and editing, in VGI there is a need to consider multiple applications, as different designs and workflows can appeal to, and be suitable for, different groups of participants.
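The redundancy that abundance allows can be sketched as a simple majority vote over repeated, independent answers to the same micro-task (the task and labels below are hypothetical):

```javascript
// Sketch: quality assurance under abundance. The same classification
// micro-task is given to several volunteers; the majority answer wins.
function majorityVote(answers) {
  var counts = {};
  answers.forEach(function (a) { counts[a] = (counts[a] || 0) + 1; });
  return Object.keys(counts).reduce(function (best, a) {
    return counts[a] > counts[best] ? a : best;
  });
}

// Four volunteers classified the same aerial-image tile.
var label = majorityVote(["building", "building", "road", "building"]);
// label is "building"
```

Real projects would typically also track per-volunteer agreement rates, so that persistent outliers can be flagged.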

Another underlying principle of VGI is that, since the people who collect the information are not remunerated or in contractual relationships with the organisation that coordinates data collection, a more complex relationship between the two sides is required, with consideration of incentives, motivations to contribute and the tools that will be used for data collection. Overall, VGI systems need to be understood as socio-technical systems in which the social aspect is as important as the technical part.

In addition, VGI is inherently heterogeneous. In large-scale data collection activities such as the census of population, there is a clear attempt to capture all the information about the population over a relatively short time and in every part of the country. In contrast, because of its distributed nature, VGI will vary across space and time, with some areas and times receiving more attention than others. An interesting example has been shown at temporal scales, where some citizen science activities exhibit ‘weekend bias’, as these are the days when volunteers are free to collect more information.

Because of the difference in the organisational settings of VGI, different approaches to quality assurance are required, although, as noted, such approaches have in general already been used in many citizen science projects. Over the years, several approaches have emerged; these include the ‘crowdsourcing’, ‘social’, ‘geographic’, ‘domain’, ‘instrumental observation’ and ‘process oriented’ approaches. We now turn to describe each of these.

The ‘crowdsourcing’ approach builds on the principle of abundance. Since there is a large number of contributors, quality assurance can emerge from repeated verification by multiple participants. Even in projects where the participants actively collect data in an uncoordinated way, such as the OpenStreetMap project, it has been shown that with enough participants actively collecting data in a given area, the quality of the data can be as good as authoritative sources. The limitation of this approach arises when local knowledge or verification on the ground (‘ground truth’) is required. In such situations, the ‘crowdsourcing’ approach will work well in central, highly populated or popular sites, where there are many visitors and therefore the probability that several of them will be involved in data collection rises. Even so, it is possible to encourage participants to record less popular places through a range of suitable incentives.

The ‘social’ approach also builds on the principle of abundance in terms of the number of participants, but with a more detailed understanding of their knowledge, skills and experience. In this approach, some participants are asked to monitor and verify the information that was collected by less experienced participants. The social method is well established in citizen science programmes such as bird watching, where participants who are more experienced in identifying bird species help to verify observations by other participants. To deploy the social approach, there is a need for a structured organisation in which some members are recognised as more experienced, and are given the appropriate tools to check and approve information.

The ‘geographic’ approach uses known geographical knowledge to evaluate the validity of the information that is received from volunteers. For example, by using existing knowledge about the distribution of streams within a river catchment, it is possible to assess whether the mapping of a new river contributed by volunteers is comprehensive or not. A variation of this approach is the use of previously recorded information, even if it is out of date, to verify a VGI source by comparing how much of the information that is already known also appears in it. Geographic knowledge can potentially be encoded in software algorithms.
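In its simplest algorithmic form, such a geographic check might test whether a contributed observation falls within the known extent of the feature it claims to describe. A sketch (the bounding box below is hypothetical, and real checks would use proper catchment polygons rather than rectangles):

```javascript
// Sketch of the 'geographic' approach: flag a contributed stream
// observation that falls outside the known extent of the catchment.
function withinExtent(point, extent) {
  return point.lon >= extent.minLon && point.lon <= extent.maxLon &&
         point.lat >= extent.minLat && point.lat <= extent.maxLat;
}

var catchment = { minLon: -0.5, maxLon: 0.5, minLat: 51.0, maxLat: 52.0 };
var plausible = withinExtent({ lon: 0.1, lat: 51.5 }, catchment);
var implausible = withinExtent({ lon: 10.0, lat: 40.0 }, catchment);
```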

The ‘domain’ approach is an extension of the geographic one; in addition to geographical knowledge, it uses specific knowledge that is relevant to the domain in which information is collected. For example, many citizen science projects that involve collecting biological observations will have some body of information about species distribution, both spatially and temporally. A new observation can therefore be tested against this knowledge, again algorithmically, helping to ensure that new observations are accurate.

The ‘instrumental observation’ approach removes some of the subjective aspects of data collection by a human who might make an error, and relies instead on the equipment that the person is using. Because of the increase in availability of accurate-enough equipment, such as the various sensors that are integrated into smartphones, many people keep in their pockets mobile computers with the ability to record location, direction, imagery and sound. For example, image files captured on smartphones include the GPS coordinates and a time-stamp, which the vast majority of people are not able to manipulate. Thus, the automatic instrumental recording of information provides evidence for the quality and accuracy of the information.

Finally, the ‘process oriented’ approach brings VGI closer to traditional industrial processes. Under this approach, the participants go through some training before collecting information, and the process of data collection or analysis is highly structured to ensure that the resulting information is of suitable quality. This can include the provision of standardised equipment, online training or instruction sheets, and a structured data recording process. For example, volunteers who participate in the US Community Collaborative Rain, Hail & Snow network (CoCoRaHS) receive a standardised rain gauge, instructions on how to install it, and online resources to learn about data collection and reporting.

Importantly, these approaches are not used in isolation, and in any given project it is likely that a combination of them will be in operation. Thus, an element of training and guidance to users can appear in a downloadable application that is distributed widely, and the method used in such a project will therefore be a combination of the process oriented and crowdsourcing approaches. Another example is the OpenStreetMap project, which in general gives volunteers only limited guidance on the information that they collect or the locations in which they collect it. Yet a subset of the information collected in the OpenStreetMap database, about wheelchair access, is gathered through the highly structured process of the WheelMap application, in which the participant is required to select one of four possible settings that indicate accessibility. Another subset of the information, recorded for humanitarian efforts, follows the social model: tasks are divided between volunteers using the Humanitarian OpenStreetMap Team (H.O.T) task manager, and the data that is collected is verified by more experienced participants.

The final, and critical, point for quality assurance of VGI noted above is fitness-for-purpose. In some VGI activities the information has a direct and clear application, in which case it is possible to define specifications for the quality assurance elements listed above. However, one of the core aspects noted above is the heterogeneity of the information that is collected by volunteers. Therefore, before using VGI for a specific application there is a need to check its fitness for that specific use. While this is true for all geographic information, and even so-called ‘authoritative’ data sources can suffer from hidden biases (e.g. a lack of updates in rural areas), with VGI the variability can change dramatically over short distances – so while the centre of a city will be mapped by many people, a deprived suburb near the centre will not be mapped and updated. There are also limitations caused by the instruments in use – for example, the GPS positional accuracy of the smartphones being used. Such aspects should also be taken into account, ensuring that the quality assurance is itself fit-for-purpose.

References and Further Readings

Bonney, Rick. 1996. Citizen Science – a lab tradition, Living Bird, Autumn 1996.
Bonney, Rick, Shirk, Jennifer, Phillips, Tina B. 2013. Citizen Science, Encyclopaedia of science education. Berlin: Springer-Verlag.
Goodchild, Michael F. 2007. Citizens as sensors: the world of volunteered geography. GeoJournal, 69(4), 211–221.
Goodchild, Michael F., and Li, Linna. 2012. Assuring the quality of volunteered geographic information. Spatial Statistics, 1, 110–120.
Haklay, Mordechai. 2010. How Good is volunteered geographical information? a comparative study of OpenStreetMap and ordnance survey datasets. Environment and Planning B: Planning and Design, 37(4), 682–703.
Sui, Daniel, Elwood, Sarah and Goodchild, Michael F. (eds), 2013. Crowdsourcing Geographic Knowledge, Berlin:Springer-Verlag.
Van Oort, Pepijn A.J. 2006. Spatial data quality: from description to application, PhD Thesis, Wageningen: Wageningen Universiteit, p. 125.

New Paper: ABM Applied to the Spread of Cholera

Cholera transmission through the interaction of host and the environment.
We are pleased to announce that we have just had a paper published in Environmental Modelling and Software entitled "An Agent-based Modeling Approach Applied to the Spread of Cholera".

Research highlights include:
  • An agent-based model was developed to explore the spread of cholera.
  • The progress of cholera transmission is represented through a Susceptible-Exposed-Infected-Recovered (SEIR) model. 
  • The model integrates geographical data with agents’ daily activities within a refugee camp.
  • Results show cholera infections are impacted by agents’ movement and source of contamination. 
  • The model has the potential for aiding humanitarian response with respect to disease outbreaks.
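A discrete-time SEIR model of the kind the paper builds on can be sketched in a few lines (the rates below are illustrative, not the paper's calibrated parameters, and the agent-based model tracks individuals rather than these aggregate compartments):

```javascript
// Generic discrete-time SEIR sketch. beta = transmission rate,
// sigma = rate of becoming infectious, gamma = recovery rate.
function seirStep(state, p) {
  var N = state.S + state.E + state.I + state.R;
  var newExposed = p.beta * state.S * state.I / N;
  var newInfected = p.sigma * state.E;
  var newRecovered = p.gamma * state.I;
  return {
    S: state.S - newExposed,
    E: state.E + newExposed - newInfected,
    I: state.I + newInfected - newRecovered,
    R: state.R + newRecovered
  };
}

// Run 100 ticks from an initial seeding of 10 infected individuals.
var state = { S: 9990, E: 0, I: 10, R: 0 };
var params = { beta: 0.4, sigma: 0.2, gamma: 0.1 };
for (var t = 0; t < 100; t++) state = seirStep(state, params);
```

Each step conserves the total population; in the agent-based version, each agent carries its own S/E/I/R state and transitions are driven by its interactions with contaminated water sources.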
Cholera dynamics when rainfall is introduced.

Spatial spread of cholera over the course of a year.

Study area
If the research highlights have not turned you off, the abstract to the paper is below:
"Cholera is an intestinal disease and is characterized by diarrhea and severe dehydration. While cholera has mainly been eliminated in regions that can provide clean water, adequate hygiene and proper sanitation; it remains a constant threat in many parts of Africa and Asia. Within this paper, we develop an agent-based model that explores the spread of cholera in the Dadaab refugee camp in Kenya. Poor sanitation and housing conditions contribute to frequent incidents of cholera outbreaks within this camp. We model the spread of cholera by explicitly representing the interaction between humans and their environment, and the spread of the epidemic using a Susceptible-Exposed-Infected-Recovered model. Results from the model show that the spread of cholera grows radially from contaminated water sources and seasonal rains can cause the emergence of cholera outbreaks. This modeling effort highlights the potential of agent-based modeling to explore the spread of cholera in a humanitarian context."
Finally, to aid replication, experimentation, or just exploring how you can link raster and vector data in GeoMason, we have a dedicated website where you can download executables of the model along with the source code and associated data. Moreover, we have provided a really detailed Overview, Design concepts, and Details (ODD) Protocol document for the model.

Full Reference:
Crooks, A.T. and Hailegiorgis, A.B. (2014), An Agent-based Modeling Approach Applied to the Spread of Cholera, Environmental Modelling and Software, 62: 164-177
DOI: 10.1016/j.envsoft.2014.08.027 (pdf)

Twitter research reveals the sweariest locations – Castleford Media (blog)

The prevalence of swearing on social media is so fascinating that researchers from University College London's Centre for Advanced Spatial Analysis (CASA) have dedicated a whole study towards it, focusing particularly on the use of Twitter in the ...

Triangulating Social Multimedia Content for Event Localization

As regular visitors will know, we have been developing our ability to collect and analyze social media. To this end we have just received word from Transactions in GIS that our paper entitled "Triangulating Social Multimedia Content for Event Localization using Flickr and Twitter" has just been accepted. Below is the abstract from the paper:
The analysis of social media content for the extraction of geospatial information and event-related knowledge has recently received substantial attention. In this paper we present an approach that leverages the complementary nature of social multimedia content by utilizing heterogeneous sources of social media feeds to assess the impact area of a natural disaster. More specifically, we introduce a novel social multimedia triangulation process that uses jointly Twitter and Flickr content in an integrated two-step process: Twitter content is used to identify toponym references associated with a disaster; this information is then used to provide approximate orientation for the associated Flickr imagery, allowing us to delineate the impact area as the overlap of multiple view footprints. In this approach, we practically crowdsource approximate orientations from Twitter content and use this information to orient accordingly Flickr imagery and identify the impact area through viewshed analysis and viewpoint integration. This approach enables us to avoid computationally intensive image analysis tasks associated with traditional image orientation, while allowing us to triangulate numerous images by having them pointed towards the crowdsourced toponym location. The paper presents our approach and demonstrates its performance using a real-world wildfire event as a representative application case study.
Our cross-source triangulation framework is outlined in the figure below:

The cross-source triangulation framework.
To demonstrate the benefit of using cross-sourced social media in the triangulation process we applied three modes of the analysis:
  • Mode 1: the impact area was estimated as the overlap of all viewsheds that were generated from all Flickr contribution locations without calculating a reference point or evaluating the Angle Of View (AOV) for each image. Accordingly, in this mode, we use only Flickr data, without constraining the viewshed analysis with any AOV information. 
  • Mode 2: the impact area was estimated by using the centroid of the locations of all Flickr contributions as the reference point for the AOV calculation, followed by a viewshed analysis of each image. Accordingly, in this mode we use only Flickr data, ignoring any toponym information from Twitter. 
  • Mode 3: the impact area was estimated by using the toponym reference, as derived from Twitter, as the reference point for the AOV calculation, followed by a viewshed analysis of each image. Accordingly, in this mode we use Twitter content to orient Flickr data and guide the viewshed analysis.
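The orientation step at the heart of Mode 3 – pointing each Flickr image towards the Twitter-derived toponym – can be sketched as a bearing calculation (the coordinates below are hypothetical, and a flat-earth approximation is used for brevity rather than a great-circle bearing):

```javascript
// Sketch: approximate each image's viewing direction as the bearing
// from its capture location to the crowdsourced toponym.
// Returns degrees clockwise from north (0 = north, 90 = east).
function bearingDeg(from, to) {
  var theta = Math.atan2(to.lon - from.lon, to.lat - from.lat);
  return (theta * 180 / Math.PI + 360) % 360;
}

var toponym = { lat: 34.1, lon: -118.3 }; // derived from Twitter content
var photo = { lat: 34.0, lon: -118.3 };   // Flickr capture location
var view = bearingDeg(photo, toponym);    // photo looks due north: 0
```

Each bearing would then seed a viewshed (an angular sector footprint), and the impact area is estimated as the overlap of those footprints.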
The figure below shows the result from Mode 3:

A three-dimensional perspective of wildfire location assessment as derived by analysis mode 3.

Full Reference: 
Panteras, G., Wise, S., Lu, X., Croitoru, A., Crooks, A.T. and Stefanidis, A. (in press), Triangulating Social Multimedia Content for Event Localization using Flickr and Twitter, Transactions in GIS.

OpenLayers 3


As a learning exercise, I have attempted to “migrate” my #indyref map from OpenLayers 2.13.1 to OpenLayers 3.0.0. It seemed a good time to learn this, because the OpenLayers website now shows v3 as the default version for people to download and use.

I use the term “migrate” in inverted commas because, really, OpenLayers 3 is pretty much a rewrite, and accordingly requires coding the new map from scratch. It has so far taken four times as long to do the conversion as it did to create the original map, although that is a consequence of learning as I code!

Shortcomings in v3 that I have come across so far:

  • No Permalink control. This is unfortunate, particularly as “anchor”-style permalinks, which update as you move around the map, are very useful for visualisations like DataShine.
  • No opacity control on individual styles – only on layers. This means I can’t have the circles with the fill at 80% opacity but the text at 100% opacity. The opacity functionality is mentioned in the current tutorials on the OpenLayers website and appears to have been removed from the codebase very recently, possibly by mistake.
  • The online documentation, particularly the apidoc, is very sparse in places. As mentioned above, there is also some mismatch between the functionality suggested in the online tutorials and what is actually available – another example being the use of “font” instead of “fontSize” and “fontStyle” for styles. This will improve, I am sure, and there is at least one book available on OpenLayers 3, but it’s still a little frustrating at this stage.
  • Related to the above – no way to specify the font size on a label, without also specifying the font.

Things which I like about the new version:

  • Smooth vector resizing/repositioning when zooming in/out.
  • Attribution is handled better, and it looks nicer.
  • No need to have a 100% width/height on the map div any more.
  • Resolution-specific styling.

Some gotchas, which got me for a bit:

  • You need to link in a new ol.css stylesheet, not just the Javascript library, in order to get the default controls to display and position correctly.
  • Attribution information is attached to a source object now, not directly to the layer. Layers contain a source.
  • Attribute-based vector styling is a lot more complicated to specify. You need to create a function which you feed in to an attribute. The function has to return a style wrapped in an array.
  • Hover/mouseover events are not handled directly by OpenLayers any more – but click events are, so the two event types require quite different setups.
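The attribute-based styling gotcha is easiest to see as code. The sketch below mimics the OpenLayers 3 pattern – a function of (feature, resolution) that returns the style wrapped in an array – but uses plain objects in place of ol.style.Style instances so it stands alone:

```javascript
// The OL3 vector styling pattern: the layer takes a style *function*,
// and that function must return an array of styles. Plain objects
// stand in for ol.style.Style so the shape of the pattern is clear.
function makeStyleFunction(colourByType) {
  return function (feature, resolution) {
    var type = feature.get("type");
    // In real code this would be: return [new ol.style.Style({...})];
    return [{ fill: colourByType[type] || "#cccccc" }];
  };
}

// A minimal stand-in for an ol.Feature with a get() accessor.
var feature = { get: function (key) { return { type: "yes" }[key]; } };
var styleFn = makeStyleFunction({ yes: "#00ff00", no: "#ff0000" });
var styles = styleFn(feature, 10);
```

In the real thing, the function goes into the `style` option of an `ol.layer.Vector`; forgetting to wrap the returned style in an array is an easy mistake to make.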

North Ayrshire’s potty mouthed Twitter users in top 10 – Ardrossan and Saltcoats Herald

Researchers from the Centre for Advanced Spatial Analysis (CASA) at University College London monitored all geo-located tweets sent from smartphones in the UK. The findings, taken from 28 August to 4 September, showed Redcar and Cleveland to be the ...


RGS-IBG Annual Conference, 2014: Learning from the 2011 Census


Learning from the 2011 Census: Sessions (1) through (4), Wed 27 August 2014


The following presentations were delivered at the RGS-IBG Annual Conference 2014, sessions ‘Learning from the 2011 Census’. Presentations are listed in session order.


Learning from the 2011 Census (1): Data Delivery and Characteristics

Justin Hayes and Rob Dymond-Green – New and easier ways of working with aggregate data and geographies from UK censuses

Cecilia Macintyre – Scotland’s Census 2011

Oliver Duke-Williams and John Stillwell – Census interaction data and access arrangements

Paul Waruszynski – Microdata products from the 2011 Census

Nicola Shelton, Ian Shuttleworth, Christopher Dibben and Fiona Cox – Longitudinal data in the UK Censuses


Learning from the 2011 Census (2): Changing Populations, Changing Geographies

Nigel Walford – Then and now: Micro-scale population change in parts of London, 1901-11 and 2001-11

Darren Smith – Changing geographies of traditionality and non-traditionality: Findings from the census

Thomas Murphy, John Stillwell and Lisa Buckner – Commuting to work in 2001 and 2011 in England and Wales: Analyses of national trends using aggregate and interaction data from the Census


Learning from the 2011 Census (3): Ethnicity, Health and Migration (part one)

Giles Barrett and David McEvoy – Age and ethnic spatial exposure

Nissa Finney and Ludi Simpson – ‘White flight’? What 2011 census data tell us about local ethnic group population change

Fran Darlington, Paul Norman and Dimitris Ballas – Exploring the inter-relationships between ethnicity, health, socioeconomic factors and internal migration: Evidence from the Samples of Anonymised Records in England

Stephen Clark, Mark Birkin, Phil Rees, Alison Heppenstall and Kirk Harland – Using 2011 Census data to estimate future elderly health care


Learning from the 2011 Census (4): Ethnicity, Health and Migration (part two)

Phil Rees and Nik Lomax – Using the 2011 Census to fix ethnic group estimates and components for the prior decade

Nik Lomax, Phil Rees, John Stillwell and Paul Norman – Assessing internal migration patterns in the UK: A once in a decade opportunity

Myles Gould and Ian Shuttleworth – Health, housing tenure, and entrapment 2001-2011: Does changing tenure and address improve health?

Scottish Independence Referendum: Data Map


Scotland’s population is heavily skewed towards the central belt (Glasgow/Edinburgh) which will affect likely reporting times of the independence referendum in the early hours of Friday 19 September, this being dependent both on the overall numbers of votes cast in each of the 32 council areas, and the time taken to get ballot boxes from the far corners of each area to the counting hall in each area. Helicopters will be used, weather permitting, in the Western Isles!

There is also likely to be significant variation in the result that each area declares – with the regions next to England (and so dependent on trade with it) and those furthest away from it (so benefiting most from support) likely to vote strongly “No”, the major cities being difficult to call, and the rural areas and the smaller, poorer cities of the central belt much more likely to vote “Yes”. NB. Unlike a constituency election, which is “first past the post” for each area, the referendum is a simple sum-total for everyone, so while it will be interesting to hear each individual result, ultimately we won’t know the outcome until almost every area has declared and the lead for one side becomes unassailable (areas will declare the size of the vote well before the result, which will make this possible).

A screenshot of a table in a Credit Suisse/PA report was circulating on Twitter a couple of days ago, with estimates on all three of these metrics, so I’ve taken this, combined it with centroids of each of the council areas, and produced a map. Like many of my maps these days, coloured circles are the way I’m showing the data. Redder areas are more likely to vote “No”, and larger circles have a larger registered population. The numbers show the estimated declaration times. It looks like I’ll be up all night on Thursday. Mouse over a circle for more information.
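The circle encoding described here (colour for the expected result, size for the registered population) can be sketched as follows. This is a minimal illustration under assumed inputs – a “No” percentage and an electorate count – not the code behind the live map:

```python
def circle_style(pct_no, registered, max_registered):
    """Return a (hex colour, radius) pair for one council area.

    pct_no: estimated 'No' share (0-100); redder means more likely 'No'.
    registered: electorate size; larger circles mean more voters."""
    t = max(0.0, min(1.0, pct_no / 100.0))
    # Blend from blue ('Yes') to red ('No').
    colour = "#{:02x}00{:02x}".format(int(255 * t), int(255 * (1 - t)))
    # Scale area, not radius, with the electorate so sizes read honestly.
    radius = 5 + 25 * (registered / max_registered) ** 0.5
    return colour, radius

# A strongly-'No' area with the largest electorate:
circle_style(100, 400_000, 400_000)  # ('#ff0000', 30.0)
```

Scaling circle area rather than radius avoids visually exaggerating the larger electorates.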

View the live #indyref map here.

Fair to middling — that’s my ‘effin verdict on Welbeck and new England – The Sunday Times

Rod Liddle: AN INSTITUTION called the Centre for Advanced Spatial Analysis recently ran research into swearing on the social media site Twitter. With great pride I can reveal that this august institution found my old manor, Redcar and Cleveland, the ...

Scottish Referendum conversation as seen on Twitter

There is a lot of talk about Scotland’s upcoming referendum on independence. If you are interested in what people are saying live on Twitter, you can take a look at our GeoSocial Gauge website.

What you will see is a map showing the location of tweets (which you can click on to find more information), along with options for visualizing the intensity of Twitter activity and whether the tweets are negative or positive. You can also see the overall mood of the conversation and the keywords from the tweets.

To see more of our GeoSocial projects click here.

Call for Papers: New Directions in Geospatial Simulation

New Directions In Geospatial Simulation

The geospatial simulation community has enjoyed steady growth over the past decade as novel and advanced forms of agent-based and cellular automata modeling continue to facilitate the exploration of complex geographic problems facing the world today. It is now an opportune time to consider the future direction of this community and explore ways to leverage geospatial simulation in professional arenas. The aim of these sessions is to bring together researchers utilizing agent-based and cellular automata techniques and associated methodologies to discuss new directions in geospatial simulation. We invite papers that fall into one of the following four categories:
  • Graduate student geospatial simulation research
  • Methodological advances of agent-based or cellular automata modeling
  • New application frontiers in geospatial simulation
  • Approaches for evaluating the credibility of geospatial simulation models
Student papers will be presented in an interactive short paper session with presentations no longer than five minutes and no more than ten slides. Following presentations, students will form a panel that will address questions from the audience as directed by the session moderator. Student presentations will be judged as a part of a Best Student Paper award, the winner of which will receive an award of $500.

All other papers will be placed in one of the following three sessions: (1) Methodological Advances, (2) Novel Applications, or (3) Model Credibility. Each session will comprise four speakers followed by a twenty-minute discussion on the session topic.

Please e-mail the abstract and key words with your expression of intent to Chris Bone by October 28, 2014. Please make sure that your abstract conforms to the AAG guidelines on title, word limit and key words, as specified at http://www.aag.org/cs/annualmeeting/call_for_papers. An abstract should be no more than 250 words, describing the presentation’s purpose, methods, and conclusions, and should include keywords. Full submissions will be given priority over submissions with just a paper title.

Chris Bone, Department of Geography, University of Oregon
Andrew Crooks, Department of Computational Social Science, George Mason University
Alison Heppenstall, School of Geography, University of Leeds
Arika Ligmann-Zielinska, Department of Geography, Michigan State University
David O’Sullivan, Department of Geography, University of California, Berkeley


October 14th, 2014: Second call for papers

October 28th, 2014: Abstract submission and expression of intent to session organizers. E-mail Chris Bone by this date if you are interested in being in this session. Please submit an abstract and key words with your expression of intent. Full submissions will be given priority over submissions with just a paper title.

October 31st, 2014: Session finalization. Session organizers determine session order and content and notify authors.

November 3rd, 2014: Final abstract submission to AAG, via www.aag.org. All participants must register individually via this site. Upon registration you will be given a participant number (PIN). Send the PIN and a copy of your final abstract to Chris Bone. Neither the organizers nor the AAG will edit the abstracts.

November 5th, 2014: AAG registration deadline. Sessions submitted to AAG for approval.

April 21-25, 2015: AAG meeting, Chicago, Illinois, USA.

Tweets more likely to contain swear words on Monday! – Times of India

LONDON: Twitter users are more likely to swear in their posts on a Monday evening as they tweet about the pressures of their jobs, a new UK study has found. Researchers from the Centre for Advanced Spatial Analysis (CASA) at University College London ...

theEweekly Wrap: Net Neutrality protest, new Apple gear announced, and most … – theEword (blog)

The findings made at the Centre for Advanced Spatial Analysis are based on a week's worth of research between August 28th and 4th September when they analysed over 1.3 million tweets. It found that the most sweary place in the UK was Redcar and ...

From Putney to Poplar: 12 Million Journeys on the London Bikeshare


The above graphic (click for full version) shows 12.4 million bicycle journeys taken on the Barclays Cycle Hire system in London over seven months, from 13 December 2013, when the south-west expansion to Putney and Hammersmith went live, until 19 July 2014 – the latest journey data available from Transport for London’s Open Data portal. It’s an update of a graphic I’ve made for journeys on previous phases of the system in London (& for NYC, Washington DC and Boston) – but this is the first time that data has been made available covering the current full extent of the system. From the most westerly docking station (Ravenscourt Park) to the most easterly (East India), the shortest route is over 18km.

As before, I’ve used Routino to calculate the “ideal” routes – avoiding the busiest highways and taking cycle paths where they are nearby and add little distance to the journey. Thickness of each segment corresponds to the estimated number of bikeshare bikes passing along that segment. The busiest segment of all this time is on Tavistock Place, a very popular cycle track just south of the Euston Road in Bloomsbury. My calculations estimate that 275,842 of the 12,432,810 journeys, for which there is “good” data, travelled eastwards along this segment.
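The per-segment counts described above can be reproduced, in outline, by tallying every consecutive pair of nodes along each routed journey. A minimal sketch with made-up node IDs (the real analysis uses Routino’s routed paths over the OpenStreetMap network):

```python
from collections import Counter

def segment_flows(routes):
    """Count how many routed journeys traverse each directed road segment.

    routes: iterable of journeys, each given as the list of network
    node IDs along its route."""
    flows = Counter()
    for route in routes:
        # Each consecutive pair of nodes is one directed segment.
        for a, b in zip(route, route[1:]):
            flows[(a, b)] += 1
    return flows

# Two journeys that share the segment (2, 3):
flows = segment_flows([[1, 2, 3], [2, 3, 4]])
```

A segment’s count (here `flows[(2, 3)] == 2`) would then drive the thickness at which it is drawn.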

The road and path network data is from OpenStreetMap, and it is a snapshot from this week. This means that Putney Bridge, which is currently closed, shows no cycles crossing it, whereas in fact it was open during the data collection period. There are a few other quirks – the closure of Upper Ground causes a big kink to appear just south of Blackfriars Bridge. The avoidance of busier routes probably doesn’t quite reflect reality either – the map shows very little “Boris Bike” traffic along Euston Road or the Highway, whereas I bet there are a few brave souls who do take those routes.

My live map of the docking stations, which like the London Bikeshare itself has been going for over four years, is here.

Persecuted by a number

“My problem is that I’ve been persecuted by an integer”

With this, George Miller, like a Franz Kafka of cognitive psychology, launched one of the most influential papers in the field. Miller’s persecutor was the number seven – which, in test after test of absolute judgement, appeared as approximately the number of categories people could tell apart.

What a category was, or even how approximate “approximately” was, is where things get interesting. And in trying to answer these questions, Miller drew the young field of Information Theory into his world. I came across him in visual design – where the “seven things” paradigm is used to argue for simplifying visual representations – for example, reducing a colour scale of continuous data by chunking it into discrete categories.

So let’s start with the experiments he was talking about. They were experiments on absolute judgement – where the subject was required to identify a stimulus as being in a specific category. An example of this would be where an experimenter played a musical note of one of five pitches (for example); the subject would have to decide which of the five pitches it was. Or a note of a certain loudness – again, a discrete number of volume levels would be chosen. Of course, five categories is an arbitrary number; you could choose four or six or seven or one hundred. Now, I’m pretty sure I can identify 8 notes in the octave, but not 100. So, as you might expect, as  you increase the number of categories, people’s judgement gets worse and they make more mistakes. The idea is that you can extrapolate from this and say what the maximum number of categories should be if you want people to perform reliably. (For the experiment on pitch, the answer was about six – which I find a bit weird, given my previous comments, but there’s no reason to think that these notes were organised as a musical scale, which might have given the listener some advantage. Or maybe the blues has it right and E minor pentatonic has all notes you’ll ever need. But I digress.)

Miller talks about a whole series of perceptual tasks that come under a similar umbrella – categorising shapes, sounds, colours, or the positions of dots on a screen – and finds similar rules for the number of categories that people can differentiate: seven, plus or minus two. But while it’s the title of the paper (The magical number seven, plus or minus two), in many ways it’s the least convincing element. More innovative was his use of information theory to characterise this aspect of cognition.

Information Theory was spearheaded by Claude Shannon*, who famously codified the idea that any message (like a radio signal, or a morse code telegraph) imparts a certain amount of information. For example, if you’re waiting for a yes/no answer to a question, this can be transmitted digitally by a 0 (no) or a 1 (yes) – the message is said to contain one bit** of information. If I want to tell you which tyre on my car has burst, I need a number between 1 and 4. In binary transmission, this is two bits. You can, in principle, do this with any message – roughly speaking, how much you knew afterwards compared to how much you knew before tells you how much information there is in the message.
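The yes/no and burst-tyre examples follow directly from Shannon’s measure: a choice among N equally likely alternatives carries log2(N) bits. A minimal sketch (illustrative only):

```python
import math

def bits(n_alternatives):
    """Information, in bits, carried by one choice among
    n equally likely alternatives: log2(n)."""
    return math.log2(n_alternatives)

bits(2)    # a yes/no answer: 1 bit
bits(4)    # which of four tyres burst: 2 bits
bits(7)    # ~2.8 bits, around the seven-category limit Miller discusses
bits(26)   # one letter: ~4.7 bits, versus ~3.3 bits for one digit
```

The logarithm is also why Miller’s framing compresses differences: 4 versus 16 categories becomes just 2 versus 4 bits.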

Miller was unusual in applying this to these tasks of perception. He argued that while short-term memory was limited to storing seven chunks (seven numbers or seven objects or seven names…), absolute judgement depended on information – typically about 2.5 bits, or seven categories. How did he decide that this was true? Well, you could test short-term memory with objects which each contain multiple bits of information. So remembering a sequence of letters like A, F, Z,… is typically no more difficult than remembering a sequence of single-digit numbers like 7, 7, 1,… even though the letters represent a lot more information – being drawn from a pool of 26 rather than a pool of 10. Miller’s point was that memory chunks data so we can store more, but perception is limited by the amount of information presented to the recipient.

In using Shannon’s measure of information, Miller took the logarithm of the number of categories – and in doing so, I think, convinced himself of his seven things thesis. The difference between 4 and 16 categories seems pretty big to me, but the difference between 2 and 4 bits of information seems a lot smaller – it’s certainly easier to convince yourself that there’s some magic at work. Secondly, people learn – the capability to categorise data is a lot higher for someone trained in a task than for a novice. There also seems to be plenty of evidence that people find it harder to recall a series of longer words, both in terms of information (number of letters, say) and in terms of how long the words take to say – so the “seven chunks in short-term memory” doesn’t quite hold, either.

I started reading Miller’s paper expecting a quick takeaway – people can only tell 5 things apart, they can only remember 7 things – but neither is really true; both vary massively with training, task, and complexity of object. So why is this paper still so influential? Well, for a start, it’s really well written. I recommend giving it a whirl. The description of Shannon entropy is one of the more accessible around. And, while its results have been superseded, Miller’s concept of chunking – his distinction between bits of information (in the Shannon sense) and chunks (categories or memories) – helped to kickstart a whole strand of research that went on to make his observations obsolete. Which, if we go by our most Popperian ideas of scientific creativity, is the greatest thing a scientist can ever ask for – building something beautiful enough that it gives others the tools to smash it up.


Further reading:

Miller, G.A., 1956. The magical number seven, plus or minus two: some limits on our capacity for processing information. Psychological review, 63(2), p.81.

Baddeley, A., 1994. The Magical Number Seven: Still Magic After All These Years? Psychological Review April 1994, 101(2), pp.353–356.


*I will come clean – while I’m familiar with information entropy, I have not read this paper. I find it heavy going. Maybe I will revisit it one day.

** a bit as in a computer bit, not as in a little bit

Mansfield scores highly for swearing on Twitter – Nottingham Post

The Nottinghamshire town was not placed in the top ten of a study conducted by the BBC, but was still prominent for swearing on the social media network. Scientists at the Centre for Advanced Spatial Analysis (CASA) at University College London ...

Urban Walkabout Map of Clerkenwell


This attractive fold-out map (extract above) is produced by Urban Walkabout. It is the fifth in a series of maps of London tourist areas that aim to feature a number of bars, restaurants, independent shops, attractions and other boutique local businesses, all within walking distance of each other. The maps are free and can be found in the receptions of hotels within their areas, and at visitor information centres. For example, the Clerkenwell map which we feature here is available at the reception of the Zetter Hotel (which was the launch venue for the map itself).

We like the cartography on the map – a pleasing, pastel colour palette has been used. Just the right amount of detail is included to help you navigate the streets in the local area without overburdening you with information. Large streets are written in capitals and given a subtle colour highlight, with smaller streets in lower case. Key bus routes (as lines of dots) and tube stations in the area are included. The businesses themselves appear as pastel blobs – for example, The Quality Chop House, a Clerkenwell institution near Exmouth Market, appears as a light blue blob (No. 20). Parks are named and pedestrian roads highlighted. The map folds down to A6, so is ideal for a wander. The listings section includes a short description and opening hours for each business. There are also five suggested walking tours described on the map – for example, No. 5 is a pub breakfast at a Smithfield pub (they open early due to the hours of the famous and eponymous meat market there).

The guide shows great attention to detail and is a fine example of a business engaging other businesses to create a useful product – one that presents a fresh look at an area compared with a standard tourist guide or the ubiquitous Google Maps suggestions.

You can download a PDF version of the map at the Urban Walkabout website; however, they can also mail you a paper copy for free. 75,000 paper copies have been produced, and they aim to update the map once every six months or so. The sixth map, launching soon, will be for my favourite London area of all: Angel.


Thanks to Urban Walkabout for the invite to the launch event.

Orkney and Shetland population are the most polite when it comes to Twitter – Herald Scotland

Scientists at the Centre for Advanced Spatial Analysis (CASA) at University College London investigated patterns of Twitter profanity by monitoring tweets sent from a smartphone with geo-location switched on, from the week beginning August 28. The f ...

Citizen Science in Oxford English Dictionary

At the end of June, I noticed a tweet about new words in the Oxford English Dictionary (OED):

I like dictionary definitions, as they help to clarify things, and the OED is famous for its careful editing and for establishing how a term is used before adding it. Being in the OED is significant for citizen science, as it is now recognised as a “proper” term. At the same time, the way the OED defines citizen science, and its careful work on finding out when the term was first used, can help in noticing some aspects of it. This is how.

Here is the definition, in all its glory:

citizen science n. scientific work undertaken by members of the general public, often in collaboration with or under the direction of professional scientists and scientific institutions.

1989   Technol. Rev. Jan. 12/4   Audubon involves 225 society members from all 50 states in a ‘citizen science’ program… Volunteers collect rain samples, test their acidity levels, and report the results to Audubon headquarters.
2002   M. B. Mulder & P. Coppolillo Conservation xi. 295/1   Citizen science has the potential to strengthen conservation practice in the developing world.
2012   M. Nielsen Reinventing Discov. vii. 151   Citizen science can be a powerful way both to collect and also to analyze enormous data sets.

citizen scientist n.  (a) a scientist whose work is characterized by a sense of responsibility to serve the best interests of the wider community (now rare);  (b) a member of the general public who engages in scientific work, often in collaboration with or under the direction of professional scientists and scientific institutions; an amateur scientist.

1912   Manch. Guardian 11 Sept. 4/2   Trafford, thus serenely established, should..have returned to his researches with a new confidence and content and become a noble citizen-scientist.
1936   Headmaster Speaks 65   Could not Science..turn out a race of citizen scientists who do not make an absolute religion of the acquisition of new scientific knowledge however useless or harmful it may be?
1949   Collier’s 16 July 74/3   By 1930 most citizen-scientists had perfected a technique which brought gin to its peak of flavor and high-octane potency five minutes after the ingredients had been well mixed.
1979   New Scientist 11 Oct. 105/2   The ‘citizen-scientist’, the amateur investigator who in the past contributed substantially to the development of science through part-time dabbling.
2013   G. R. Hubbell Sci. Astrophotogr. xiii. 233   A citizen scientist in the astronomical field has a unique opportunity because astronomy is a wholly observational science.

Dictionaries are more interesting than they might seem. Here are 3 observations on this new definition:

First, the core definition of ‘citizen science’ is interestingly inclusive, so everything from community-based air quality monitoring to volunteer bird surveys to running a climate model on your computer at home is included. This makes the definition useful across projects and types of activities.

Second, ‘citizen scientist’ captures two meanings. The first meaning is noteworthy, as it is the one that falls well within Alan Irwin’s way of describing citizen science, or Jack Stilgoe’s pamphlet describing citizen scientists. Notice that this meaning is not the common one used to describe who a citizen scientist is, but arguably, scientists who are active in citizen science usually become such citizen scientists (sorry for the headache!).

Third, it’s always fun to track down the citations that the OED uses, as its editors try to find the first use of a phrase. So let’s look at the late 20th-century citations for ‘citizen science’ and ‘citizen scientist’ (the ones from the early 20th century are less representative of current science, in my view).

The first use of ‘citizen science’ in the meaning that we’re now using is traced to an article in MIT Technology Review from January 1989. The article ‘Lab for the Environment’ tells the story of community-based laboratories set up to explore environmental hazards, laboratory work by Greenpeace, and Audubon’s recruitment of volunteers in a ‘citizen science’ programme. The part that describes citizen science is provided below (click here to get to the magazine itself). Groups such as the Public Laboratory for Open Technology and Science are therefore linked directly to this use of citizen science.

MIT Technology Review 1989

Just as interesting is the use of ‘citizen scientist’. It was used 10 years earlier, in a New Scientist article that discussed enthusiasts researching Unidentified Flying Objects (UFOs) and identified ‘ufology’ as a field of study for these people. While the article clearly mocks the ufologists as unscientific, it does mention, more or less in passing, the place of citizen-scientists, which had been “all but eliminated” by the late 1970s (click here to see the original magazine). This resonates with many of the narratives about how citizen science disappeared in the 20th century and is reappearing now.



If you would like to use these original references to citizen science and citizen scientists, here are the proper references (I’ll surely look out for an opportunity to do so!):

Kerson, R., 1989, Lab for the Environment, MIT Technology Review, 92(1), 11-12

Oberg, J., 1979, The Failure of the ‘Science’ of Ufology, New Scientist, 84(1176), 102-105



Thanks to Rick Bonney who asked some questions about the definition that led to this post!

Revealed: Twitter users in two London boroughs post messages with some of … – Evening Standard

The findings were revealed as part of an investigation for BBC Radio 4 by researchers from the Centre for Advanced Spatial Analysis (CASA) at University College London. 1.3million tweets posted over the course of a week were investigated - and just 4.2 ...

Phew! Swansea doesn’t figure on list of most sweary places on Twitter – Southwales Evening Post

Researchers at the Centre for Advanced Spatial Analysis (CASA) at University College London carried out the study for the Radio Four programme Future Proofing. It looked at 1.3 million tweets during the week and found Saturday and Sunday afternoons ...

Geo-Tagged Swearing Check Outs Redcar as Most Profane UK Location – Gizmodo UK

Researchers from UCL's Centre for Advanced Spatial Analysis analysed Twitter messages tagged with geographical data by smartphones, finding that nearly eight per cent of all tweets sent from Redcar and its surrounding area contained at least one ...

The latest outputs from researchers, alumni and friends at UCL CASA