Big Data, Agents and the City

In the recently published book “Big Data for Regional Science”, edited by Laurie Schintler and Zhenhua Chen, Nick Malleson, Sarah Wise, Alison Heppenstall and I have a chapter entitled “Big Data, Agents and the City”. In the chapter we discuss how big data can be used to build more powerful agent-based models. Specifically, we look at how data from, say, social media could be used to inform agents' behaviors and their dynamics, and how it can help with the calibration and validation of such models, with an emphasis on urban systems.
Below you can read the abstract of the chapter, see some of the figures we used to support our discussion, along with the full reference and a pdf proof of the chapter. As always, any thoughts or comments are welcome.

Abstract:

Big Data (BD) offers researchers the scope to simulate population behavior through vastly more powerful Agent Based Models (ABMs), presenting exciting opportunities in the design and appraisal of policies and plans. Agent-based simulations capture system richness by representing micro-level agent choices and their dynamic interactions. They aid analysis of the processes which drive emergent population level phenomena, their change in the future, and their response to interventions. The potential of ABMs has led to a major increase in applications, yet models are limited in that the individual-level data required for robust, reliable calibration are often only available in aggregate form. New (‘big’) sources of data offer a wealth of information about the behavior (e.g. movements, actions, decisions) of individuals. By building ABMs with BD, it is possible to simulate society across many application areas, providing insight into the behavior, interactions, and wider social processes that drive urban systems. This chapter will discuss, in context of urban simulation, how BD can unlock the potential of ABMs, and how ABMs can leverage real value from BD.  In particular, we will focus on how BD can improve an agent’s abstract behavioral representation and suggest how combining these approaches can both reveal new insights into urban simulation, and also address some of the most pressing issues in agent-based modeling; particularly those of calibration and validation.

Keywords: Agent-based models, Big Data, Emergence, Cities.
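
To make the calibration idea in the abstract a little more concrete, here is a minimal sketch of fitting a single behavioral parameter of a toy agent-based model to an aggregate observation. It is not the approach from the chapter: the model, the parameter `p_visit`, the observed share of 0.30 and the grid search are all assumptions made purely for illustration.

```python
import random

def simulate_visits(p_visit, n_agents, seed):
    """Toy agent-based model: each agent independently decides to visit a location."""
    rng = random.Random(seed)
    # Each agent makes one micro-level choice; the visit share is the emergent aggregate.
    visits = sum(1 for _ in range(n_agents) if rng.random() < p_visit)
    return visits / n_agents

def calibrate(observed_share, n_agents, seed):
    """Grid search for the behavioral parameter that best reproduces the observed share."""
    candidates = [i / 20 for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95
    return min(candidates,
               key=lambda p: abs(simulate_visits(p, n_agents, seed) - observed_share))

# Stand-in for an aggregate derived from 'big' data (hypothetical value).
observed_share = 0.30
best_p = calibrate(observed_share, n_agents=2000, seed=42)
print(f"calibrated p_visit = {best_p:.2f}")
```

Reusing the same seed for every candidate keeps the comparison between parameter values fair, since each run sees the same random draws. A real model would have many parameters and would need a proper search strategy and a held-out data set for validation.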

The growth in agent-based modeling, from search results of Web of Science and Google Scholar.
Hotspots of activity of Twitter users: Tweet locations and associated densities for a selection of prolific users.

Full Reference:

Crooks, A.T., Malleson, N., Wise, S. and Heppenstall, A. (2018), Big Data, Agents and the City, in Schintler, L.A. and Chen, Z. (eds.), Big Data for Urban and Regional Science, Routledge, New York, NY, pp. 204-213. (pdf)


Zika in Twitter: Health Narratives

In our paper “Zika in Twitter: Temporal Variations of Locations, Actors, and Concepts” we explored how health narratives and event storylines pertaining to the recent Zika outbreak emerged in social media and how they related to news stories and actual events.

Specifically, we combined actors (e.g. Twitter users), locations (e.g. where the tweets originated) and concepts (e.g. emerging narratives such as pregnancy) to gain insights on the mechanisms that drive participation, contributions, and interactions on social media during a disease outbreak. Below you can read a summary of our paper along with some of the figures which highlight our methodology and findings.

An overview of the Twitter narrative analysis approach, starting with data collection, and proceeding with preprocessing and data analysis to identify narrative events, which can be used to build an event storyline.

Abstract:
 

Background: The recent Zika outbreak witnessed the disease evolving from a regional health concern to a global epidemic. During this process, different communities across the globe became involved in Twitter, discussing the disease and key issues associated with it. This paper presents a study of this discussion in Twitter, at the nexus of location, actors, and concepts.

Objective: Our objective in this study was to demonstrate the significance of 3 types of events: location related, actor related, and concept- related for understanding how a public health emergency of international concern plays out in social media, and Twitter in particular. Accordingly, the study contributes to research efforts toward gaining insights on the mechanisms that drive participation, contributions, and interaction in this social media platform during a disease outbreak. 

Methods: We collected 6,249,626 tweets referring to the Zika outbreak over a period of 12 weeks early in the outbreak (December 2015 through March 2016). We analyzed this data corpus in terms of its geographical footprint, the actors participating in the discourse, and emerging concepts associated with the issue. Data were visualized and evaluated with spatiotemporal and network analysis tools to capture the evolution of interest on the topic and to reveal connections between locations, actors, and concepts in the form of interaction networks. 

Results: The spatiotemporal analysis of Twitter contributions reflects the spread of interest in Zika from its original hotspot in South America to North America and then across the globe. The Centers for Disease Control and World Health Organization had a prominent presence in social media discussions. Tweets about pregnancy and abortion increased as more information about this emerging infectious disease was presented to the public and public figures became involved in this. 

Conclusions: The results of this study show the utility of analyzing temporal variations in the analytic triad of locations, actors, and concepts. This contributes to advancing our understanding of social media discourse during a public health emergency of international concern.

Keywords: Zika Virus; Social Media; Twitter Messaging; Geographic Information Systems.
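
As a rough illustration of the network side of the analysis described in the Methods, the sketch below builds a small retweet network and ranks accounts by how many distinct users retweeted them. It is only a toy under stated assumptions: the records, the account names and the helper functions are hypothetical and are not the tools or data used in the paper.

```python
from collections import Counter, defaultdict

# Hypothetical sample records: (retweeting_user, original_author).
retweets = [
    ("user_a", "WHO"), ("user_b", "WHO"), ("user_c", "CDCgov"),
    ("user_a", "CDCgov"), ("user_d", "WHO"), ("user_b", "news_org"),
]

def build_retweet_network(pairs):
    """Return a weighted directed edge map: retweeter -> {author: count}."""
    edges = defaultdict(Counter)
    for retweeter, author in pairs:
        edges[retweeter][author] += 1
    return edges

def in_degree(edges):
    """Count how many distinct users retweeted each author."""
    degree = Counter()
    for targets in edges.values():
        for author in targets:
            degree[author] += 1
    return degree

network = build_retweet_network(retweets)
ranking = in_degree(network).most_common()
print(ranking)
```

Accounts with a high in-degree are the ones whose messages are amplified by many others, which is one simple way to spot prominent actors before looking for clusters around them.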

Spatiotemporal participation patterns and identifiable clusters over 4 weeks of our twelve-week study. The top left panel shows the data during the first week, and time progresses from left to right and from top to bottom.

Subsets of the full retweet network pertaining to the WHO (left) and CDC (right), and clusters identified within them. Magenta clusters are centered upon health entities, green upon news organizations, orange upon political entities.

Visualizing a narrative storyline across locations (blue), actors (red), and concepts (green).

Full Reference:

Stefanidis, A., Vraga, E., Lamprianidis, G., Radzikowski, J., Delamater, P.L., Jacobsen, K.H., Pfoser, D., Croitoru, A. and Crooks, A.T. (2017). “Zika in Twitter: Temporal Variations of Locations, Actors, and Concepts”, JMIR Public Health and Surveillance, 3 (2): e22. (pdf)

As usual, any feedback or comments are most welcome.


New Paper: User-Generated Big Data and Urban Morphology

Continuing our work with crowdsourcing and geosocial analysis, we recently had a paper published in a special issue of the Built Environment journal entitled “User-Generated Big Data and Urban Morphology.”
The theme of the special issue is “Big Data and the City”, which was guest edited by Mike Batty and includes 12 papers. To quote from the website:

“This cutting edge special issue responds to the latest digital revolution, setting out the state of the art of the new technologies around so-called Big Data, critically examining the hyperbole surrounding smartness and other claims, and relating it to age-old urban challenges. Big data is everywhere, largely generated by automated systems operating in real time that potentially tell us how cities are performing and changing. A product of the smart city, it is providing us with novel data sets that suggest ways in which we might plan better, and design more sustainable environments. The articles in this issue tell us how scientists and planners are using big data to better understand everything from new forms of mobility in transport systems to new uses of social media. Together, they reveal how visualization is fast becoming an integral part of developing a thorough understanding of our cities.”

Table of Contents

In our paper we discuss and show how crowdsourced data is leading to the emergence of alternate views of urban morphology that better capture the intricate nature of urban environments and their dynamics. Specifically, we show how such data can provide us with information pertaining to linked spaces and geosocial neighborhoods. We argue that a geosocial neighborhood is not defined by its administrative boundaries, planning zones, or physical barriers, but rather by its emergence as an organic self-organized social construct that is embedded in geographical spaces that are linked by human activity. Below is the abstract of the paper and some of the figures from it which showcase our work.

“Traditionally urban morphology has been the study of cities as human habitats through the analysis of their tangible, physical artefacts. Such artefacts are outcomes of complex social and economic forces, and their study is primarily driven by traditional modes of data collection (e.g. based on censuses, physical surveys, and mapping). The emergence of Web 2.0 and through its applications, platforms and mechanisms that foster user-generated contributions to be made, disseminated, and debated in cyberspace, is providing a new lens in the study of urban morphology. In this paper, we showcase ways in which user-generated ‘big data’ can be harvested and analyzed to generate snapshots and impressionistic views of the urban landscape in physical terms. We discuss and support through representative examples the potential of such analysis in revealing how urban spaces are perceived by the general public, establishing links between tangible artefacts and cyber-social elements. These links may be in the form of references to, observations about, or events that enrich and move beyond the traditional physical characteristics of various locations. This leads to the emergence of alternate views of urban morphology that better capture the intricate nature of urban environments and their dynamics.”

Keywords: Urban Morphology, Social Media, GeoSocial, Cities, Big Data.

City Infoscapes – Fusing Data from Physical (L1, L2), Social, Perceptual (L3) Spaces to Derive Place Abstractions (L4) for Different Locations (N1, N2).
Recreational Hotspots Composed of “Locals” and “Tourists” with Perceived Artifacts Indicating “Use” and “Need”. (A) High Line Park (B) Madison Square Garden.



Moving from Spatial Neighborhoods to Geosocial Neighborhoods via Links.

The Emergence of Geosocial Neighborhoods in the Aftermath of the 2013 Boston Marathon Bombing

Full Reference:

Crooks, A.T., Croitoru, A., Jenkins, A., Mahabir, R., Agouris, P. and Stefanidis, A. (2016). “User-Generated Big Data and Urban Morphology,” Built Environment, 42 (3): 396-414. (pdf)


Summer Projects

Over the summer, Arie Croitoru and I took part in the George Mason University Aspiring Scientists Summer Internship Program. We worked with three very talented high-school students who, over the course of the seven-and-a-half-week program, produced some excellent research around the areas of agent-based modeling and social media analysis. An overview of their work can be seen in the posters and abstracts that the students produced at the end of the internship.
Lawrence Wang explored how social media could be used to predict election results in a project entitled “And the Winner Is? Predicting Election Results using Social Media”. Below you can read Lawrence’s abstract and see his poster.

“The 2012 U.S. presidential election demonstrated how Twitter can serve as a widely accessible forum of political discourse. Recently, researchers have investigated whether social media, particularly Twitter, can function as a predictive tool. In the past decade, multiple studies have claimed to successfully predict the results of elections using Twitter data. However, many of these studies fail to account for the inherent population bias present in Twitter data, leading to ungeneralizable results. In this project, I investigate the prospects of using Twitter data as an alternative to poll data for predicting the 2012 presidential election. The tweet corpus consisted of tweets published one month before the November election day. Using VADER, a sentiment analysis tool, I analyzed over 140,000 tweets for political sentiment. I attempted to circumvent the Twitter population bias by comparing age, race, and gender metrics of the Twitter population with that of the U.S. population. Furthermore, I utilized Bayesian inference with prior distributions from the results of the 2008 presidential election in order to mitigate the effects of limited tweet data in certain states. The resulting model correctly predicted the likely outcomes of 46 of the 50 states and predicted that President Obama would be reelected with a probability of 0.945. Such a model could be used to explore the forthcoming elections. ” 
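
To give a feel for how a prior can compensate for sparse tweet data, here is a minimal sketch of a Beta-Binomial update. The abstract does not say which Bayesian model Lawrence used, so the model choice, the `prior_strength` parameter and all of the numbers below are assumptions for illustration only. The posterior mean of a candidate's support is

$$\hat{\theta} = \frac{\alpha + k}{\alpha + \beta + n},$$

where $\alpha$ and $\beta$ are pseudo-counts derived from the previous election result, $k$ is the number of favorable tweets and $n$ is the total number of tweets.

```python
def posterior_share(prior_share, prior_strength, favorable, total):
    """Posterior mean of a candidate's support under a Beta-Binomial model."""
    alpha = prior_share * prior_strength          # pseudo-counts for the candidate
    beta = (1.0 - prior_share) * prior_strength   # pseudo-counts against
    return (alpha + favorable) / (alpha + beta + total)

# Hypothetical numbers: a state with a previous vote share of 0.55 and few tweets.
sparse = posterior_share(0.55, prior_strength=100, favorable=3, total=10)
# The same prior with many tweets: the data now dominates.
dense = posterior_share(0.55, prior_strength=100, favorable=3000, total=10000)
print(round(sparse, 3), round(dense, 3))
```

With only ten tweets the estimate stays close to the prior, while with ten thousand tweets it moves almost entirely to the observed share, which is the behavior one wants in states with limited data.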

In a second project, Varun Talwar explored how knowledge bases could be utilized to better contextualize social media discussions in a project entitled “Context Graphs: A Knowledge-Driven Model for Contextualizing Twitter Discourse.” Below you can read Varun’s project abstract and see his end-of-project poster.

Introduction: User posted content through online social media (SM) platforms in recent years has emerged as a rich field for narrative analysis of topics captured during the discussion discourse. In particular, collective discourse has been used to manually contextualize public perception of health related events.

Objective: As SM feeds tend to be noisy, automated detection of the context of a given SM discourse stream has proven to be a challenging task. The primary objective of this research is to explore how existing knowledge bases could be utilized to better contextualize SM discussions through topic modeling and mining. By utilizing such existing knowledge it would then be possible to explore to what extent a given discourse is related to a known or a new context, as well as compare and contrast SM discussions through their respective contexts.

Methods: In order to accomplish these goals this research proposes a novel approach for contextualizing SM discourse. In this approach, topic modeling is combined with a knowledgebase in a two-step process. First, key topics are extracted from a SM data corpus by applying a statistical topic-modeling algorithm, a process that also results in data dimensionality reduction. Once a set of salient topics are extracted, each topic is then used to mine the knowledge base for sub graphs that represent the contextual linkages between knowledge elements. Such sub-graphs can then further disambiguate the topic modeling results, and be utilized for qualifying context similarity across SM discussions.

Results: The time-series analysis of the Twitter discourse via graph-matching algorithms reveals the change in topics as evidenced by the emergence of the terms “pregnancy” and “abortion” as information about the virus propagated through the Twitter community.
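
The two-step idea in the Methods (extract topics, then mine a knowledge base for the sub-graphs around them) can be sketched roughly as follows. This is only a toy: the knowledge base is a hand-made dictionary with hypothetical entities, the topic terms are given rather than learned by a topic model, and Jaccard overlap of edges is a stand-in I have swapped in for the graph-matching algorithms mentioned in the abstract.

```python
# Toy knowledge base as an adjacency map (hypothetical entities and links).
knowledge_base = {
    "zika": {"virus", "mosquito", "pregnancy"},
    "pregnancy": {"zika", "microcephaly"},
    "mosquito": {"zika", "dengue"},
    "dengue": {"mosquito", "virus"},
    "election": {"candidate", "poll"},
}

def context_subgraph(topic_terms, kb):
    """Collect edges between topic terms and their direct neighbors in the knowledge base."""
    edges = set()
    for term in topic_terms:
        for neighbor in kb.get(term, ()):
            edges.add(frozenset((term, neighbor)))
    return edges

def context_similarity(edges_a, edges_b):
    """Jaccard overlap of two edge sets (0 = disjoint contexts, 1 = identical)."""
    if not edges_a and not edges_b:
        return 0.0
    return len(edges_a & edges_b) / len(edges_a | edges_b)

health_a = context_subgraph(["zika", "pregnancy"], knowledge_base)
health_b = context_subgraph(["zika", "mosquito"], knowledge_base)
politics = context_subgraph(["election"], knowledge_base)
print(context_similarity(health_a, health_b), context_similarity(health_a, politics))
```

Two discussions that touch the same part of the knowledge base share many edges and score high, while unrelated discussions share none.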

Elizabeth Hu explored the current migration crisis in Europe in a project entitled “Across the Sea: A Novel Agent-Based Model for the Migratory Patterns of the European Refugee Crisis”. Below is Elizabeth’s abstract, poster and an example model run.

“Since 2010, a growing number of refugees have sought asylum in European nations, fleeing violence and military conflict in their home countries. Most of the refugees originate from Syria, Iraq, Afghanistan, and African nations. The vast majority of refugees risk their lives in the popular yet perilous Mediterranean Sea Route often prone to boat accidents and subsequent deaths of migrants.  The flow of millions of refugees has introduced a humanitarian crisis not seen since World War II. European nations are struggling to cope with the influx of refugees through various border policies.

In order to explore this crisis, a geographically explicit agent-based model has been developed to study the past and future patterns of refugee flows. Traditional migration models, which represent the population as an aggregate, fail to consider individual decision-making processes based on personal status and intervening opportunities. However, the novel agent-based model developed here of migration allows population behavior to emerge as the result of individual decisions. Initial population, city, and route attributes are based upon data from the UNHCR, EU agencies, crowd-sourced databases, and news articles. The agents, refugees, select goal destinations in accordance with the Law of Intervening Opportunities. Thus, goals are prone to change with fluctuating personal needs. Agents choose routes not only based on distance, but also other relevant route attributes. The resulting migration flows generated by the model under various circumstances could provide crucial guidance for policy and humanitarian aid decisions.”
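
For readers unfamiliar with the Law of Intervening Opportunities, the sketch below shows one classic way of turning it into destination-choice probabilities, in which the chance of stopping at a destination depends on the opportunities it offers and on the opportunities already passed on the way there. The abstract does not say which formulation Elizabeth's model uses, so the exponential form, the `accept_rate` parameter and the made-up cities are all assumptions for illustration.

```python
import math

def intervening_opportunity_probs(destinations, accept_rate):
    """Destination choice probabilities under an intervening-opportunities model.

    destinations: list of (name, distance, opportunities) tuples.
    accept_rate: assumed chance that any single opportunity is accepted.
    """
    ranked = sorted(destinations, key=lambda d: d[1])  # nearest first
    probs = {}
    passed = 0.0  # opportunities closer than the current destination
    for name, _distance, opportunities in ranked:
        p_reach = math.exp(-accept_rate * passed)
        p_pass = math.exp(-accept_rate * (passed + opportunities))
        probs[name] = p_reach - p_pass
        passed += opportunities
    total = sum(probs.values())
    # Renormalize so every agent ends up choosing one of the listed destinations.
    return {name: p / total for name, p in probs.items()}

# Hypothetical cities with made-up distances (km) and opportunity counts.
cities = [("near_city", 200, 50), ("mid_city", 600, 200), ("far_city", 1500, 400)]
probs = intervening_opportunity_probs(cities, accept_rate=0.005)
print(probs)
```

Note that distance only enters through the ranking: a far city with many opportunities can still lose out if enough opportunities lie closer to the agent, which is the core of the law.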

The movie below gives a sense of the migration paths the refugees are taking.


Megacities through the Lens of Social Media

Megacities, which can be roughly defined as cities with a population of over 10 million people, are on the increase due to ongoing urbanization trends. The United Nations notes that since the 1970s the number of megacities has more than tripled (from 8 to 34), and is expected to double again by 2050 (to exceed 60).

The question we are asking is how GeoSocial analysis can help us understand such cities. To this end, we recently published a paper entitled “Megacities: Through the Lens of Social Media” in the Journal of the Homeland Defense and Security Information Analysis Center (HDIAC). In the paper we discuss the opportunities and challenges that social media brings with respect to understanding the physical and cyber spaces within megacities. Below you can see the synopsis of our paper.

Due to ongoing urbanization trends the worldwide urban population is projected to grow from half of the global population (today) to two thirds of it by 2030. Almost all the new megacities that will emerge through this process are in geopolitical hotspots of southeast Asia and sub-Saharan Africa. Therefore, the U.S. Department of Defense must consider the challenges presented by engagement in such environments when planning for the future. The physical challenge of operating in such dense, highly three-dimensional, environments is only compounded by the added challenge presented by the advanced functional complexity of these environments: megacities function at the intersection of the physical, social, and cyber spaces. Accordingly, military operations in these locations must prepare to engage in environments where news, ideas, and opinions are shaped in cyberspace and propagated across the physical urban landscape. As social networks connect (or, often, divide) populations they form communities and facilitate their mobilization.

We have observed these processes time and again, from the streets of Cairo during the Arab Spring, to the streets of Tokyo during the Fukushima nuclear disaster, and the streets of Paris during the recent ISIL terrorist attacks. Advancing our capability to analyze crowd-generated content in the form of social media feeds is a substantial scientific challenge with considerable implications for future DoD operations. In this publication, we use representative examples to demonstrate the opportunities and challenges associated with such information, especially as they relate to large urban areas. 

An emerging framework to study urban systems.

Social networks embedded within a geographical context, leading to connected, non-contiguous areas.

Full Reference: 

Stefanidis, A., Jenkins, A., Croitoru, A. and Crooks, A. (2016). “Megacities Through the Lens of Social Media”, Journal of the Homeland Defense & Security Information Analysis Center (HDIAC), 3(1): 24-29. (pdf)

Continue reading »

Citizen Science 2015 (second day)

After a very full first day, the second day opened with a breakfast that provided opportunity to meet the board of the Citizen Science Association (CSA), and to have a nice way to talk with people who got up early (starting at 7am) for another full day of citizen science. Around the breakfast tables, new […]

Continue reading »

Crowdsourcing Urban Form and Function

We have just published a new paper entitled “Crowdsourcing Urban Form and Function” in the International Journal of Geographical Information Science, which showcases some of our recent work on cities and on how new sources of information can be used to study urban morphology at a variety of spatial and temporal scales. Below is the abstract for the paper:

“Urban form and function have been studied extensively in urban planning and geographic information science. However, gaining a greater understanding of how they merge to define the urban morphology remains a substantial scientific challenge. Towards this goal, this paper addresses the opportunities presented by the emergence of crowdsourced data to gain novel insights into form and function in urban spaces. We are focusing in particular on information harvested from social media and other open-source and volunteered datasets (e.g. trajectory and OpenStreetMap data). These data provide a first-hand account of form and function from the people who define urban space through their activities. This novel bottom-up approach to study these concepts complements traditional urban studies work to provide a new lens for studying urban activity. By synthesizing recent advancements in the analysis of open-source data we provide a new typology for characterizing the role of crowdsourcing in the study of urban morphology. We illustrate this new perspective by showing how social media, trajectory, and traffic data can be analyzed to capture the evolving nature of a city’s form and function. While these crowd contributions may be explicit or implicit in nature, they are giving rise to an emerging research agenda for monitoring, analyzing and modeling form and function for urban design and analysis.”

This paper builds on and considerably extends our prior work on crowdsourcing and on volunteered and ambient geographic information. In the scope of this paper we use the term ‘urban form’ to refer to the aggregate of the physical shape of the city, its buildings, streets, and all other elements that make up the urban space: in essence, the geometry of the city. In contrast, we use the term ‘urban function’ to refer to the activities that are taking place within this space. To this end we contrast how crowdsourced data can relate to more traditional sources of such information, both explicitly and implicitly, as shown in the table below.

A typology of implicit and explicit form and function content

In addition, we discuss in the paper how these new sources of data, which are often at finer resolutions than more authoritative data, are allowing us to customize the way we aggregate the data at various geographical levels, as shown below. Such aggregations can range from building footprints and addresses to street blocks (e.g. for density analysis) or street networks (e.g. for accessibility analysis). For large-scale urban analysis we can revert to the use of zonal geographies or grid systems.

Aggregation methods for varied scales of built environment analysis
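
As a small illustration of the grid-system end of this range, the sketch below counts point observations (for example, geolocated posts) per square grid cell at two cell sizes. It assumes planar, already-projected coordinates; the function names, the cell sizes and the sample points are made up for the example and do not come from the paper.

```python
import math
from collections import Counter


def grid_cell(x, y, cell_size):
    """Return the (column, row) index of the square cell containing (x, y)."""
    return (math.floor(x / cell_size), math.floor(y / cell_size))


def aggregate_to_grid(points, cell_size):
    """Count how many points fall in each grid cell."""
    return Counter(grid_cell(x, y, cell_size) for x, y in points)


# Made-up point locations in arbitrary projected units.
points = [(0.2, 0.3), (0.8, 0.9), (1.5, 0.1), (2.4, 2.6), (2.9, 2.1)]

fine = aggregate_to_grid(points, cell_size=1.0)    # finer spatial resolution
coarse = aggregate_to_grid(points, cell_size=2.0)  # coarser spatial resolution
```

Changing `cell_size` is the only thing needed to move between scales, which is the kind of flexibility that point-level crowdsourced data allows and pre-aggregated zonal data does not.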

In the application section of the paper we highlight how we can extract implicit form and function from crowdsourced data. The image below, for example, shows how we can take information from Twitter and differentiate neighborhoods over space and time.

Neighborhood map and topic modeling results showing the mixture of social functions in each area.

Finally, we outline in the paper an emerging research agenda related to the “persistent urban morphology concept”, as shown below. Specifically, we discuss how crowdsourcing is changing how we collect, analyze and model urban morphology, and how this new paradigm provides a new lens for studying how cities operate at much finer temporal, spatial, and social scales than we have been able to study so far.

The persistent urban morphology concept.

We hope you enjoy the paper.

Full Reference:  

Crooks, A.T., Pfoser, D., Jenkins, A., Croitoru, A., Stefanidis, A., Smith, D. A., Karagiorgou, S., Efentakis, A. and Lamprianidis, G. (2015), Crowdsourcing Urban Form and Function, International Journal of Geographical Information Science. DOI: 10.1080/13658816.2014.977905 (pdf)

 

Continue reading »

Linking Cyber and Physical Spaces

We have just published a new paper in Computers, Environment and Urban Systems entitled “Linking Cyber and Physical Spaces Through Community Detection And Clustering in Social Media Feeds”. In the paper we explore how geosocial media is providing us with a new social communication avenue and a novel source of geosocial information.
In particular, we discuss the notion of physical presence within social media and its importance for exploring the relation between the cyber and the physical domains. We discuss how communities and groups can be detected in both the cyber and physical space, and how they can be processed to form a ‘hybrid’ geosocial view of communities using social network analysis, community detection (the Louvain method) and DenStream. To showcase these concepts and their benefits, we present the analysis of two case studies that make use of Twitter data associated with two different types of events: a planned activity during the Occupy Wall Street (OWS) Day of Action (November 17th, 2011), and the response to the Boston Marathon Bombing (April 15, 2013). We conclude with a summary and outlook. Below is the abstract of the paper:

Over the last decade we have witnessed a significant growth in the use of social media. Interactions within their context lead to the establishment of groups that function at the intersection of the physical and cyber spaces, and as such represent hybrid communities. Gaining a better understanding of how information flows in these hybrid communities is a substantial scientific challenge with significant implications on our ability to better harness crowd-contributed content. This paper addresses this challenge by studying how information propagates and evolves over time at the intersection of the physical and cyber spaces. By analyzing the spatial footprint, social network structure, and content in both physical and cyber spaces we advance our understanding of the information propagation mechanisms in social media. The utility of this approach is demonstrated in two real-world case studies, the first reflecting a planned event (the Occupy Wall Street – OWS – movement’s Day of Action in November 2011), and the second reflecting an unexpected disaster (the Boston Marathon bombing in April 2013). Our findings highlight the intricate nature of the propagation and evolution of information both within and across cyber and physical spaces, as well as the role of hybrid networks in the exchange of information between these spaces.

Research highlights include:

    • Our analysis includes two major events as captured in Twitter.
    • The themes in cyber and physical communities tend to converge over time.
    • Messages among physical space users are more consistent at the onset of the event.
    • Geolocated users are consuming information more than they produce.

Below are some of the images from the paper. The first image shows how one can think of the relationships between physical and cyber spaces. The next image provides an overview of our geosocial analysis framework for examining cyber and physical communities.

Our geosocial analysis framework

In the figure below we show an example of using DenStream for spatiotemporal clustering and how the process can capture the protest activities that were planned for the Occupy Wall Street movement’s Day of Action. Each dot corresponds to the originating location of a geolocated tweet; the color of each point indicates the time of the corresponding tweet, ranging from dark blue (early morning, 0) to dark red (late night, 1). Each circle represents a specific spatiotemporal cluster. For example, the circle labeled A marks the start of the day, when people congregated around Wall Street, while the circle labeled C shows a cluster at Foley Square.

Physical space groups identified in the lower Manhattan area. Each dot corresponds to the originating location of a geolocated tweet; the color of each point indicates the time of the corresponding tweet, ranging from dark blue (early morning, 0) to dark red (late night, 1).

In the figure below we show one example of linking the cyber and physical communities. Specifically, (a) shows the top five communities (node degree > 100) in the cyber space retweet network (each community is designated by one color); (b) shows the physical space groups; and (c) shows the resulting hybrid meta-network, in which the connections between physical groups (P nodes) and cyber space communities (C nodes) are shown.
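
To give a feel for the spatiotemporal clustering step described above, here is a minimal sketch of density-based clustering over (x, y, t) points. Note that this is a simplified batch DBSCAN, not DenStream itself (DenStream is a streaming algorithm), and it is not the code used in the paper. The `time_weight` parameter, the distance function and the sample points are assumptions for illustration, with time normalized to the 0-1 range as in the figure.

```python
import math

NOISE = -1


def neighbors(points, i, eps, time_weight):
    """Indices of all points within eps of point i in weighted space-time."""
    xi, yi, ti = points[i]
    found = []
    for j, (xj, yj, tj) in enumerate(points):
        d = math.sqrt((xi - xj) ** 2 + (yi - yj) ** 2
                      + (time_weight * (ti - tj)) ** 2)
        if d <= eps:
            found.append(j)
    return found


def dbscan(points, eps, min_pts, time_weight=1.0):
    """Label each (x, y, t) point with a cluster id, or NOISE."""
    labels = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(points, i, eps, time_weight)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(points, j, eps, time_weight)
            if len(j_neighbors) >= min_pts:  # core point: keep expanding
                queue.extend(j_neighbors)
        cluster += 1
    return labels


# Made-up tweets: a morning group, an evening group, and one stray point.
points = [
    (0.0, 0.0, 0.10), (0.1, 0.0, 0.12), (0.0, 0.1, 0.11),
    (5.0, 5.0, 0.90), (5.1, 5.0, 0.92), (5.0, 5.1, 0.91),
    (9.0, 0.0, 0.50),
]
labels = dbscan(points, eps=0.5, min_pts=3)
```

Raising `time_weight` makes the clustering more sensitive to time, so that gatherings at the same place but at different times of day are kept apart.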

We hope you enjoy the paper.

Full Reference:

Croitoru, A., Wayant, N., Crooks, A.T., Radzikowski, J. and Stefanidis, A. (2014), Linking Cyber and Physical Spaces Through Community Detection And Clustering in Social Media Feeds, Computers, Environment and Urban Systems. DOI: 10.1016/j.compenvurbsys.2014.11.002

Continue reading »

IR: State-Driven and Citizen-Driven Networks

Our work exploring how social media can be used to study events around the world has resulted in a new publication in the Social Science Computer Review entitled “International Relations: State-Driven and Citizen-Driven Networks.” In essence, what we are attempting to do is compare traditional international relations (e.g. from United Nations General Assembly voting patterns) with those arising from bottom-up interactions (i.e. from people on the ground). The abstract of the paper is below along with some of the images that accompany the paper.

The international community can be viewed as a set of networks, manifested through various transnational activities. The availability of longitudinal datasets such as international arms trades and United Nations General Assembly (UNGA) allows for the study of state-driven interactions over time. In parallel to this top-down approach, the recent emergence of social media is fostering a bottom-up and citizen driven avenue for international relations (IR). The comparison of these two network types offers a new lens to study the alignment between states and their people. This paper presents a network-driven approach to analyze communities as they are established through different forms of bottom-up (e.g. Twitter) and top-down (e.g. UNGA voting records and international arms trade records) IR. By constructing and comparing different network communities we were able to evaluate the similarities between state-driven and citizen-driven networks. In order to validate our approach we identified communities in UNGA voting records during and after the Cold War. Our approach showed that the similarity between UNGA communities during and after the Cold War was 0.55 and 0.81 respectively (in a 0-1 scale). To explore the state- versus citizen-driven interactions we focused on the recent events within Syria within Twitter over a sample period of one month. The analysis of these data show a clear misalignment (0.25) between citizen-formed international networks and the ones established by the Syrian government (e.g. through its UNGA voting patterns).
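
The abstract reports community similarity on a 0-1 scale but does not name the measure here. As a stand-in, the sketch below uses the Rand index, one common way to compare two community assignments: it is the fraction of node pairs on which the two partitions agree. The choice of measure, the function name and the placeholder state labels are assumptions for illustration, not the paper's method.

```python
from itertools import combinations


def rand_index(partition_a, partition_b):
    """Fraction of node pairs treated the same way by both partitions (0 to 1).

    Each partition maps a node (e.g. a state) to a community label.
    Only nodes present in both partitions are compared.
    """
    nodes = sorted(set(partition_a) & set(partition_b))
    if len(nodes) < 2:
        raise ValueError("need at least two shared nodes to compare")
    agree = 0
    total = 0
    for u, v in combinations(nodes, 2):
        same_a = partition_a[u] == partition_a[v]
        same_b = partition_b[u] == partition_b[v]
        if same_a == same_b:
            agree += 1
        total += 1
    return agree / total


# Placeholder states and community labels, not taken from the paper's data.
state_driven = {"S1": 0, "S2": 0, "S3": 1, "S4": 1}
citizen_driven = {"S1": 0, "S2": 1, "S3": 1, "S4": 1}

similarity = rand_index(state_driven, citizen_driven)
```

A value of 1 means the two networks group the states identically, while lower values indicate the kind of misalignment discussed in the abstract.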

Full reference:

Crooks, A.T., Masad, D., Croitoru, A., Cotnoir, A., Stefanidis, A. and Radzikowski, J. (2013), International Relations: State-Driven and Citizen-Driven Networks, Social Science Computer Review. DOI:10.1177/0894439313506851

If you don’t have access to Social Science Computer Review, send us an email and we can send you an early version of the paper. This is also only part of our work on using multiple networks to explore international relations. One can of course also explore the networks in more detail. For example, in the figure below we plot the actual transfer of arms between states between 2001 and 2011. One can clearly see how different states are connected with Syria; Russia, however, has connections to many states.

Arms transfers

Or, if we explore Twitter hashtags and add an edge between any pair of hashtags when they are used in the same tweet, we can explore an emergent ontology of topic labels that users associate with each other. For example, the #Aleppo hashtag is associated with other hashtags which appear to relate to local events, including “#civilian”, “#airstrike”, “#hunger”, “#pictures”, many of which are only connected to the #Aleppo hashtag as shown below.
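
The edge rule described above (link two hashtags whenever they appear in the same tweet) can be sketched in a few lines. This is a minimal illustration rather than the code behind the figure: the regular expression, the lower-casing of tags and the sample tweets are all assumptions made for the example.

```python
import re
from collections import Counter
from itertools import combinations

HASHTAG = re.compile(r"#(\w+)")


def cooccurrence_edges(tweets):
    """Count, for each pair of hashtags, the tweets in which both appear."""
    edges = Counter()
    for text in tweets:
        # Unique, case-insensitive tags in this tweet, sorted so that each
        # pair is always stored in the same order.
        tags = sorted({tag.lower() for tag in HASHTAG.findall(text)})
        for a, b in combinations(tags, 2):
            edges[(a, b)] += 1
    return edges


# Made-up tweets for illustration only.
tweets = [
    "Reports from #Aleppo of an #airstrike",
    "#Aleppo #civilian #airstrike",
    "No tags here",
    "#hunger in #Aleppo",
]
edges = cooccurrence_edges(tweets)
```

The resulting counts can be used as edge weights, so that hashtags which are frequently used together are more strongly connected in the network.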

Continue reading »
