New Paper: Cancer and Social Media

Continuing our work on geosocial analysis, we recently had a paper entitled “Cancer and Social Media: A Comparison of Traffic about Breast Cancer, Prostate Cancer, and Other Reproductive Cancers on Twitter and Instagram” published in the Journal of Health Communication. In the paper we present a comparative study of differences in messaging for women’s and men’s cancer campaigns on social media through three discrete approaches:
  1. We directly compare the incidence rates of women’s and men’s cancers in the United States to the corresponding levels of traffic that these cancers elicited during World Cancer Day across two social media platforms, Twitter and Instagram.
  2. We examine social media activity for breast cancer versus prostate cancer on both Twitter and Instagram during the dedicated month-long campaigns (October and November, respectively).
  3. We compare the top terms associated with each campaign on these two platforms to discover whether the terms used in these online discussions differ.
Below you can read the abstract of our paper and see some of our results; the full citation and a link to the paper are at the bottom of the post.
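
To make the first comparison concrete, here is a minimal sketch of how one might set the share of cancer cases against the share of social media mentions. The counts, the dictionary names and the `attention_gap` helper are all hypothetical and chosen purely for illustration; they are not figures or code from the paper.

```python
# Hypothetical counts for illustration only; these are NOT figures from the paper.
cases = {"breast": 250, "prostate": 180, "cervical": 13}
mentions = {"breast": 900, "prostate": 60, "cervical": 40}


def shares(counts):
    """Convert raw counts to percentages of the total."""
    total = sum(counts.values())
    return {k: 100.0 * v / total for k, v in counts.items()}


def attention_gap(cases, mentions):
    """Percentage-point difference: share of mentions minus share of cases."""
    case_pct = shares(cases)
    mention_pct = shares(mentions)
    return {k: mention_pct[k] - case_pct[k] for k in cases}


gap = attention_gap(cases, mentions)
for cancer, diff in sorted(gap.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{cancer}: {diff:+.1f} percentage points")
```

A positive value means a cancer receives a larger share of the online traffic than its share of cases would suggest; a negative value means the opposite.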

Abstract: 

Social media are often heralded as offering cancer campaigns new opportunities to reach the public. However, these campaigns may not be equally successful, depending on the nature of the campaign itself, the type of cancer being addressed, and the social media platform being examined. This study is the first to compare social media activity on Twitter and Instagram across three time periods: #WorldCancerDay in February, the annual month-long campaigns of National Breast Cancer Awareness Month (NBCAM) in October and Movember in November, and during the full year outside of these campaigns. Our results suggest that women’s reproductive cancers – especially breast cancer – tend to outperform men’s reproductive cancer – especially prostate cancer – across campaigns and social media platforms. Twitter overall generates substantially more activity than Instagram for both cancer campaigns, suggesting Instagram may be an untapped resource. However, the messaging for both campaigns tends to focus on awareness and support rather than on concrete actions and behaviors. We suggest health communication efforts need to focus on effective messaging and building engaged communities for cancer communication across social media platforms.

A comparison of percentages of cancer cases (green bars) and references to corresponding cancers in Twitter (blue bar) and Instagram (orange bar) during World Cancer Day 2016.

 References to breast cancer (green line), prostate cancer (orange line), and Movember (blue line) over the full year 2015 in Instagram.

Full Reference: 

Vraga, E., Stefanidis, A., Lamprianidis, G., Croitoru, A., Crooks, A.T., Delamater, P.L., Pfoser, D., Radzikowski, J. and Jacobsen, K.H. (2018), Cancer and Social Media: A Comparison of Traffic about Breast Cancer, Prostate Cancer, and Other Reproductive Cancers on Twitter and Instagram, Journal of Health Communication, 23(2): 181-189. (pdf)


Zika in Twitter: Health Narratives

In the paper we explored how health narratives and event storylines pertaining to the recent Zika outbreak emerged in social media and how they related to news stories and actual events.

Specifically, we combined actors (e.g. Twitter users), locations (e.g. where the tweets originated) and concepts (e.g. emerging narratives such as pregnancy) to gain insights into the mechanisms that drive participation, contributions, and interactions on social media during a disease outbreak. Below you can read a summary of our paper along with some of the figures that highlight our methodology and findings.

An overview of the Twitter narrative analysis approach, starting with data collection, and proceeding with preprocessing and data analysis to identify narrative events, which can be used to build an event storyline.

Abstract:
 

Background: The recent Zika outbreak witnessed the disease evolving from a regional health concern to a global epidemic. During this process, different communities across the globe became involved in Twitter, discussing the disease and key issues associated with it. This paper presents a study of this discussion in Twitter, at the nexus of location, actors, and concepts.

Objective: Our objective in this study was to demonstrate the significance of 3 types of events: location related, actor related, and concept related for understanding how a public health emergency of international concern plays out in social media, and Twitter in particular. Accordingly, the study contributes to research efforts toward gaining insights on the mechanisms that drive participation, contributions, and interaction in this social media platform during a disease outbreak.

Methods: We collected 6,249,626 tweets referring to the Zika outbreak over a period of 12 weeks early in the outbreak (December 2015 through March 2016). We analyzed this data corpus in terms of its geographical footprint, the actors participating in the discourse, and emerging concepts associated with the issue. Data were visualized and evaluated with spatiotemporal and network analysis tools to capture the evolution of interest on the topic and to reveal connections between locations, actors, and concepts in the form of interaction networks. 

Results: The spatiotemporal analysis of Twitter contributions reflects the spread of interest in Zika from its original hotspot in South America to North America and then across the globe. The Centers for Disease Control and World Health Organization had a prominent presence in social media discussions. Tweets about pregnancy and abortion increased as more information about this emerging infectious disease was presented to the public and public figures became involved in this. 

Conclusions: The results of this study show the utility of analyzing temporal variations in the analytic triad of locations, actors, and concepts. This contributes to advancing our understanding of social media discourse during a public health emergency of international concern.

Keywords: Zika Virus; Social Media; Twitter Messaging; Geographic Information Systems.
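
As a rough illustration of the temporal side of this analysis, the sketch below counts, week by week, how many tweets mention a given concept. The miniature corpus, the user names and the concept terms are hypothetical, and the simple substring matching is an assumption made for brevity rather than the method used in the paper.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical miniature corpus: (date, user, text). Not data from the study.
tweets = [
    (date(2015, 12, 28), "newsdesk", "Zika cases reported in Brazil"),
    (date(2016, 1, 20), "healthorg", "Zika and pregnancy: travel guidance issued"),
    (date(2016, 1, 22), "reader42", "Worried about Zika and pregnancy"),
    (date(2016, 2, 3), "newsdesk", "Debate over abortion access amid Zika"),
]
concepts = ("pregnancy", "abortion")  # illustrative concept terms


def weekly_concept_counts(tweets, concepts):
    """Count tweets mentioning each concept, keyed by ISO (year, week)."""
    counts = defaultdict(Counter)
    for day, _user, text in tweets:
        iso = day.isocalendar()
        week = (iso[0], iso[1])
        lowered = text.lower()
        for concept in concepts:
            if concept in lowered:
                counts[week][concept] += 1
    return counts


series = weekly_concept_counts(tweets, concepts)
for week in sorted(series):
    print(week, dict(series[week]))
```

The same grouping could be keyed by location or by actor instead of by concept to follow the other two parts of the triad over time.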

Spatiotemporal participation patterns and identifiable clusters over 4 weeks of our twelve-week study. The top left panel shows the data during the first week, and time progresses from left to right and from top to bottom.

Subsets of the full retweet network pertaining to the WHO (left) and CDC (right), and clusters identified within them. Magenta clusters are centered upon health entities, green upon news organizations, orange upon political entities.

Visualizing a narrative storyline across locations (blue), actors (red), and concepts (green).

Full Reference:

Stefanidis, A., Vraga, E., Lamprianidis, G., Radzikowski, J., Delamater, P.L., Jacobsen, K.H., Pfoser, D., Croitoru, A. and Crooks, A.T. (2017). “Zika in Twitter: Temporal Variations of Locations, Actors, and Concepts”, JMIR Public Health and Surveillance, 3 (2): e22. (pdf)

As normal, any feedback or comments are most welcome. 


New Paper: User-Generated Big Data and Urban Morphology

Continuing our work with crowdsourcing and geosocial analysis, we recently had a paper entitled “User-Generated Big Data and Urban Morphology” published in a special issue of the journal Built Environment.
The theme of the special issue, which was guest edited by Mike Batty and includes 12 papers, is “Big Data and the City”. To quote from the website:

“This cutting edge special issue responds to the latest digital revolution, setting out the state of the art of the new technologies around so-called Big Data, critically examining the hyperbole surrounding smartness and other claims, and relating it to age-old urban challenges. Big data is everywhere, largely generated by automated systems operating in real time that potentially tell us how cities are performing and changing. A product of the smart city, it is providing us with novel data sets that suggest ways in which we might plan better, and design more sustainable environments. The articles in this issue tell us how scientists and planners are using big data to better understand everything from new forms of mobility in transport systems to new uses of social media. Together, they reveal how visualization is fast becoming an integral part of developing a thorough understanding of our cities.”

Table of Contents

In our paper we discuss and show how crowdsourced data is leading to the emergence of alternate views of urban morphology that better capture the intricate nature of urban environments and their dynamics. Specifically, we show how such data can provide information about linked spaces and geosocial neighborhoods. We argue that a geosocial neighborhood is not defined by its administrative boundaries, planning zones, or physical barriers, but rather by its emergence as an organic, self-organized social construct that is embedded in geographical spaces linked by human activity. Below are the abstract of the paper and some of its figures, which showcase our work.

“Traditionally urban morphology has been the study of cities as human habitats through the analysis of their tangible, physical artefacts. Such artefacts are outcomes of complex social and economic forces, and their study is primarily driven by traditional modes of data collection (e.g. based on censuses, physical surveys, and mapping). The emergence of Web 2.0 and through its applications, platforms and mechanisms that foster user-generated contributions to be made, disseminated, and debated in cyberspace, is providing a new lens in the study of urban morphology. In this paper, we showcase ways in which user-generated ‘big data’ can be harvested and analyzed to generate snapshots and impressionistic views of the urban landscape in physical terms. We discuss and support through representative examples the potential of such analysis in revealing how urban spaces are perceived by the general public, establishing links between tangible artefacts and cyber-social elements. These links may be in the form of references to, observations about, or events that enrich and move beyond the traditional physical characteristics of various locations. This leads to the emergence of alternate views of urban morphology that better capture the intricate nature of urban environments and their dynamics.”

Keywords: Urban Morphology, Social Media, GeoSocial, Cities, Big Data.

City Infoscapes – Fusing Data from Physical (L1, L2), Social, Perceptual (L3) Spaces to Derive Place Abstractions (L4) for Different Locations (N1, N2).

Recreational Hotspots Composed of “Locals” and “Tourists” with Perceived Artifacts Indicating “Use” and “Need”. (A) High Line Park (B) Madison Square Garden.

Moving from Spatial Neighborhoods to Geosocial Neighborhoods via Links.

The Emergence of Geosocial Neighborhoods in the Aftermath of the 2013 Boston Marathon Bombing.

Full Reference:

Crooks, A.T., Croitoru, A., Jenkins, A., Mahabir, R., Agouris, P. and Stefanidis, A. (2016). “User-Generated Big Data and Urban Morphology,” Built Environment, 42 (3): 396-414. (pdf)


Summer Projects

Over the summer, Arie Croitoru and I took part in the George Mason University Aspiring Scientists Summer Internship Program. We worked with three very talented high-school students who, over the course of the seven-and-a-half-week program, produced some excellent research in the areas of agent-based modeling and social media analysis. An overview of their work can be seen in the posters and abstracts that the students produced at the end of the internship.

Lawrence Wang explored how social media could be used to predict election results in a project entitled “And the Winner Is? Predicting Election Results using Social Media”. Below you can read Lawrence’s abstract and see his poster.

“The 2012 U.S. presidential election demonstrated how Twitter can serve as a widely accessible forum of political discourse. Recently, researchers have investigated whether social media, particularly Twitter, can function as a predictive tool. In the past decade, multiple studies have claimed to successfully predict the results of elections using Twitter data. However, many of these studies fail to account for the inherent population bias present in Twitter data, leading to ungeneralizable results. In this project, I investigate the prospects of using Twitter data as an alternative to poll data for predicting the 2012 presidential election. The tweet corpus consisted of tweets published one month before the November election day. Using VADER, a sentiment analysis tool, I analyzed over 140,000 tweets for political sentiment. I attempted to circumvent the Twitter population bias by comparing age, race, and gender metrics of the Twitter population with that of the U.S. population. Furthermore, I utilized Bayesian inference with prior distributions from the results of the 2008 presidential election in order to mitigate the effects of limited tweet data in certain states. The resulting model correctly predicted the likely outcomes of 46 of the 50 states and predicted that President Obama would be reelected with a probability of 0.945. Such a model could be used to explore the forthcoming elections. ” 
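
The abstract does not spell out the exact Bayesian model, so the following is only a sketch of the general idea under an assumed Beta-Binomial setup: the previous election's vote share acts as a prior, and the tweet counts for a state update it. The function name, the `prior_strength` pseudo-count and all numbers are hypothetical.

```python
def posterior_support(prior_share, prior_strength, pos_tweets, total_tweets):
    """Posterior mean of a candidate's support under a Beta-Binomial model.

    prior_share: the candidate's vote share in the previous election (0..1).
    prior_strength: pseudo-count weight given to the prior (an assumption).
    pos_tweets / total_tweets: tweets classified as favoring the candidate.
    """
    alpha = prior_share * prior_strength + pos_tweets
    beta = (1.0 - prior_share) * prior_strength + (total_tweets - pos_tweets)
    return alpha / (alpha + beta)


# Hypothetical numbers, not taken from the project.
sparse = posterior_support(0.40, 50, pos_tweets=4, total_tweets=5)
dense = posterior_support(0.40, 50, pos_tweets=4000, total_tweets=5000)
print(f"sparse state: {sparse:.3f}, dense state: {dense:.3f}")
```

With only five tweets the estimate stays close to the prior share of 0.40, whereas with thousands of tweets the data dominate. This is the sense in which a prior can mitigate limited tweet data in some states.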

In a second project, Varun Talwar explored how knowledge bases could be utilized to better contextualize social media discussions, in a project entitled “Context Graphs: A Knowledge-Driven Model for Contextualizing Twitter Discourse.” Below you can read Varun’s project abstract and see his end-of-project poster.

Introduction: User posted content through online social media (SM) platforms in recent years has emerged as a rich field for narrative analysis of topics captured during the discussion discourse. In particular, collective discourse has been used to manually contextualize public perception of health related events.

Objective: As SM feeds tend to be noisy, automated detection of the context of a given SM discourse stream has proven to be a challenging task. The primary objective of this research is to explore how existing knowledge bases could be utilized to better contextualize SM discussions through topic modeling and mining. By utilizing such existing knowledge it would then be possible to explore to what extent a given discourse is related to a known or a new context, as well as compare and contrast SM discussions through their respective contexts.

Methods: To accomplish these goals, this research proposes a novel approach for contextualizing SM discourse, in which topic modeling is combined with a knowledge base in a two-step process. First, key topics are extracted from an SM data corpus by applying a statistical topic-modeling algorithm, a process that also reduces the dimensionality of the data. Once a set of salient topics is extracted, each topic is used to mine the knowledge base for subgraphs that represent the contextual linkages between knowledge elements. Such subgraphs can then further disambiguate the topic modeling results and be utilized for qualifying context similarity across SM discussions.

Results: The time-series analysis of the Twitter discourse via graph-matching algorithms reveals the change in topics, as evidenced by the emergence of the terms “pregnancy” and “abortion” as information about the virus propagated through the Twitter community.
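
The second step of this approach can be sketched as follows, assuming the topic terms have already been extracted by a topic model. The toy knowledge base, the function names and the use of Jaccard similarity over edge sets are illustrative assumptions; the project itself refers to graph-matching algorithms, which this simple stand-in does not reproduce.

```python
# Toy knowledge base as an undirected adjacency map; entries are illustrative only.
knowledge_base = {
    "zika": {"virus", "mosquito", "pregnancy"},
    "virus": {"zika"},
    "mosquito": {"zika"},
    "pregnancy": {"zika", "abortion"},
    "abortion": {"pregnancy"},
}


def context_subgraph(topic_terms, kb):
    """Return the set of KB edges that touch at least one topic term."""
    edges = set()
    for term in topic_terms:
        for neighbor in kb.get(term, ()):
            edges.add(frozenset((term, neighbor)))
    return edges


def context_similarity(edges_a, edges_b):
    """Jaccard similarity between two edge sets (0 when both are empty)."""
    union = edges_a | edges_b
    return len(edges_a & edges_b) / len(union) if union else 0.0


early = context_subgraph({"zika", "mosquito"}, knowledge_base)
late = context_subgraph({"zika", "pregnancy", "abortion"}, knowledge_base)
print(f"context similarity: {context_similarity(early, late):.2f}")
```

Comparing the subgraphs for topics from different time windows gives a simple measure of how far the context of a discussion has shifted.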

Elizabeth Hu explored the current migration crisis in Europe in a project entitled “Across the Sea: A Novel Agent-Based Model for the Migratory Patterns of the European Refugee Crisis”. Below is Elizabeth’s abstract, poster and an example model run.

“Since 2010, a growing number of refugees have sought asylum in European nations, fleeing violence and military conflict in their home countries. Most of the refugees originate from Syria, Iraq, Afghanistan, and African nations. The vast majority of refugees risk their lives in the popular yet perilous Mediterranean Sea Route often prone to boat accidents and subsequent deaths of migrants.  The flow of millions of refugees has introduced a humanitarian crisis not seen since World War II. European nations are struggling to cope with the influx of refugees through various border policies.

In order to explore this crisis, a geographically explicit agent-based model has been developed to study the past and future patterns of refugee flows. Traditional migration models, which represent the population as an aggregate, fail to consider individual decision-making processes based on personal status and intervening opportunities. However, the novel agent-based model developed here of migration allows population behavior to emerge as the result of individual decisions. Initial population, city, and route attributes are based upon data from the UNHCR, EU agencies, crowd-sourced databases, and news articles. The agents, refugees, select goal destinations in accordance with the Law of Intervening Opportunities. Thus, goals are prone to change with fluctuating personal needs. Agents choose routes not only based on distance, but also other relevant route attributes. The resulting migration flows generated by the model under various circumstances could provide crucial guidance for policy and humanitarian aid decisions.”
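
For readers unfamiliar with the Law of Intervening Opportunities, the sketch below uses one common formulation (Schneider's exponential form), in which destinations are ranked by distance and the chance of settling at one depends on the opportunities already passed. It is an assumption that this form resembles the rule used in the model; the city names, opportunity counts and `accept_rate` parameter are hypothetical.

```python
import math


def intervening_opportunity_probs(destinations, accept_rate):
    """Probability of settling at each destination, nearest first.

    destinations: list of (name, distance, opportunities).
    accept_rate: chance that any single opportunity is accepted (assumed constant).
    """
    ranked = sorted(destinations, key=lambda d: d[1])
    probs = {}
    passed = 0.0  # opportunities at closer destinations
    for name, _distance, opportunities in ranked:
        reach = math.exp(-accept_rate * passed)
        leave = math.exp(-accept_rate * (passed + opportunities))
        probs[name] = reach - leave
        passed += opportunities
    return probs


# Hypothetical cities and opportunity counts, for illustration only.
cities = [("B", 900, 300), ("A", 400, 100), ("C", 1500, 600)]
probs = intervening_opportunity_probs(cities, accept_rate=0.002)
for name, p in probs.items():
    print(f"{name}: {p:.3f}")
```

The probabilities sum to less than one; the remainder is the chance of not settling at any of the listed destinations. In an agent-based setting each agent would draw its goal from such a distribution and could redraw it as its needs change.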

The movie below gives a sense of the migration paths the refugees are taking.


Megacities through the Lens of Social Media

Megacities, which can be roughly defined as cities with a population of over 10 million people, are on the increase due to ongoing urbanization trends. The United Nations notes that since the 1970s the number of megacities has more than tripled (from 8 to 34), and is expected to nearly double again by 2050 (to exceed 60).

The question we are asking is how GeoSocial analysis can help us understand such cities. To this end, we have recently had a paper entitled “Megacities Through the Lens of Social Media” published in the Journal of the Homeland Defense and Security Information Analysis Center (HDIAC). In the paper we discuss the opportunities and challenges that social media brings with respect to understanding the physical and cyber spaces within megacities. Below you can see the synopsis of our paper.

Due to ongoing urbanization trends the worldwide urban population is projected to grow from half of the global population (today) to two-thirds of it by 2030. Almost all the new megacities that will emerge through this process are in geopolitical hotspots of southeast Asia and sub-Saharan Africa. Therefore, the U.S. Department of Defense must consider the challenges presented by engagement in such environments when planning for the future. The physical challenge of operating in such dense, highly three-dimensional environments is only compounded by the added challenge presented by the advanced functional complexity of these environments: megacities function at the intersection of the physical, social, and cyber spaces. Accordingly, military operations in these locations must prepare to engage in environments where news, ideas, and opinions are shaped in cyberspace and propagated across the physical urban landscape. As social networks connect (or, often, divide) populations they form communities and facilitate their mobilization.

We have observed these processes time and again, from the streets of Cairo during the Arab Spring, to the streets of Tokyo during the Fukushima nuclear disaster, and the streets of Paris during the recent ISIL terrorist attacks. Advancing our capability to analyze crowd-generated content in the form of social media feeds is a substantial scientific challenge with considerable implications for future DoD operations. In this publication, we use representative examples to demonstrate the opportunities and challenges associated with such information, especially as they relate to large urban areas. 

An emerging framework to study urban systems.

Social networks embedded within a geographical context, leading to connected, non-contiguous areas.

Full Reference: 

Stefanidis, A., Jenkins, A., Croitoru, A. and Crooks, A. (2016). “Megacities Through the Lens of Social Media”, Journal of the Homeland Defense & Security Information Analysis Center (HDIAC), 3(1): 24-29. (pdf)
