PhD Opportunities

I’m delighted to announce two funded PhD opportunities at University College London with Kantar Worldpanel (@K_Worldpanel) and Arup. Successful candidates will join a cohort of students at the UBEL Doctoral Training Centre and become part of the team of researchers based at the Consumer Data Research Centre and UCL Geospatial Analytics and Computing Research Group. These are innovative projects with two leading companies at the cutting edge of data science.

We are looking for applicants with backgrounds in quantitative social science and related disciplines, such as geography, social statistics, political science, economics, applied mathematics, computer science, planning, psychology or sociology.

The studentships will cover

  • Tuition fees per year – for either three years (Ph.D. only) or 1+3 years (including a preparatory year’s Masters study in a quantitative social science course).
  • Annual maintenance stipend full-time: the stipend for 2017/18 was £16,553.

I’ve pasted the full adverts below from the CDRC website. Please read them carefully, check you’re eligible and feel free to get in touch if you have any questions.

Note applications need to be sent to the Consumer Data Research Centre’s Project Manager, Sarah Sheppard (s.sheppard@ucl.ac.uk) by 4th February 2018. Good luck!

ESRC UCL, Bloomsbury, East London Doctoral Training Partnership Co-funded PhD studentships at the Consumer Data Research Centre (CDRC)

The Consumer Data Research Centre has two co-funded PhD studentships in quantitative social science based in UCL’s Department of Geography. The awards will be administered through the UBEL Doctoral Training Partnership. Projects are available working with ARUP and Kantar Worldpanel, commencing September 2018.

These awards are open to applicants with backgrounds in quantitative social science and related disciplines, such as geography, social statistics, political science, economics, applied mathematics, computer science, planning, psychology or sociology. Students will be expected to work with consumer data as part of an exciting multidisciplinary research centre.

The studentships will cover

  • Tuition fees per year – for either three years (Ph.D. only) or 1+3 years (including a preparatory year’s Masters study in a quantitative social science course).
  • Annual maintenance stipend full-time: the stipend for 2017/18 was £16,553.

If you are interested in applying, please:

  1. Ascertain your eligibility to hold an ESRC studentship here. Note that full awards are intended for students ordinarily resident in the UK although, exceptionally, overseas students with strong backgrounds in advanced quantitative methods may be eligible.
  2. Ascertain your research training foundation. If you hold or expect to obtain a relevant MSc with methods training meeting the 2015 ESRC Postgraduate Training Guidelines, you may apply for a +3 studentship. If you do not, you will need to take one of the related MSc courses at University College London, examples detailed here. If you follow this route, we will discuss with you the most appropriate course to apply for.
  3. Please apply to the Consumer Data Research Centre’s Project Manager, Sarah Sheppard (s.sheppard@ucl.ac.uk) by 4th February 2018. Please send:
  • Max 500 word covering email summarising your interest in pursuing a particular co-funded PhD studentship with the CDRC.
  • Academic CV including marks awarded to date plus details of 2 referees

    Please note that only strong candidates (at least 2.1/Merit with elements of first/distinction level) will be considered.

Details of the two projects appear below.

CO-FUNDED PROJECTS

New forms of data for urban modelling

Industry Partner: ARUP
First Supervisor: Dr James Cheshire
Second Supervisor: Professor Paul Longley
Industry Supervisor: Dr Tom Heath

There is now an abundance of data capturing human behaviour within cities. These data are drawn from a range of near real-time sources that include travel card usage, social media postings and traffic sensors. As data availability and the sophistication of so-called “smart city” technology continue to increase, the analytical approaches deployed for extracting information are lagging behind. This is largely due to the dependence on single, static databases that are often poorly suited to dynamic data collected in real time. Whilst it is true that researchers are capturing urban dynamics at a more granular temporal scale than ever before, it can be argued that no scalable platforms developed within the social sciences have been able to harness the full potential of real-time data generated by smart city technology. Data are produced from various sources in different forms that sometimes make linkage impossible, while techniques for trend detection in real time are still fairly rudimentary. This reflects both a shortage of appropriately trained researchers and the requirement for new computational techniques, such as those from machine learning and artificial intelligence developed within statistics or computer science.

These computational advances have enabled two distinct but complementary capabilities in the context of urban modelling. Firstly, new sensor data has either enabled the quantification of existing metrics in significantly finer spatial and temporal resolution (e.g. GPS journey time data) or enabled the measuring of things which were previously unknown (e.g. social media sentiment data). Secondly, analytical advances, in both computation and methods, mean that analyses may be broadened to include more factors and in significantly more complex ways. This has enabled the generation of new heuristics to either a) improve existing models with better data; or b) to derive new models with data-driven heuristics which combine historically separate or non-existent data sources.

It is these analytical advances that will benefit most from insights from the social sciences and, as such, form the focus of the proposed PhD project. The over-arching aim of the research, therefore, is to assess the utility of new forms of data for improved urban models. Specifically, the focus of the research will be on the creation of robust heuristics for the cleaning and analysis of a range of real-time data feeds, so that the resulting data can be used to both improve and cross-validate widely adopted urban transportation models. It is anticipated that the project’s findings will enable more evidence-based decisions to be made on urban issues, particularly mobility, in order to enable more effective societal outcomes.

This will be achieved through the following objectives:

  1. Refine the historically heavily aggregated supply side of transportation data via new data sources (e.g. real-time feeds from TfL)
  2. Expand the range of considerations towards more consumer-focussed sources to better capture the likely demand for urban transport.
  3. Merge these finer spatial and temporal resolution supply and demand side data sources with innovative dynamic models, with a particular focus on agent-based modelling and cellular automata.

This project will benefit from a range of innovative datasets. On the supply side ARUP will provide value added data derived from a range of open sources e.g. TfL and other transit operators/authorities globally, whilst on the demand side CDRC data holdings will supply and augment data from relevant commercial partners.

The timeline for the studentship will be as follows (following successful completion of M.Sc. Geospatial Analysis or similar UBEL pathway if a 1+3 student):

Months          Activities

1-2              Inductions at UCL and ARUP offices (London). Formalisation of supervisory arrangements; initial scoping of project; secure data lab training; ethical review

3-9              Research literature search; audit of ARUP data and CDRC consumer datasets; training as required in programming and database management skills; initial exploration of databases including computer mapping

9-12            Upgrade report preparation and examination (involving all supervisors); first work placement at ARUP

13-20           Detailed work on enrichment of ARUP data with other consumer datasets and administrative/census data. Focus on development of new urban models and data-driven heuristics; second work placement

21-30           Advanced analysis and creation of summary indicators; a further conference presentation; a third ARUP work placement

31-36           Completion of thesis write up.

The research undertaken as part of this Ph.D. will generate both methodological insights and new forms of data that will be beneficial to consumer data research as well as to the industrial partner. It will offer insights into the utility of feeding disaggregate, real-time, datasets into conventional urban modelling frameworks such as agent based modelling. Predictions from such models underpin a large number of decisions in urban planning. Their improvement through the better utilisation of new forms of urban data will therefore have widespread societal consequences as well as alter the commercial practices at ARUP.

 

Creating a synthetic panel to allocate the total grocery market volume to locations, occasions, and individuals

Industry Partner: Kantar Worldpanel
First Supervisor: Dr James Cheshire
Second Supervisor: Professor Paul Longley
Industry Supervisor: Dr Gareth Hagger-Johnson

A substantial – and increasing – range of human activity is now being captured and stored digitally. These data are obtained from sources that include government administrative records, commercial transactions, Internet usage and smartphones. All benefit from technological innovations that facilitate their collection, storage and analysis, and they contribute to the “Big Data” available for social science research. However, they are often unrepresentative – or their representativeness is hard to quantify. Traditional sources of data such as social surveys are less likely to suffer from these limitations but are increasingly costly to administer and suffer high non-response rates where there is little incentive to complete questionnaires. Kantar Worldpanel (Kantar) is one of the few commercial organisations that is successfully managing a range of continuous consumer panels in many countries. In the UK, the panels cover fast-moving consumer goods (FMCG), personal care, telecoms, fashion, entertainment and petrol, among many others. Shoppers’ behaviour is recorded, and this can be supplemented with questionnaires to record attitudes or other variables not already captured.

The focus of the proposed project will be the UK. Here Kantar operates a panel of households who provide data on FMCG purchased for in-home use (in-home purchasing, INHP). There are three nested panels within this large panel, covering out of home purchasing (OOHP), out of home usage (OOHU), and in-home usage (INHU). Each of these is separately weighted to show the total purchasing, representative of the GB population. Weights reflect socio-demographic and behavioural variables known to influence under-reporting. Current weighting approaches are applied in a similar way to flagship research council funded academic panels (for example the ESRC’s Understanding Society survey, which is also supported by Kantar).

Whilst these data provide a robust basis for consumer research, a key challenge is the limited overlap between INHU and OOHU. To help address this, Kantar also have Shoppix – a new mobile application panel of individuals covering all purchasing both in and out of home from a wide range of retailers, not just those in grocery.

Currently there is no way to calculate the total market (INHP + OOHP), nor to assign the total volume of grocery to specific individuals and occasions of usage (INHU-OOHU), since each of the four scenarios is considered separately. Addressing this limitation raises interesting methodological challenges associated with data linkage and representativeness, and will help to realise the potential of data from the likes of Kantar to be used in a wider range of social science research. In addition, Kantar envisage clear commercial benefits to being able to do this. Therefore, the overarching aim of the research is to create one synthetic data cube, representing total volume that can be allocated to in/out of home, each individual, and each occasion. The cube should be weighted, or have missing records imputed, in a way that is robust so that conclusions are representative of the broader population.

The Ph.D. will focus on the application and development of a range of data linkage methodologies alongside the creation of a series of heuristics to establish the representativeness of the data outputs to the broader population. Most attention will be given to four methodological approaches:

  1. Linkage – we anticipate this could be achieved by synthetic data linkage, given that much of the data from each panel comes from different individuals. The synthetic linkage could occur by parcelling time-invariant population characteristics (e.g. males, born 1980, low social class) and matching to other panels on the same characteristics. There is a trade-off between increasing the number of subgroups, to get a more precise match, and losing statistical power as cell sizes become small.
  2. Weighting – how should we weight the data cube to ensure it is representative?
  3. Imputation – an alternative approach to weighting is to treat the problem as a missing data problem, and impute more data to improve representativeness via a series of regression models.
  4. Modelling and microsimulation – probabilistic modelling approaches could also be used to assign consumption volume to its most likely individual and consumption occasion.
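
To make the cell-matching idea in item 1 more concrete, here is a minimal Python sketch. It is purely illustrative and is not Kantar's or the CDRC's method: the records, the field names (sex, birth_decade, social_class, volume), the function names and the min_cell_size threshold are all assumptions invented for this example. It groups two hypothetical panels into cells defined by shared time-invariant characteristics, then links only those cells that are large enough in both panels.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical panel records; all field names and values are invented for illustration.
in_home = [
    {"sex": "M", "birth_decade": 1980, "social_class": "low", "volume": 10.0},
    {"sex": "M", "birth_decade": 1980, "social_class": "low", "volume": 14.0},
    {"sex": "F", "birth_decade": 1970, "social_class": "high", "volume": 8.0},
    {"sex": "F", "birth_decade": 1970, "social_class": "high", "volume": 6.0},
]
out_of_home = [
    {"sex": "M", "birth_decade": 1980, "social_class": "low", "volume": 3.0},
    {"sex": "M", "birth_decade": 1980, "social_class": "low", "volume": 5.0},
    {"sex": "F", "birth_decade": 1970, "social_class": "high", "volume": 2.0},
]

# Time-invariant characteristics that define a cell (an assumption for this sketch).
KEYS = ("sex", "birth_decade", "social_class")


def cell_summary(records, keys, value):
    """Group records into cells and return {cell: (mean value, cell size)}."""
    cells = defaultdict(list)
    for record in records:
        cells[tuple(record[k] for k in keys)].append(record[value])
    return {cell: (mean(vals), len(vals)) for cell, vals in cells.items()}


def synthetic_link(panel_a, panel_b, keys, value, min_cell_size=2):
    """Match cells present in both panels, skipping cells that are too small."""
    a = cell_summary(panel_a, keys, value)
    b = cell_summary(panel_b, keys, value)
    linked = {}
    for cell in a.keys() & b.keys():
        (mean_a, n_a), (mean_b, n_b) = a[cell], b[cell]
        # Small cells give unstable estimates, so they are dropped here.
        if min(n_a, n_b) >= min_cell_size:
            linked[cell] = {"a_mean": mean_a, "b_mean": mean_b, "total": mean_a + mean_b}
    return linked


linked = synthetic_link(in_home, out_of_home, KEYS, "volume")
for cell, summary in linked.items():
    print(cell, summary)
```

With the default threshold, the second cell is dropped because one panel contains only a single matching record; lowering min_cell_size to 1 would keep it, at the cost of a less reliable estimate. This is the trade-off between match precision and cell size noted in item 1.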

 

These approaches require careful consideration and the student will be supported both by the academic supervisory team and Kantar’s in-house researchers.

 

The timeline for the studentship will be as follows (following successful completion of M.Sc. Geospatial Analysis or similar UBEL pathway if a 1+3 student):

Months          Activities

1-2              Inductions at UCL and Kantar Worldpanel offices (London). Formalisation of supervisory arrangements; initial scoping of project; secure data lab training; ethical review

3-9              Research literature search; audit of panel data and CDRC consumer datasets; training as required in programming and database management skills; initial exploration of databases including computer mapping

9-12            Upgrade report preparation and examination (involving all supervisors); first work placement at Kantar Worldpanel

13-20           Detailed work on concatenation and conflation of Kantar data with other consumer datasets and administrative/ census data. Preparation of two conference papers/papers for publication. Second work placement.

21-30           Advanced analysis and creation of summary indicators to form the data cube; a further conference presentation; a third Kantar Worldpanel work placement

31-36           Completion of thesis write up.

It is anticipated that the work will be of high impact in a number of ways. Firstly, it offers the chance to generate more accurate measures of consumption – essential information for both commerce and social science research. These measures will be the product of significant methodological developments that can be applied to existing data holdings within the ESRC’s Consumer Data Research Centre and also may be of interest to initiatives spearheaded by the Office for National Statistics to derive official statistics from a broader range of consumer datasets. Finally, the outputs will be of significant interest to retailers and businesses with interests in store location planning and the provision of facilities to improve customer experience of multi-channel retailing.

This is a fast-moving area of research so we are keen to recruit a student who is adaptable and able to change focus if new avenues of interest present themselves. The student should also be familiar with a programming language, with experience using R, Python and SQL preferred.