My Academic Research: What’s in a Name?

I have spent the last few years investigating the geography of family names (also called surnames). I work with the team who assembled the UCL Department of Geography Worldnames Database, which contains the names and geographic locations of over 300 million people in nearly 30 countries (a few of these are yet to be added to the website). My research has focussed on the 152 million or so people in Europe for whom we have data, all of which comes from publicly available telephone directories or electoral rolls. I also had access to a historical dataset for Great Britain in the form of the 1881 census. I have tried to answer two questions:

1. Is it possible to approximately establish the origin of a surname based on its modern-day geographic distribution?

2. Are particular surnames more likely to be found together and, if so, do they form distinct geographic regions?

In the past, surname research has involved a lot of manual work to create a detailed history of a particular name. With so many surnames in the database, I had to find automated, computational ways to do this. The patterns I produce are much more generalised than those from manual work: I find broad patterns rather than specific genealogical facts, but they provide useful context for population genetics, migration, historical geography and demography. If you want to find out more about this research, here are the titles of the papers I have had published in academic journals:

The Surname Regions of Great Britain.

Creating a Regional Geography of Great Britain Through the Spatial Analysis of Surnames.

Identifying Spatial Concentrations of Surnames.

People of the British Isles: A Preliminary Analysis of Genotypes and Surnames in a UK Control Population.

Delineating Europe’s Cultural Regions: Population Structure and Surname Clustering.

For a full list, see my UCL academic profile. The left map at the top of the post is from the last paper I listed and shows how the surname regions vary across Europe. The map on the right shows how confident I am in the regions, based on the number of times they emerge in the cluster analysis.
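
As a rough illustration of the idea behind the confidence map, here is a minimal Python sketch. It is not the method from the paper: it simply assumes the cluster analysis is run several times and counts how often each pair of areas ends up in the same cluster. The function name, the area names and the cluster labels are all made up for the example.

```python
from itertools import combinations


def co_assignment_frequency(runs):
    """Return the fraction of runs in which each pair of areas shares a cluster.

    `runs` is a list of dicts, one per clustering run, each mapping an
    area name to the cluster label it received in that run.
    """
    areas = sorted(runs[0])
    freq = {}
    for a, b in combinations(areas, 2):
        # Count the runs where both areas received the same label.
        together = sum(1 for labels in runs if labels[a] == labels[b])
        freq[(a, b)] = together / len(runs)
    return freq


# Hypothetical results of three clustering runs over four made-up areas.
# Label values are only meaningful within a single run.
runs = [
    {"A": 0, "B": 0, "C": 1, "D": 1},
    {"A": 1, "B": 1, "C": 0, "D": 0},
    {"A": 0, "B": 0, "C": 0, "D": 1},
]

freq = co_assignment_frequency(runs)
for pair, share in sorted(freq.items()):
    print(pair, round(share, 2))
```

Pairs that are grouped together in every run (A and B here) would sit in a high-confidence region, while pairs that only sometimes share a cluster (C and D) would fall in a less certain one.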