Weighted Mean Direction Surfaces in Python

I work a lot with flows and spatial interactions, and one thing that I’ve wanted to do for a while is compute a mean flow direction surface. Unfortunately, arithmetic means don’t work for angular data, because they cannot account for the circular nature of the distribution of angular measurements. For instance, the angles 5 degrees and 355 degrees are separated by only 10 degrees, but their arithmetic mean is 180 degrees – way off, it should be 0 degrees!

Luckily, Brunsdon and Charlton have published on this very subject, so I took it upon myself to implement a weighted circular mean function in Python. The key obstacle was learning about complex numbers, about which, up until this point, I had no idea at all!

The first thing to do is calculate the angle between a set of candidate points (such as people) and a set of services (such as Medical Centres). This is simple enough to do using Python’s math module, and would look something like:

import math
# angle (in radians) of the flow from the candidate point (x1, y1) to the service (x2, y2)
math.atan2((y2-y1),(x2-x1))

In which the pair (x1,y1) is the location of the candidate point, and (x2,y2) the location of the allocated service for that candidate point. The line linking these two points defines a flow from a candidate point to a service, and vice versa.
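
With many candidate-service pairs to process, the same calculation can be done in one pass with numpy’s vectorised arctan2. A minimal sketch, with made-up coordinate arrays standing in for the real data:

import numpy as np

# hypothetical example: three candidate points and their allocated services
cand_x = np.array([0.0, 10.0, 20.0])
cand_y = np.array([0.0,  5.0,  0.0])
serv_x = np.array([5.0, 10.0, 15.0])
serv_y = np.array([5.0, 15.0,  5.0])

# one angle (in radians) per candidate-to-service flow
angles = np.arctan2(serv_y - cand_y, serv_x - cand_x)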

Having calculated all of the angles, I used ArcGIS to create an output grid, at the extent of the candidate points, using the “fishnet” function which creates a vector grid of prespecified dimensions.
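
For those without ArcGIS to hand, a rough equivalent set of grid cell centres could be built directly in numpy (a sketch only; the extent and cell size here are made-up values):

import numpy as np

# hypothetical extent of the candidate points and a 100 m cell size
xmin, ymin, xmax, ymax = 520000.0, 170000.0, 535000.0, 180000.0
cell = 100.0

# coordinates of the cell centres of a regular grid covering that extent
xs = np.arange(xmin + cell / 2.0, xmax, cell)
ys = np.arange(ymin + cell / 2.0, ymax, cell)
gridpoints = np.array([(x, y) for y in ys for x in xs])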

The beauty of Brunsdon and Charlton’s method is that it uses a local method of approximation: for each cell in the output grid, a mean direction is calculated based upon the values of nearby points, and applying a weighting allows more distant points to have less of an effect on the mean direction.

Firstly, I read all the candidate points into a KDTree structure, which allows me to search for local points; at the same time I also create an array of the angles for those candidate points.

from scipy.spatial import cKDTree
import numpy as np

tree = cKDTree(treepoints)
res, idx = tree.query(testpoint,300000,0,2,100)
res = res[0][np.where(res[0] < np.Inf)[0]]
idx = idx[0][:len(res)]

The tree takes a numpy array of coordinate pairs, and the query method returns an array of distances to points (res) and their index values in the original array of coordinates (idx). The testpoint is a cell in the vector grid; 300000 is k, the number of nearest neighbours to find, which I have simply set arbitrarily high in the context of my dataset; 0 is for approximate nearest neighbours, here I’ve specified exact; 2 indicates the use of Euclidean distance; and 100 is the threshold: neighbours won’t be returned if they are further than 100 metres away. The penultimate line simply shortens the array to just those values which are less than 100m away (i.e. less than infinity), as points over 100m away are returned with the value Inf.
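
For readability, the same query can be written using scipy’s keyword arguments rather than positional ones (a sketch, with the same parameter values as above):

res, idx = tree.query(testpoint,
                      k=300000,                  # arbitrarily large k
                      eps=0,                     # exact nearest neighbours
                      p=2,                       # Euclidean distance
                      distance_upper_bound=100)  # 100 metre search radius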

The next step is to actually compute the mean direction; however, this requires a special approach using complex numbers. Brunsdon and Charlton show that a direction can be stated as a complex number z in which z = exp(iθ), which is effectively z = cos(θ) + i sin(θ), where i is the imaginary unit. We can restate our directions in Python using:

import cmath

thetas = angles[idx]
cThetas = []
for i in xrange(0,len(thetas)):
    cThetas.append(complex(np.cos(thetas[i]),np.sin(thetas[i])))
cThetas = np.array(cThetas)

Here, the complex function allows the complex number representing each angle to be stored in a list, which I convert (lazily) to a numpy array. The first line uses the idx array from the cKDTree to index the relevant angle records from the angles array, which stores all the angle values in the same order as the entries in the cKDTree.
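
As an aside, the loop isn’t strictly necessary: numpy can exponentiate the whole array of angles at once, since exp(iθ) is exactly what we are building. A one-line sketch equivalent to the loop above:

# complex representation of every local angle in one vectorised step
cThetas = np.exp(1j * thetas)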

Next, a temporary variable is created from which the mean direction is calculated:

# sum the unit complex numbers and normalise: the result is the mean resultant direction
temp = np.sum(cThetas)/np.absolute(np.sum(cThetas))
MeanDir = np.angle(temp, deg = True)

The mean direction is given by the argument (Arg) of the resultant complex number; numpy implements this with the np.angle function, where deg = True returns the angle in degrees, and deg = False returns it in radians.
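
This is also a handy point to sanity-check the earlier 5 degrees / 355 degrees example; a quick sketch of the same calculation on just those two angles gives a mean direction of roughly 0 degrees rather than 180:

import numpy as np

check = np.exp(1j * np.radians([5.0, 355.0]))  # the two example directions
np.angle(np.sum(check) / np.absolute(np.sum(check)), deg=True)  # approximately 0.0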

So far this is the unweighted mean, aggregating directional observations within a 100m disk (see also: uniform disk smoothing). To introduce weighting we must first define a weighting scheme; I’ve used the one suggested by Brunsdon and Charlton, which is Gaussian, and might look a bit like this:

def gaussW(dists,band):
    # Gaussian kernel: weight = exp(-d^2 / (2 * band^2)) for each distance d
    out = np.zeros(dists.shape)
    for i in xrange(0,len(out)):
        temp = np.power(dists[i],2)/(2.0*np.power(float(band),2))
        out[i] = np.exp(-1.0 * temp)
    return out

weight = gaussW(res,100)
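
Since numpy operations broadcast over whole arrays, the same weights could also be produced without the explicit loop. A sketch of an equivalent, vectorised version (gaussW_vec is just my name for it):

def gaussW_vec(dists, band):
    # same Gaussian formula as above, applied to the whole distance array at once
    return np.exp(-np.power(dists, 2) / (2.0 * np.power(float(band), 2)))

weight = gaussW_vec(res, 100)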

Quite simply, I pass the distance array res to the gaussW function and it gives me back an array of weights for that ordering of distances. Using this I can redo the mean direction thus:

temp = np.sum(weight*cThetas)/np.absolute(np.sum(weight*cThetas))
MeanDir = np.angle(temp, deg = True)

There you have it! Attached is the script I used. Obviously, Brunsdon and Charlton implement a variance and a couple of visualisation devices, but these should be simple enough to implement now!
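
As a pointer in that direction, the circular variance is commonly defined as one minus the length of the (weighted) mean resultant vector, so values near 0 indicate that the local flows agree and values near 1 that they point all over the place. A sketch reusing the arrays above (this is the standard definition; check Brunsdon and Charlton for their exact formulation):

# circular variance: 1 - length of the weighted mean resultant vector
R = np.absolute(np.sum(weight * cThetas) / np.sum(weight))
CircVar = 1.0 - R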

I created an output for flows of patients to GPs in Southwark, visualised using one of ESRI’s circular/direction colour ramps from colour ramp pack 2. Not sure how best to visualise the legend at this point though. NB. 90 is north, -90 is south, 0/-0 is east and 180/-180 is west. The map is visualised to show the 4 cardinal directions, but the output is in fact continuous.

My example script is here. Note that I am using dbfpy to read and write to shapefile DBF tables directly.