Why your friends are probably more popular than you

I came across an interesting article by John Allen Paulos (via @timharford) on Scientific American claiming that your friends are probably more popular than you.

This idea goes nicely together with some of the theory behind complex networks which I learnt last week in Boston at the NECSI winter school on complex systems. I thought it might be an idea to see if I could explain some of the maths behind this claim.

In a social network each person is called a ‘node’ and they are connected to each of their friends by an ‘edge’. Let’s chose someone at random in our social network, Jeff say, and suppose Jeff has j friends. This means that Jeff will have j ‘edges’ connecting himself to each of his j friends. Let’s pick one of these edges at random and try to figure out the number of friends that Jeff’s random friend has.

To do this, we first need to define a couple of things…

– Let p(k) be the probability that a person picked at random has k friends

– Let n represent the total number of people in the network

– Let <k> be the average number of friends that a person in our network has

We now need to think about the network a little more generally…

The total number of edges connected to somebody with just one friend is equal to the total number of people with one friend (since each of these only have one edge). The total number of edge connected to somebody with two friends is equal to the total number of people with two friends, multiplied by two (since each of these people have two edges). More generally, the total number of people connected to somebody with k friends is equal to the total number of people with k friends multiplied by k. But this is equal to p(k) n k since p(k) n is the total number of people with k friends (the probability of a given person having k friends multiplied by the total number of people in the network).

The probability that any edge is connected to a person with k friends is then the total number of edges connected to nodes with k friends divided by the total number of edges in the whole network. But this is just p(k) n k / n <k> since <k> is the average number of edges each node has and is calculated by the total number of edges in the network divided by the number of nodes in the network (i.e. n) and so the number of edges is n <k>.

So the probability that a given edge is connected to a node of degree k is p(k) k/<k>.

Now let’s get back to Jeff:

Suppose we have chosen one of his friends at random or, more specifically, we have chosen to follow one of his ‘edges’ at random which will lead us to one of his friends. The probability that Jeff’s friend has degree k is then p(k) k/<k>.

The expected number of friends (in a statistical sense) that Jeff’s random friend has is then:

\sum_{k=1} ^{\infty} k \frac{p(k) k}{<k>} = \frac{\sum_{k=1}^{\infty} p(k) k^2}{<k>}

The value of <k> can be calculated more explicitly as \sum_{k=1}^{\infty} p(k) k i.e. the expected mean. So the number of friends that we can expect Jeff’s random friend to have is:

\frac{\sum_{k=1}^{\infty} p(k)k^2}{\sum_{k=1}^{\infty} p(k)k}

Now is the point that I have to provide a little extra background. Social networks are often described as scale-free networks. As you can see by the wikipedia page on scale free networks, this means that the probability that a person picked at random has k friends is given by p(k)=ck^{-\gamma}.

If we put this into our above expression we get:

\frac{\sum_{k=1}^{\infty} c k^{2-\gamma} }{\sum_{k=1}^{\infty} c k^{1-\gamma}}.

The wikipedia page also helpfully tells us that for real world networks the value of \gamma is often between 2 and 3. This means that 2 - \gamma is between 0 and -1 and 1-\gamma is between -1 and -2. But the infinite sum of something to the power of a value between 0 and -1 diverges. This means that it heads off to infinity. The infinite sum of something to the power of a value between -1 and -2 converges. This means the value of the above expression is infinite!

But this means that Jeff’s random friend has infinite friends! This is clearly untrue since there are a finite number of people in the world…

What we have done above is assume that our social network follows a power law for every value of k and therefore assume that our network is infinite. In reality, power law behaviour is only observed for certain parts of the network. This means that our sums in the expression above do not go to infinity.

What we can say however is that if our network is very large then the above expression will also be very large provided some scale free behaviour can be shown. In this case, we can safely assume that, on average, the expected number of friends connected to Jeff’s randomly chosen friend is likely to be more than j (since j is expected to be <k>).