Persecuted by a number

“My problem is that I’ve been persecuted by an integer”

With this, George Miller, like a Franz Kafka of cognitive psychology, launched one of the most influential papers in the field. Miller’s persecutor was the number seven – which, in test after test of absolute judgement, appeared as approximately the number of categories people could tell apart.

What counted as a category, and how approximate “approximately” was, is where things get interesting. In trying to answer these questions, Miller drew the young field of information theory into his world. I came across him in visual design, where the “seven things” paradigm is used to argue for simplifying visual representations: for example, reducing a colour scale for continuous data by chunking it into discrete categories.

So let’s start with the experiments he was talking about. They were experiments on absolute judgement, where the subject had to identify a stimulus as belonging to a specific category. For example, an experimenter might play a musical note at one of five pitches, and the subject would have to decide which of the five it was. Or the note might be played at one of a fixed set of volume levels. Of course, five categories is an arbitrary number; you could choose four or six or seven or one hundred. Now, I’m pretty sure I can identify 8 notes in the octave, but not 100. So, as you might expect, as you increase the number of categories, people’s judgement gets worse and they make more mistakes. The idea is that you can extrapolate from this and say what the maximum number of categories should be if you want people to perform reliably. (For the experiment on pitch, the answer was about six, which I find a bit weird given my previous comments; but there’s no reason to think these notes were organised as a musical scale, which might have given the listener some advantage. Or maybe the blues has it right and E minor pentatonic has all the notes you’ll ever need. But I digress.)

Miller discusses a whole series of perceptual tasks that fall under the same umbrella, such as categorising shapes, sounds, colours, or the positions of dots on a screen, and finds a similar limit on the number of categories people can tell apart: seven, plus or minus two. But while that limit gives the paper its title (The Magical Number Seven, Plus or Minus Two), in many ways it is the least convincing element. More innovative was his use of information theory to characterise this aspect of cognition.

Information theory was spearheaded by Claude Shannon, who famously codified the idea that any message (a radio signal, say, or a Morse code telegram) imparts a certain amount of information. For example, if you’re waiting for a yes/no answer to a question, it can be transmitted digitally as a 0 (no) or a 1 (yes), and the message is said to contain one bit** of information. If I want to tell you which tyre on my car has burst, I need a number between 1 and 4. In binary transmission, this takes two bits (00, 01, 10 or 11). You can, in principle, do this with any message: roughly speaking, how much you know afterwards compared to how much you knew before tells you how much information there is in the message.
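To make the “before versus after” idea concrete, here is a minimal Python sketch. `entropy_bits` measures uncertainty in bits, so four equally likely tyres give 2 bits. `transmitted_bits` applies the same before-minus-after logic to an absolute-judgement experiment; it is the standard mutual-information calculation, which is the sense in which these experiments are scored in bits. Both function names are my own, and the stimulus-response counts are invented for illustration, not data from Miller’s paper.

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits: how uncertain you are before the message."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def transmitted_bits(counts):
    """Bits of information about the stimulus carried by the response.

    counts[i][j] is the number of trials on which stimulus i was presented
    and the subject answered j. The result is H(stimulus) minus
    H(stimulus | response): the uncertainty before the answer, minus the
    uncertainty still left once you have seen it.
    """
    total = sum(sum(row) for row in counts)
    before = entropy_bits([sum(row) / total for row in counts])
    after = 0.0
    for j in range(len(counts[0])):
        column = [row[j] for row in counts]
        col_total = sum(column)
        if col_total == 0:
            continue  # this response was never given
        # Uncertainty about the stimulus given response j,
        # weighted by how often response j occurred
        after += (col_total / total) * entropy_bits([c / col_total for c in column])
    return before - after

# A yes/no answer and a burst tyre, all options equally likely
print(entropy_bits([0.5, 0.5]))   # 1.0
print(entropy_bits([0.25] * 4))   # 2.0

# Hypothetical absolute-judgement data: 4 pitches, 10 trials each
perfect = [[10, 0, 0, 0], [0, 10, 0, 0], [0, 0, 10, 0], [0, 0, 0, 10]]
confused = [[8, 2, 0, 0], [2, 6, 2, 0], [0, 2, 6, 2], [0, 0, 2, 8]]
print(transmitted_bits(perfect))             # 2.0
print(round(transmitted_bits(confused), 2))  # 0.95
```

The confused table is 70% correct, yet it carries only about 0.95 bits, under half the 2 bits that perfect identification of four pitches would transmit.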

Miller was unusual in applying this to perception. He argued that while short-term memory was limited to storing about seven chunks (seven numbers, or seven objects, or seven names…), absolute judgement was limited by information: typically around 2.5 bits, or roughly six or seven categories. How did he decide that this was true? Well, you can test short-term memory with items that each carry different amounts of information. Remembering a sequence of letters like A, F, Z,… is typically no more difficult than remembering a sequence of single-digit numbers like 7, 7, 1,…, even though each letter carries more information, being drawn from a pool of 26 rather than 10 (about 4.7 bits per letter against 3.3 per digit). Miller’s point was that memory is limited by the number of chunks rather than bits, so packing data into bigger chunks lets us store more, whereas absolute judgement is limited by the amount of information presented to the recipient.
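The arithmetic behind the letters-versus-digits comparison is short: an item drawn from N equally likely options carries log2(N) bits. This sketch assumes every item in the pool is equally likely, which real letter sequences are not, and the seven-item span is simply the paper’s headline figure used for illustration; the helper names are hypothetical.

```python
import math

def bits_per_item(pool_size):
    """Bits carried by one item drawn uniformly from `pool_size` options."""
    return math.log2(pool_size)

def span_bits(pool_size, span=7):
    """Total bits in a sequence of `span` such items."""
    return span * bits_per_item(pool_size)

# Pool sizes from the text: single digits and letters of the alphabet
for name, size in [("digit", 10), ("letter", 26)]:
    print(f"{name}: {bits_per_item(size):.1f} bits each, "
          f"{span_bits(size):.1f} bits for a span of 7")
```

If memory were limited by bits, seven letters (about 33 bits) should be noticeably harder to hold than seven digits (about 23 bits). That it typically isn’t is Miller’s evidence that memory counts chunks rather than bits.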

There are a couple of problems with this. First, in using Shannon’s measure of information, Miller took the logarithm of the number of categories, and in doing so, I think, convinced himself of his seven things thesis. The difference between 4 and 16 categories seems pretty big to me, but the difference between 2 and 4 bits of information seems a lot smaller, and it’s certainly easier to convince yourself that there’s some magic at work. Second, people learn: the ability to categorise is a lot higher in someone trained in a task than in a novice. There also seems to be plenty of evidence that people find it harder to recall a series of longer words, both in terms of information (number of letters, say) and in terms of how long the words take to say, so the “seven chunks in short-term memory” rule doesn’t quite hold either.
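The compression the logarithm introduces is easy to see numerically. In this small sketch (the helper names are mine), “seven, plus or minus two” categories, a near-doubling from five to nine, spans less than one bit, and a capacity of 2.5 bits converts back to just under six categories.

```python
import math

def categories_to_bits(n):
    """Information needed to pick one of n equally likely categories."""
    return math.log2(n)

def bits_to_categories(bits):
    """Number of equally likely categories a given capacity can resolve."""
    return 2 ** bits

# "Seven, plus or minus two" on a log scale
for n in (5, 7, 9):
    print(f"{n} categories = {categories_to_bits(n):.2f} bits")

# Quadrupling the categories only doubles the bits
print(categories_to_bits(4), categories_to_bits(16))  # 2.0 4.0

# A capacity of about 2.5 bits, turned back into categories
print(f"{bits_to_categories(2.5):.1f} categories")   # 5.7 categories
```

Going from five to nine categories moves you from about 2.32 to 3.17 bits, so a band that looks wide when counted in categories looks narrow, and more like a fixed constant, when counted in bits.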

I started reading Miller’s paper expecting a quick takeaway (people can only tell 5 things apart, they can only remember 7 things), but neither is really true: both vary massively with training, task, and the complexity of the object. So why is this paper still so influential? Well, for a start, it’s really well written, and I recommend giving it a whirl. Its description of Shannon entropy is one of the more accessible around. And while its results have been superseded, Miller’s concept of chunking, and his distinction between bits of information (in the Shannon sense) and chunks (categories or memories), helped to kickstart a whole strand of research that went on to make his observations obsolete. Which, if we go by our most Popperian ideas of scientific creativity, is the greatest thing a scientist can ever ask for – building something beautiful enough that it gives others the tools to smash it up.

—–

Further reading:

Miller, G.A., 1956. The magical number seven, plus or minus two: some limits on our capacity for processing information. Psychological Review, 63(2), pp.81–97.

Baddeley, A., 1994. The magical number seven: still magic after all these years? Psychological Review, 101(2), pp.353–356.

—–

*I will come clean: while I’m familiar with information entropy, I have not read Shannon’s original paper. I find it heavy going. Maybe I will revisit it one day.

** a bit as in a computer bit, not as in a little bit