What is Normal?
A question that often comes up next is: what is “normal”? What we mean by this is: how can I tell whether a value is typical or worth a closer look? (We’ll figure out what is special further below.)
There are different ways to find this out.
Mean
The mean (or “average”) is the most commonly used way of looking at what is “normal”. We know it from newspaper articles claiming that the “average income” is increasing or decreasing and so on. But how do we calculate it? Quite simple: we sum up all the numbers we have and divide the result by the number of numbers we have.
For example, the mean of 1, 2, 3, 4 = (1+2+3+4)/4 = 10/4 = 2.5.
Can you calculate the average of the heights we used before?
163.1 162.2 210.5 201.0 188.7 182.6 153.0 173.5 146.6 148.0
Answer: the mean is 172.92.
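The calculation above can be sketched in a few lines of Python. This is only a minimal illustration; the function name mean and the variable name heights are our own choices, not part of the original exercise.

```python
# Heights from the exercise above.
heights = [163.1, 162.2, 210.5, 201.0, 188.7, 182.6, 153.0, 173.5, 146.6, 148.0]

def mean(values):
    """Sum all the values and divide by how many values there are."""
    return sum(values) / len(values)

print(mean([1, 2, 3, 4]))        # 2.5
print(round(mean(heights), 2))   # 172.92
```

Rounding is only there to hide tiny floating-point differences in the printed result.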
The mean is a great tool if your data is normally distributed. In that case, it tells you quite a bit about where the peak of the distribution is and thus what you would perceive as normal.
Median
Let’s look at a different example. If we look at how income is distributed in a country, the distribution is usually not normal. Instead it is lopsided: most people earn relatively little and a few people earn a great deal.

Now, if you look at the mean income, it might be quite high, because a few very large incomes pull it up. But if you earn less than the mean, you could still earn more than half of the population, simply because the majority of the population earns so little. The median tells us this. To calculate the median, we simply sort the data we have and pick the value right in the middle.
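To see how far apart the mean and the median can be, here is a small sketch. The incomes are invented purely for illustration (in thousands per year) and do not come from any real country; the example uses Python’s standard statistics module.

```python
import statistics

# Hypothetical incomes (in thousands), made up for this illustration only.
# Most values are small; one very large value skews the distribution.
incomes = [20, 22, 25, 27, 30, 32, 35, 40, 60, 500]

mean_income = statistics.mean(incomes)
median_income = statistics.median(incomes)

print(mean_income)    # 79.1
print(median_income)  # 31.0

# Someone earning 60 is below the mean, yet earns more than most people here.
earn_less = sum(1 for income in incomes if income < 60)
print(earn_less, "of", len(incomes), "people earn less than 60")
```

In this made-up data a single large income pulls the mean well above what most people earn, while the median stays close to the bulk of the values.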
Let’s calculate the median of the following data:
162.0 159.1 169.9 191.3 195.9 139.8 186.0
First we’ll sort the data (ascending or descending does not matter):
139.8 159.1 162.0 169.9 186.0 191.3 195.9
Then we’ll pick the value right in the middle: 169.9 – this is our median!
What if we have an even number of values? We simply take the mean of the two values right in the middle.
Can you calculate the median of:
163.1 162.2 210.5 201.0 188.7 182.6 153.0 173.5 146.6 148.0 ?
Result: 168.3, the average of 163.1 and 173.5
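Both cases, an odd and an even number of values, can be sketched in Python as follows. This is a minimal illustration under our own naming (median, odd_data, even_data), not a definitive implementation.

```python
def median(values):
    """Sort the values and return the middle one.

    With an even number of values, return the mean of the two middle ones.
    """
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2

# The two data sets from the examples above.
odd_data = [162.0, 159.1, 169.9, 191.3, 195.9, 139.8, 186.0]
even_data = [163.1, 162.2, 210.5, 201.0, 188.7, 182.6, 153.0, 173.5, 146.6, 148.0]

print(median(odd_data))             # 169.9
print(round(median(even_data), 1))  # 168.3
```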
Mode
Sometimes neither the mean nor the median really tells us what we want to know. Let’s look at a survey where we asked people how many siblings they have. They responded:
0, 1, 1, 1, 1, 2, 2, 2, 3, 5
We’ll have a mean of 1.8 siblings and a median of 1.5 siblings. But what we really want to know is: how many siblings do most people say they have? So we start counting how often each answer occurs:
Siblings - Count
0 - 1
1 - 4
2 - 3
3 - 1
5 - 1
We see 1 is the most frequent answer. This is the mode.
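The counting step can be sketched with Counter from Python’s standard library. The variable names are our own; the data is the survey result from above.

```python
from collections import Counter

# Survey answers from the example above.
siblings = [0, 1, 1, 1, 1, 2, 2, 2, 3, 5]

# Count how often each answer occurs.
counts = Counter(siblings)
for value in sorted(counts):
    print(value, "-", counts[value])

# The mode is the answer with the highest count.
mode, frequency = counts.most_common(1)[0]
print("mode:", mode, "given by", frequency, "people")
```

Note that most_common(1) returns only one value, so it will not tell you about ties.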
What do we do if the data is not discrete? We create bins as we did above, then count.
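As a rough sketch of binning and then counting, the example below groups the heights used earlier into bins and finds the fullest one. The bin width of 25 is an assumption made for this illustration, and a different width can give a different result.

```python
from collections import Counter

heights = [163.1, 162.2, 210.5, 201.0, 188.7, 182.6, 153.0, 173.5, 146.6, 148.0]

# Assumed bin width, chosen only for this illustration.
BIN_WIDTH = 25

def bin_start(value, width):
    """Return the lower edge of the bin that the value falls into."""
    return int(value // width) * width

# Count how many heights fall into each bin.
bin_counts = Counter(bin_start(h, BIN_WIDTH) for h in heights)
for start in sorted(bin_counts):
    print(f"{start}-{start + BIN_WIDTH}: {bin_counts[start]}")

# The bin with the most values plays the role of the mode.
modal_bin, modal_count = bin_counts.most_common(1)[0]
print("fullest bin starts at", modal_bin)  # 150
```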
Sometimes you will not end up with a clear winner. There might be two different values that get the same number of counts. If they are clearly separate, we call this a bimodal distribution (or multimodal if there are more than two).
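One way to check for ties is to collect every value that reaches the highest count. The sketch below assumes a hypothetical helper named modes, and the second data set is invented for illustration. It only detects ties; whether the tied values are “clearly separate” is still something you have to judge by looking at the data.

```python
from collections import Counter

def modes(values):
    """Return every value that shares the highest count, in sorted order."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)

# Survey data from above: one clear winner.
print(modes([0, 1, 1, 1, 1, 2, 2, 2, 3, 5]))  # [1]

# Made-up data with two equally frequent values.
print(modes([1, 1, 1, 2, 5, 5, 5, 6]))        # [1, 5]
```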
