How big is the variation in the data?

The next thing we want to know is how much our data varies. Two measures come in handy: the standard deviation and the median absolute deviation. The standard deviation goes together with the mean and is very widely used. The median absolute deviation is less well known, and it is the better choice if you’re already using the median.

Standard Deviation

So let’s look at the first one, the standard deviation. It tells us roughly how far, on average, data points are from the mean. We calculate it by summing up the squared differences between each value and the mean, then dividing that sum by the number of measurements minus one, then taking the square root of that. Did you even pay attention?

More importantly: if we do have a normal distribution, 68.27 percent of data points will fall within one standard deviation of the mean and 95.45 percent within two standard deviations of the mean. So it gives us a good idea of where most of our data is. It’s hard to remember, but this illustration shows it pretty clearly:

Let’s take our data above: 1, 2, 3, 4.

We already know the mean is 2.5.

Let’s calculate the standard deviation:

Value	Difference to mean	Squared Difference
1	-1.5	2.25
2	-0.5	0.25
3	0.5	0.25
4	1.5	2.25

So we’ll sum up the squared differences: that’s 5.

We’ll divide 5 by our number of data points minus 1.

5/(4-1)

That’s 5/3.

And now we’ll take the square root of that.

And we arrive at 1.291. This means that about 68.27% of measurements would fall within this distance of the mean (assuming we do have a normal distribution).
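If you'd rather let a computer do the arithmetic, the same steps can be sketched in a few lines of Python. This is just an illustration of the calculation above, not the only way to do it; the variable names are made up for this example, and the last line cross-checks the result against `statistics.stdev` from Python's standard library, which also divides by the number of measurements minus one.

```python
import math
import statistics

values = [1, 2, 3, 4]

# Step 1: the mean (2.5 for this data)
mean = sum(values) / len(values)

# Step 2: squared difference of each value from the mean
squared_diffs = [(v - mean) ** 2 for v in values]

# Step 3: sum them up and divide by the number of measurements minus one
variance = sum(squared_diffs) / (len(values) - 1)

# Step 4: take the square root
std_dev = math.sqrt(variance)

print(round(std_dev, 3))                    # 1.291
print(round(statistics.stdev(values), 3))   # same result from the standard library
```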

Sounds complicated! It is a little. But remember, it’s just adding stuff up, squaring, dividing, and taking a square root. No big magic here. Luckily, if you need it, spreadsheets have a formula for this: =STDEV.

Median Absolute Deviation

As we said, the standard deviation works well when you can use the mean, since it’s based on the mean. But what about when you use the median? Use the median absolute deviation. It works very similarly, but it’s easier: you calculate your median and the absolute difference between each value and the median. Then you calculate the median of the differences.

For example: our data is

1, 2, 3, 4, 5

The median is 3.

The absolute differences are: 2, 1, 0, 1, 2. Sorted: 0, 1, 1, 2, 2.

The median absolute deviation is: 1.
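Here is the same example as a short Python sketch, in case you want to try it on your own numbers. The variable names are made up for this illustration; `statistics.median` comes from Python's standard library.

```python
import statistics

values = [1, 2, 3, 4, 5]

# Step 1: the median of the data
median = statistics.median(values)

# Step 2: absolute difference between each value and the median
abs_diffs = [abs(v - median) for v in values]

# Step 3: the median of those differences
mad = statistics.median(abs_diffs)

print(median)     # 3
print(abs_diffs)  # [2, 1, 0, 1, 2]
print(mad)        # 1
```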

Sounds feasible, right?