Distribution

The next thing to look at is how the data is distributed. This is commonly shown with a plot called a histogram. A histogram counts how many values fall into each range of numbers and draws one bar per range.

So how do we do this? By binning the data. What does this mean? Basically, we create bins, which are the ranges of numbers we care about, and every value goes into exactly one bin.

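Let’s do this for the data we used above. Our data ranges from 146.6 to 210.5, so let’s create reasonable bins that cover that whole range. Let’s say we use 140-160, 160-180, 180-200, and 200-220. Then we go on and (surprise) count. How many values do we have between 140 and 160? How many between 160 and 180? And so on. If a value lands exactly on a boundary such as 160, pick one rule and stick with it, for example always counting it in the higher bin.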

Result:

Bin	Number of values
140-160	3
160-180	3
180-200	2
200-220	2
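
The counting step can be sketched in a few lines of Python. This is only an illustration: the function name count_bins is made up here, the ten values are placeholders chosen to match the range and counts in the text (they are not the real data from above), and the rule that a boundary value goes into the higher bin is an assumption.

```python
def count_bins(values, edges):
    """Count how many values fall into each bin [lower, upper).

    A value equal to a boundary is counted in the higher bin
    (an assumed convention; other choices are equally valid).
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(edges) - 1):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break  # each value belongs to exactly one bin
    return counts


# Placeholder values, NOT the article's data: swap in your own numbers.
values = [146.6, 152.3, 158.9, 161.2, 170.4, 178.8, 183.5, 195.1, 203.7, 210.5]
edges = [140, 160, 180, 200, 220]  # bin boundaries from the text

counts = count_bins(values, edges)
for lower, upper, n in zip(edges, edges[1:], counts):
    print(f"{lower}-{upper}\t{n}")
```

Values outside the outermost boundaries are silently ignored in this sketch, so it is worth checking that the bins really cover the full range of the data.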

Fantastic – let’s draw this as a simple graphic:

140-160: ***
160-180: ***
180-200: **
200-220: **
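
The simple graphic above can be produced the same way: one line per bin, one star per counted value. A minimal sketch, assuming the counts from the table; the function name text_histogram is made up for this example.

```python
def text_histogram(labels, counts, symbol="*"):
    """Build a text histogram: one line per bin, one symbol per value."""
    lines = []
    for label, n in zip(labels, counts):
        lines.append(f"{label}: {symbol * n}")
    return "\n".join(lines)


# Bin labels and counts taken from the table in the text.
labels = ["140-160", "160-180", "180-200", "200-220"]
counts = [3, 3, 2, 2]

chart = text_histogram(labels, counts)
print(chart)
```

With many values, a plotting library will draw nicer bars, but the idea is exactly the same.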

This is all there is to histograms, which show us how our data is distributed.

While this doesn’t tell us much if we have only ten values, this is super useful if we have more.

Things we want to look at: how many peaks do we see? Is there a single peak, or are there several? (Multiple peaks can tell us something about different groups present.) Is this a normal distribution, where there is one clear peak and the two sides fall off symmetrically (as below)?