Some work
To do both the quick and longer tasks below you’ll need:
- a spreadsheet tool, such as Excel or Libre Office.
- a copy of the course dataset, which is GRAIN’s data on “land grabbing”.
A quick task (15 minutes)
Download a copy of the GRAIN database of land grabs onto your computer. Load the spreadsheet and consider the following:
- What sort of data type do you think the data in each column should be? Find out what type of data the spreadsheet thinks it is by right clicking on the column heading, selecting Format Cells and looking at the “Numbers tab”. What do you see? Try choosing different options from the Category and Format lists and see what happens to the data.
- Where there are some numbers in a column, can you add them up and see what happens? Use your common sense: does the sum look too big or too small? Does it produce an error? This may tell you there’s something wrong with the data.
- Where there is text or numbers, try sorting in ascending order, and see what happens. Does it behave as you would expect?
- Use Auto-Filter to display the distinct values in a column so you can see what sort of data is in the column. Does anything stand out to you as strange, or inconsistent? Can you see mixes of different data types?
After probing the data in these four different ways, what is your impression about how easy the data is to analyse using the spreadsheet? How could you improve the data?
Use the discussion area to let us know what you think.
A longer task (an hour)
From the Data Wrangling Handbook Recipe on Data Cleaning, run through “Problem 4: Fixing Numbers that aren’t numbers” using the GRAIN dataset. This step-by-step exercise explores the specific problem of how data on the size of investments in different land deals has been recorded in the dataset. It explains how you can detect and solve the problem using formulas, and use features like Pivot Tables to check your solutions are working. It will give you a strong idea of how to combine different spreadsheet functions and features.
