You are browsing the archive for Tom Longley.

New course: Exploring and understanding your data

- April 17, 2013 in Data Blog


Image: Table and contents, pivoted. By Tom Longley. Using a spreadsheet to pivot a table helps you see your data clearly and from different angles.

We’re pleased to announce a new course: A gentle introduction to exploring and understanding your data. This is Tactical Tech’s second course for the School of Data, building on our earlier course about cleaning data.

We probably shouldn’t tell you this, but the course title is a bit misleading. This course is actually a bootcamp on how to use a powerful spreadsheet feature called the pivot table. If we called it “Pivot table bootcamp” or “Pivot tables made easy”, nobody would click the link. That’s because most people have seen the pivot table feature in their spreadsheet and been gripped by fear.

The pivot table has all the hallmarks of something that is not intended for civilian use:

  • a name that suggests you’re about to do something dangerous: flick safety catch, break seal, pivot table, adopt brace position.
  • a tedious interface of blank white rectangles, unhelpful labels and seemingly endless combinations of drop-down menus.
  • the initially inexplicable effect that it has on the data that you process with it.

We understand this fear, friends, and we’ve written this course to help you overcome it. Despite having the user-friendliness of a sleep-deprived tiger, pivot tables will save you considerable time and effort in understanding and processing your data:

  • They quickly summarise a complete dataset, so you can see at a glance what’s in it.
  • Using a technique called cross-tabulation, pivot tables rearrange a dataset without changing the original data.
  • You can make lots of them at once, giving you many different views of your dataset at the same time.
  • All the other useful features of spreadsheets, like sorting, filtering and formulae, also work on datasets presented in pivot tables.
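For readers curious about what cross-tabulation actually does, here is a minimal sketch of the same idea in Python rather than a spreadsheet. It assumes a flat table of records and groups them the way a pivot table does when one column goes in the row area, another in the column area, and a third is summed in the values area. The field names (`country`, `sector`, `hectares`), the rows and the `pivot` helper are all hypothetical, invented for illustration; they are not taken from the course or its sample datasets.

```python
from collections import defaultdict

# Hypothetical flat table: one record per deal. Invented values,
# NOT real data from the course.
rows = [
    {"country": "A", "sector": "Biofuels", "hectares": 100},
    {"country": "A", "sector": "Food", "hectares": 50},
    {"country": "B", "sector": "Biofuels", "hectares": 30},
    {"country": "B", "sector": "Biofuels", "hectares": 20},
]

def pivot(records, row_key, col_key, value_key):
    """Cross-tabulate records (hypothetical helper).

    One output row per distinct row_key value, one column per distinct
    col_key value; each cell holds the sum of value_key for that pair.
    The input records are only read, never modified.
    """
    table = defaultdict(lambda: defaultdict(int))
    for record in records:
        table[record[row_key]][record[col_key]] += record[value_key]
    columns = sorted({record[col_key] for record in records})
    # Convert to plain dicts, filling empty combinations with 0.
    summary = {r: {c: table[r].get(c, 0) for c in columns} for r in sorted(table)}
    return columns, summary

columns, summary = pivot(rows, "country", "sector", "hectares")

# Print the summary as a small tab-separated table.
print("country", *columns, sep="\t")
for country, cells in summary.items():
    print(country, *(cells[c] for c in columns), sep="\t")
```

Here `summary` holds one row per country and one column per sector, with 0 where a combination has no records, while `rows` is left untouched, mirroring how a pivot table leaves the original data alone.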

In this course, we gradually work through the four steps it takes to build pivot tables that answer questions about your data. We illustrate each step with plenty of examples, from simple to more extensive, that you can recreate from the downloadable sample datasets.

We hope you find this course useful. Please tweet us or leave a comment here or on the course page to let us know how you get on and what we could improve. Now… get started with a gentle introduction to exploring and understanding your data.


Here, the spreadsheet is king (for now)

- January 28, 2013 in HowTo

Seat reserved… for the spreadsheet. Photo by Zoonabar. CC-BY-SA 2.0

For non-techie researchers and investigators like me who work on human rights, spreadsheets are incredibly useful. Yet it’s hard to imagine a tool so flexible that is, at the same time, so deeply frustrating. Spreadsheets can make simple things very difficult. For example, for many years this is what “cleaning data” has meant to me and many other people I work with:

Open file in spreadsheet. Open cell. Position cursor. Correct error. Close cell. Move down a row. Open cell. Position cursor. Correct error. Close cell. Move down a row… repeat to row 53,234 or until you fall asleep at the keyboard (whichever comes first).

To help speed these sorts of tasks up, we’ve written a new School of Data course called A gentle introduction to cleaning data in a spreadsheet. It contains loads of ways to make cleaning data a quicker and less painful experience.

In the course we start with a ‘dirty’ dataset containing lots of common errors, and walk you step by step through the process of making it ‘clean’. We’ll show you how to use a range of common spreadsheet features to find and correct problems such as invisible or inconsistent data, missing values and a bad data structure. By the end of the course, you should have a better view of what the spreadsheet can do, a practical process you can repeat on your own datasets and a good idea of how to find help online when you get stuck with a spreadsheet.
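The course does this cleaning with spreadsheet features, but the core idea, applying one rule to every cell at once instead of fixing cells one by one, can be sketched in a few lines of Python. This is a minimal illustration, not the course’s method: it assumes a single column of text values, and the values and the `clean_cell` helper are hypothetical. It tackles three of the problems mentioned above: invisible whitespace, inconsistent capitalisation and missing values.

```python
# Hypothetical messy column, invented for illustration (not the course
# dataset): stray spaces, inconsistent capitalisation and blank cells.
raw_countries = ["Ethiopia ", " ethiopia", "ETHIOPIA", "", "Sudan", "  "]

def clean_cell(value):
    """Return a tidied cell value, or None if the cell is effectively empty.

    - split() followed by " ".join() removes leading/trailing whitespace
      and collapses runs of internal spaces to a single space
    - title() makes capitalisation consistent ("ETHIOPIA" -> "Ethiopia")
    """
    tidied = " ".join(value.split())
    if not tidied:
        return None  # treat blank or whitespace-only cells as missing
    return tidied.title()

# Apply the same rule to every cell in one pass.
cleaned = [clean_cell(v) for v in raw_countries]

# Record which rows are missing a value so they can be checked by hand.
missing_rows = [i for i, v in enumerate(cleaned) if v is None]

print(cleaned)
print("Rows with missing values:", missing_rows)
```

`title()` is a blunt rule that can mangle some names (for example, anything containing an apostrophe), so for real data a lookup table mapping known variants to one agreed spelling is usually safer.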

The course dataset is interesting too. It’s about ‘land-grabbing’: the commercial buy-up of agricultural land in the developing world by investment companies and governments to grow biofuel and other commodities. Land-grabbing turfs people off land they need for their survival and, some analysts reckon, drives up food prices around the world. The data was produced by GRAIN, an excellent research organisation; I hope they accept our apologies for picking on their data in this course!

This is the first in a series of three ‘basics’ courses, all using the same land-grabbing dataset. Next in the series is A gentle introduction to descriptive data analysis, which is about using a spreadsheet to get to grips with what’s in your data. Hot on its heels will be an introduction to visualising networks.

Finally, this course also illustrates the spreadsheet’s limits. At some point, the time and effort you spend pushing a spreadsheet to do something may be better spent learning tools and techniques designed specifically for the problem. In the case of cleaning data, that might mean learning to use Google Refine.

But until that time, all hail the spreadsheet, king of data cleaning.
