Working With Data in the Browser Using Python – coLaboratory

August 20, 2014 in Data Blog

IPython notebooks are attracting a lot of interest in the world of data wrangling at the moment. With the pandas code library installed, you can quickly and easily get a data table loaded into the application and then work on it one analysis step at a time, checking your working at each step, keeping notes on where your analysis is taking you, and visualising your data as you need to.

If you’ve ever thought you’d like to give an IPython notebook a spin, there’s always been the problem of getting it up and running. This means either installing software on your own computer and working out how to get it running, finding a friendly web person to set up an IPython notebook server somewhere on the web that you can connect to, or signing up with a commercial provider. But now there’s another alternative – run it as a browser extension.

An exciting new project has found a way of packaging up all you need to run an IPython notebook, along with the pandas data wrangling library and the matplotlib charting tools, inside an extension you can install into a Chrome browser. In addition, the extension saves notebook files to a Google Drive account – which means you can work on them collaboratively (in real time) with other people.

The project is called coLaboratory and you can find the extension here: coLaboratory Notebook Chrome Extension. It’s still in the early stages of development, but it’s worth giving a spin…

Once you’ve downloaded the extension, you need to run it. I found that Google had stolen a bit more access to my Mac by adding a Chrome App Launcher to my dock (I don’t remember giving it permission to), but launching the extension from there is easier than hunting for the extension menu (such is the way Google works: you give it more permissions over your stuff, and it makes you think it’s made life easier for you…).

When you do launch the app, you’ll need to give the app permission to work with your Google Drive account. (You may notice that this application is built around you opening yourself up to Google…)

Once you’ve done that, you can create a new IPython notebook file (which has an .ipynb file suffix) or hunt around your Google Drive for one.

CoLaboratory_Notebook

If you want to try out your own notebook, I’ve shared an example here that you can download, add to your own Google Drive, and then open in the coLaboratory extension.

Here are some choice moments from it…

The notebooks allow us to blend text (written using markdown – so you can embed images from the web if you want to!), raw programme code and the output of executing fragments of programme code. Here’s an example of entering some text…

coLaboratory_Notebook_text

(Note – changing the notebook name didn’t seem to work for me – the change didn’t appear in my Google Drive account, and the file just retained its original “Untitled” name :-( )

We can also add executable Python code:

coLaboratory_Notebook_code

pandas is capable of importing data from a wide variety of filetypes, either in a local file directory or from a URL. It also has built-in support for making requests from the World Bank indicators data API. For example, we can search for particular indicators:

coLaboratory_Notebook_wb
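As a rough sketch of both ideas: `pd.read_csv` accepts a local path, a URL or any file-like object, and an indicator search boils down to a regular-expression match on indicator names. The tiny indicator table below is typed in by hand for illustration (the ids are World Bank indicator codes, but the table and its abbreviated names are an assumption, not API output), and `find_indicators` is a hypothetical helper name. Note that in current pandas releases the World Bank helper lives in the separate pandas-datareader package rather than in pandas itself.

```python
import io

import pandas as pd

# Hand-typed stand-in for a CSV file; pd.read_csv would equally accept
# a local path or a URL string in place of this file-like object.
csv_text = """id,name
NY.GDP.PCAP.KD,GDP per capita (constant US$)
NY.GDP.PCAP.CD,GDP per capita (current US$)
SP.POP.TOTL,"Population, total"
"""

indicators = pd.read_csv(io.StringIO(csv_text))


def find_indicators(table, pattern):
    """Return the rows whose name matches a regular expression (case-insensitive)."""
    mask = table["name"].str.contains(pattern, case=False, regex=True)
    return table[mask]


# Look for anything that mentions GDP followed by "capita".
matches = find_indicators(indicators, r"gdp.*capita")
print(matches)
```

The indicator id in the first column is what you would then pass on when asking for the data itself.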

Or we can download indicator data for a range of countries and years:

coLaboratory_Notebook_wb_data
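A download along those lines might look like the sketch below. It assumes the pandas-datareader package (where the World Bank helper now lives) and network access, so the call is wrapped in a function (`fetch_indicator` is a hypothetical name) that is not run here. The offline frame underneath uses made-up placeholder numbers purely to show the layout such a download returns: one row per country and year, one column per indicator.

```python
import pandas as pd


def fetch_indicator(indicator, countries, start, end):
    """Download one World Bank indicator (requires network access)."""
    # Assumption: pandas-datareader is installed. It is imported lazily so
    # the rest of this sketch runs without it.
    from pandas_datareader import wb

    return wb.download(indicator=indicator, country=countries,
                       start=start, end=end)


# Example call (not executed here):
# data = fetch_indicator("NY.GDP.PCAP.KD", ["US", "CA", "MX"], 2005, 2008)

# Offline stand-in with the same layout: a (country, year) MultiIndex and
# one column named after the indicator. The values are placeholders,
# NOT real World Bank figures.
index = pd.MultiIndex.from_product(
    [["Canada", "Mexico"], ["2005", "2006"]], names=["country", "year"]
)
data = pd.DataFrame({"NY.GDP.PCAP.KD": [100.0, 110.0, 50.0, 55.0]}, index=index)

# Pick out one country's rows using the first level of the index.
canada = data.loc["Canada"]
print(canada)
```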

We can also generate a visualisation of the data within the notebook inside the browser using the matplotlib library:

coLaboratory_Notebook_plot
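A minimal sketch of that kind of chart, assuming a single series of yearly values (the numbers are placeholders, not real World Bank figures, and the labels are made up for the example):

```python
import matplotlib

matplotlib.use("Agg")  # off-screen backend so the sketch also runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder values indexed by year, NOT real data.
years = [2005, 2006, 2007, 2008]
series = pd.Series([100.0, 110.0, 120.0, 115.0], index=years,
                   name="Example indicator")

# pandas hands the drawing over to matplotlib; passing ax= lets us
# label the chart afterwards.
fig, ax = plt.subplots()
series.plot(ax=ax, marker="o")
ax.set_xlabel("year")
ax.set_ylabel(series.name)
ax.set_title("One country, one indicator")
```

In a notebook the figure is displayed inline beneath the code cell, so there is no need to save it to a file first.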

And if that’s not enough, pandas’ support for reshaping data, so that you can get it into a form where the plotting tools can do even more of the work for you, means that once you learn a few tricks (or make use of the tricks that others have discovered), you can really start putting your data to work… and the World Bank’s, and so on!

coLaboratory_Notebook_reshape
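One such trick, sketched here with placeholder numbers rather than real data, is to pivot the country level of a (country, year) index into columns with `unstack`. The plotting tools then draw one line per country without any further instruction:

```python
import matplotlib

matplotlib.use("Agg")  # off-screen backend so the sketch also runs without a display
import pandas as pd

# Long form: one row per (country, year) pair. Values are placeholders.
index = pd.MultiIndex.from_product(
    [["Canada", "Mexico"], [2005, 2006, 2007]], names=["country", "year"]
)
long_form = pd.DataFrame(
    {"value": [100.0, 110.0, 120.0, 50.0, 55.0, 60.0]}, index=index
)

# Wide form: one row per year, one column per country.
wide = long_form["value"].unstack(level="country")
print(wide)

# Each column becomes its own line, with a legend entry per country.
ax = wide.plot(marker="o")
ax.set_ylabel("value")
```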

Wow!

The coLaboratory extension is a very exciting new initiative, though the requirement to engage with so many Google services may not be to everyone’s taste. We’re excited to hear about what you think of it – and whether we should start working on a set of School Of Data IPython Notebook tutorials…
