DDJSchool Tutorial: Analysing Datasets with Tableau Public
This tutorial is written by Gregor Aisch, visualization architect and interactive news developer, based on his workshop, Data visualisation, maps and timelines on a shoestring. The workshop is part of the School of Data Journalism 2013 at the International Journalism Festival.
- Download and install Tableau Public. At the time of writing, only a Windows version is available.
- Download the dataset eurostat-youth.csv
Loading a CSV file
- Click Open data to open the data import window. From the list on the left pick Text File and select eurostat-youth.csv. Make sure that Field separator is set to "Comma". Click OK to proceed.
- Note: if this step fails with an error message, try changing your system region to English in the Windows Control Panel (see screenshot). It seems that Tableau cannot read comma-separated values if the comma is set as the decimal separator for numbers in the system settings.
- Tableau now lists all the columns of the table in the data panel on the left. The columns are classified into Dimensions and Measures.
- The dataset contains the following columns (all data are from 2011 and aggregated at NUTS-2 level):
- secondary_edu: percentage of population with secondary education
- youth_unemployed: percentage of people aged between 18 and 24 who are unemployed and do not participate in education or training.
- unemployed_15_24M: percentage of unemployed males between 15 and 24.
- unemployed_15_24F: percentage of unemployed females between 15 and 24.
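The decimal-separator problem mentioned in the note above can be illustrated with a short Python sketch. It is not a description of how Tableau parses files; it only shows the general ambiguity that arises when a comma is used both as field separator and as decimal separator. The sample rows and their values are invented for illustration, and the helper names `field_counts` and `is_consistent` are hypothetical.

```python
import csv
import io

# Hypothetical sample rows; the values are invented for illustration only.
GOOD = "geo_name,secondary_edu\nExample Region,78.5\n"
# Same row, but the number is written with an unquoted decimal comma.
BAD = "geo_name,secondary_edu\nExample Region,78,5\n"


def field_counts(text):
    """Return the number of fields found in each row of comma-separated text."""
    return [len(row) for row in csv.reader(io.StringIO(text))]


def is_consistent(text):
    """Return True if every row has as many fields as the header row."""
    counts = field_counts(text)
    return all(n == counts[0] for n in counts)


print(field_counts(GOOD), is_consistent(GOOD))
print(field_counts(BAD), is_consistent(BAD))
```

With a decimal comma, the data row splits into more fields than the header declares, so a reader can no longer tell where one value ends and the next begins.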
Analysing a dataset
Now we are going to analyse the dataset using Tableau.
- Drag the field youth_unemployed from Measures to Columns. Then drag secondary_edu to Rows.
- As you can see, Tableau computes the sums of the columns instead of plotting the individual values. To fix this, right-click each of the green fields and select Dimension.
- Once both fields are treated as dimensions, you should see a scatterplot like the one shown in the following screenshot. You can see that there is a negative correlation between education and youth unemployment.
- Now drag the field country from Dimensions to Color to color the plot symbols by country. You can also drag country to Shape to give each country its own symbol.
- Add the fields country and geo_name to the Detail mark to include that piece of information in the tooltips.
- Now you can use the color legend and quick filters to highlight and hide certain countries.
- Focus on Turkey
- Plotting unemployment by gender
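The two observations from the steps above (sums hide the individual values, and the scatterplot shows a negative correlation) can be checked outside Tableau as well. The following is a minimal Python sketch, assuming the data has been reduced to pairs of (secondary_edu, youth_unemployed). The five pairs are invented for illustration and are not taken from eurostat-youth.csv; the function name `pearson` is likewise an assumption of this sketch.

```python
from math import sqrt

# Hypothetical (secondary_edu, youth_unemployed) pairs, invented for illustration.
regions = [(85.0, 6.0), (80.0, 9.0), (70.0, 14.0), (60.0, 20.0), (50.0, 27.0)]


def pearson(pairs):
    """Return the Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / sqrt(var_x * var_y)


# Summing each column collapses all regions into one pair of numbers,
# which says nothing about how the two columns relate region by region.
total = (sum(x for x, _ in regions), sum(y for _, y in regions))

# Working on the individual pairs keeps that relationship visible.
r = pearson(regions)

print(total)
print(round(r, 3))
```

A coefficient below zero corresponds to the downward-sloping cloud of points in the scatterplot. To run this on the real data, the pairs would have to be read from the CSV file first.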
Bonus: creating a map with Tableau
- Now we can create a map easily: hold the Ctrl key and select the dimensions lat and lon together with the measure count, then click on Show Me to expand the list of suggested visualizations. Click on the icon of the map with the blue circles. Click on Show Me again to hide the panel.
- You should now see the complete dataset plotted on the map. Tableau is smart enough to use the square roots of the counts for the circles' radii automatically, so we don't have to take care of this ourselves.
- You can make the circles transparent by clicking on Color in the panel Marks and moving the transparency slider. To change the size of the circles, click on Size and adjust the slider.
- Now drag the field name to the Mark Label to add the city names as labels to the map.
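The square-root scaling mentioned above is worth spelling out, because it is what keeps the circle areas, and not the radii, proportional to the counts. The following Python sketch shows the idea; it is not Tableau's actual implementation. The function `radius_for`, the parameter `max_radius` and the sample counts are assumptions made for illustration.

```python
from math import pi, sqrt


def radius_for(count, max_count, max_radius=20.0):
    """Return a radius such that the circle's area is proportional to count.

    The largest count is drawn with max_radius (an arbitrary choice here).
    """
    if max_count <= 0:
        raise ValueError("max_count must be positive")
    return max_radius * sqrt(count / max_count)


# Hypothetical counts, invented for illustration.
counts = [100, 400, 1600]
radii = [radius_for(c, max(counts)) for c in counts]
areas = [pi * r * r for r in radii]

for c, r, a in zip(counts, radii, areas):
    print(c, round(r, 2), round(a, 1))
```

If the radius were scaled linearly with the count instead, a value four times larger would be drawn with sixteen times the area, which exaggerates differences between places.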
Note: The final steps of this tutorial are going to be added in the coming days.