Extension Activity – Querying a Data Set
As well as using spreadsheets as a tool to manipulate data, for example, by splitting columns or combining data from different sheets into a single sheet, we can also use Google spreadsheets to run simple database like queries over the data set.
The School of Data course (An Introduction to SQL Databases for Analysing Data) gives a more complete introduction to making queries of data in a simple database using the SQL query language, but it’s worth noting that Google spreadsheets can also run queries written using a very similar language.
Queries are written using a query that is contained within a QUERY formula.
- Create a new sheet and in the first cell enter an empty query that simply pulls in the the data from the first sheet: =QUERY(‘Sheet2’!A:G)
- From within this cell range we can then start to run particular queries (refer to the School of Data course on simple SQL for more ideas). For example, we can just pull back data from specified columns:
=QUERY('Sheet2'!A:G, "SELECT A, C, D") - If we want to find a council in a particular region, we can search by region, for example: =QUERY(‘Sheet2’!A:G, “SELECT A, C, D WHERE A=’Kilimanjaro’”)
- We could refine the query even further, for example by finding councils in a particular region where the total population is greater than or equal to a particular amount:
=QUERY('Sheet2'!A:G, "SELECT A, C, D WHERE A='Tanga' AND D>=200000") - As the examples in the SQL course show, we can summarise data by group, such as counting how many councils are in each region:
=QUERY('Sheet2'!A:G, "SELECT A, COUNT(C) GROUP BY A") - We might also try to find the total population in a region:
=QUERY('Sheet2'!A:G, "SELECT A, SUM(D) GROUP BY A")However, when doing this sort of calculation, we need to check that a total amount for the region is not specified along with the values for each council – or we would be double counting!
For a full reference of query language commands, see the Google Visualisation API Query Language Reference documentation. For an introduction to writing SQL like queries in general, see the School of Data course An Introduction to SQL Databases for Analysing Data.
