Advanced Formulas
The spreadsheet we’ll be working on contains the education budgets for local councils in Tanzania (in Tanzanian shillings) and information from the census about the population living there. We want to bring the two sheets together to create a single source of information that will allow us to look at how much money is spent per-person in each region and then compare these numbers to see whether there are any noticeable regional differences.
If you inspect both of the sheets in the example spreadsheet file, you should notice that the names for the councils are presented in slightly different ways, although we shall presume that they refer to exactly the same areas. The budget sheet has a single column that contains both the name of a council as well as the region is it based in, such as “Arusha MC, Arusha”. The census sheet has two columns, one describing the region, and the other the council name. But the differences do not end there: the same location may also be given slightly different names: the census location “Arusha City”, for example, rather than the budget labelled “Arusha City”. (We need to take care when areas named in slightly different but we assume them to cover the same area: they may actually refer to slightly different regions on the ground. In an ideal world, data sets would include unique and unambiguous identifiers so that we know exactly what each data element refers to and that the data in one sheet referring to location XYZ123 is referring to the same place identified as XYZ123 in another sheet.)
Assuming that we can reliably and unambiguously map from the names in one sheet to the names in another, we need to work out a method that allows us to do this. We could do it by had, but it might take a long time – and be subject to all sorts of human error – if the dataset is large one. Instead, we’ll try to automate the process by using a combination of formulas.
If you’ve ever calculated the sum of values in a column using SUM() or found the average (mean, or median) values, you’ve used a formula. But did you know you can combine formulas, by feeding the result of one formula into another? We’ll do this in the next walkthrough.
Walkthrough: Advanced Formulas
- Within the example spreadsheet, compare the council names in the first and second sheets – try to spot any similarites and differences between them. Do any of the differences appear to be regular, or systematic?
- Notice how the budget sheet uses “MC” where the census sheet uses “City”, the same for “TC” and “Town”
- We could go through and change those manually, but let’s use a formula to do this.
- We’ll need to combine formulas in order to replace both “MC” and “TC” but first we’ll need a new column in the sheet
- Click on the small triangle next to the column letter and select “insert 1 right”

- This gives us a column to work in
- We’ll not give a title right now – we’ll name it when we’re done (since there will be many changes in it)
- Note: If combining formulas seems too confusing for you, we could achieve the same result by having multiple columns with intermediate results. For example, we might create a column AA that contains the output from one formula applied to another column (setting column AA to =THISFORMULA(C), perhaps) and then take the contents of that column and feed it into another formula in another column (such as column BB, setting it to +THATFORMULA(AA) for example).
- The formula we’re going to use is SUBSTITUTE – if you start typing ‘=SUBSTITUTE’ in the spreadsheet, you’ll see the help. three parameters are mandatory: the text we’re working on, what we’re searching and what we want to substitute it with. We’re searching for “MC” and want to substitute it with “City”
- So the formula we’ll be using looks like
=SUBSTITUTE(C2,"MC",City"):

What happens is here is the following: if in the cell C2, there is the word “MC”, replace it with the word “City”.
Note the order: =SUBSTITUTE(CELL,"FROM THIS","TO THAT"). (Some programming languages may refer to this as replace). If you try this out you’ll see how it works: nothing will change with cells like C2, where the word MC is not present, but two cells below, “Arusha MC, Arusha”, as found in cell C4, become Arusha City, Arusha.
- But we want to go further, because we also want to substitute “TC” with “Town”. We could keep creating new columns, passing the results created in one column into a formula in the next that works on those results. Or we can place one function inside another, passing the results from the “inner” one directly into the function that wraps it, as in this example:
=SUBSTITUTE(SUBSTITUTE(C2,"MC","City"),"TC","Town").

What happens here is the following:
if in the cell C2, there is the word “MC”, replace it with the word “City”, then if in the resulting text there is the word “TC”, replace it with the word Town. Or, using the formula format: SUBSTITUTE(DATA,"TC","Town") where DATA corresponds to SUBSTITUTE(C2,"MC","City")
- So this will do our town and city substitutions, but now we need to work out how to match the combined council and region names. There are two ways we can do this: either, in the census sheet, we could combine the two columns representing the council and a the region into a single column that replicates the formatting of the budget sheet; or in the budget sheet, we could split out the combined council/region names into two columns to match the formatting used in the census sheet. Generally, splitting data is more useful because it provides us with more information, or at least, access to the same information but in a more granular way.
- Luckily Google spreadsheets allows us to easily split columns (other spreadsheet programs allow this too, but tend to be more complicated). It offers a formula called SPLIT, which asks for a text and what to split the text by (in our case “,”)
- Once again, we could create a new column that applies this operation to a column that contains the results of our substitutions, or we can pass that data directly into it. Following that second approach, our formula now looks like this: =SPLIT(DATA,”,”). Which is to say, this:
=SPLIT(SUBSTITUTE(SUBSTITUTE(C2,"MC","City"),"TC","Town"),",").

- Now the magic moment has come to press Enter and see whether it does what we want.

- Looks good, let’s apply this to all rows by double clicking on the small blue square bottom right.

- Fantastic, we have now the council names in the same format as in the other sheet – and you learned how to stack formulas to do even more complicated things.
All you need to do now is to think of a way to appropriately name the columns.
