How to use RegEx formulas in Google Sheets
Handling big sets of data to analyse is quite normal if you work in SEO. On a regular basis, either in Google Sheets or Excel, I use formulas such as vlookup, index, iferror, and the list goes on, to sort quickly through endless data. Not only are they fundamental for my survival as an SEO analyst, But sometimes, these functions are just not good enough. They’ll do the work, but the time it takes to set them up makes me question whether I am actually working faster.
Thanks to a recent outstanding presentation delivered by Robin Lord during a digital marketing Meetup event at Brainlabs, I was introduced to the world of Regex. Regex is a matching expression tool returning specific values, for example, true or false, whether it finds the expression or not. It can be used on many platforms, such as Google Analytics, Python, Java, and more, to perform searches of specific strings.
You may be thinking “oh no, another complex programming language I don’t understand that complicates things even more”. And you could not be more wrong. The good news is that you can use Regex on Google Sheets to work with your data, making your life beyond easy. That’s right, I’m all about learning tools to make my life easier and lately Regex is my favourite.
The 3 main Regex formulas you can use on Google Sheets are:
- REGEXEXTRACT
- REGEXREPLACE
- REGEXMATCH
They perform exactly what they say: extract, replace, and match. Since my purpose here is to demonstrate how helpful these functions are, I won’t go too much in deep into all the Regex syntax rules. There are already plenty of comprehensive guides and resources out there so if you want to learn more Regex, here is a useful Regex guide and this is a really good debugging tool.
Below I’m going to dive right into Regex formulas for Google Sheets and examples on how to use them and what you can achieve with each one.
REGEXEXTRACT
This is the function you’d want to use when you are digging out a certain string within a longer one. For one of my clients, I had a list of URLs and I had to extract certain words within each. For example, from the list of URLs below let’s suppose I had to extract the location:
- https://www.regexisgreat.co.uk/city/in-london
- https://www.regexisgreat.co.uk/location/in-paris
- https://www.regexisgreat.co.uk/cities/in-dublin
- https://www.regexisgreat.co.uk/europe/in-amsterdam
However, I had many words I was looking for, and each URL could have included any of them and in a different location within the URL. Without Regexextract, I’d have to apply multiple filters and custom formula for each URL, extracting the word based on its position. I don’t have time for this and nor should you.
Let me get you started with a really simple Regex example. Let’s say I needed to extract the word “great” from the link “http://regexisgreat.com”. When you type in “=REGEXEXTRACT” in Google Sheets this is what the function requires:
The “text” part is the cell where you have something to look for, and the “regular_expression” is the expression telling Regex what to look for. In our case, the formula would become:
=REGEXEXTRACT(A3,”.*(great).*”)
Here is a little syntax context: The .* before and after the word “great” tells Regex not to worry what precedes or follows the string “great”. So “great” could be anywhere in the URL, Regex will find it:
This is a Regex example of what the formula would become if you are looking for multiple values:
=REGEXEXTRACT(A3,”.*(blue|green|red|purple|blonde).*”)
Here you probably noticed the use of pipe “|” which stands for “or”. The formula tells Regex to look for the word blue or green or red and so on.
Again, it doesn’t matter where the word is within the string. One additional thing to note is that Regex differentiates for capitalised letters, which can be quite handy sometimes.
The applications of this formula are infinite when you think that you can combine it with the functions you use every day. Even without combining REGEXEXTRACT with any other formulas, performing the task I show above without it would be quite time-consuming.
REGEXREPLACE
For one of my clients, I had a long list of title tags and I wanted to capitalise the first letter of each word except for words like “the” “in” “for” “a”, you get the point. With the formula PROPER you can capitalise each first letter of a word:
But this didn’t solve my problem. Regexreplace came to the rescue:
=regexreplace(proper(A4),”The”,”the”)
Similarly to Regexextract, the “text” part is the cell you want to modify, and the remaining part is a simple instruction telling Regex what to replace:
Here is an example of Regexreplace combined with itself:
=regexreplace(regexreplace(proper(A4),”The”,”the”),”In”,”in”)
REGEXMATCH
Regexmatch searches for a value in a cell and returns a TRUE or FALSE. For me, this became useful when I was checking whether the URLs in my list contained certain words and I didn’t want to go through endless filters.
Here is a simple Regexmatch example, looking for the word “blue” within a URL:
=regexmatch(proper(A3),”.*blue.*”)
Here is an example showing how Regex differentiates for capitalised letters:
=regexmatch(proper(A3),”.*Blue.*”)
Again, here you can use multiple values to look for within the same formulas:
=regexmatch(proper(A3),”.*Blue|blue.*”)
The applications for Regexmatch are many and you can combine it with any other function on Google Sheets.
As you may have noticed, the potential of these three simple functions are endless. Let’s not forget that there are more of syntax rules on Regex that you can use to achieve different things. Here is an additional debugging tool for those of you who want to get into coding with Regex.
Regex has made my life a lot easier and my data a lot more interesting to work with. I’m sure you’ll find it extremely useful and surprisingly easy to learn. Feel free to comment if you have any questions about the formulas or further suggestions. Keep calm and REGEX on!