Regular Expressions – Don’t Use Google Analytics Without Them

I have to admit: I’m a recovering regexaphobe. When I was new to analytics, I remember someone sending me a snippet of regular expressions (AKA regex) to solve a goal setup conniption I was having. It looked like a foreign language to me. I was fascinated by it but repelled at the same time.

Sadly, being intimidated by regex kept me from doing more powerful analysis. I tried everything to avoid it and would copy and paste code from articles I had saved in Delicious whenever I had to create a custom filter. But eventually I hit a wall I couldn’t scale unless I conquered this beast, so I set out on a quest to learn it. I resolved to learn only enough regex to help me as an analyst. No propeller for me, thankeww.

As unsexy as regex is, I’m writing this post because if you don’t know the basics, you too will be limited in your ability to create segments, goals, and filters in Google Analytics — or whatever Web-based analytics platform you’re using. So I’m going to hit on the main ones you’ll need, without the use of geek speak. I will even subject myself to public scorn by my awesome programmer friends by sharing the goofy mnemonic devices I used early on to remember a few of them I just couldn’t seem to get down.

For ease of scanning, I’m also breaking my regex characters up into leagues to signify which ones I use most, occasionally and seldom to never.

Major League

Pipe (|)

The pipe character (|) is the regex equivalent of or. So let’s say you want to find out how many conversions you received from Google, Bing, or Yahoo. You could set up a segment that looks like this:

[Image: Regular Expressions for Google Analytics]

Tip: Remember to change the Condition field to Matches regular expression if you use regex to create a segment.

Another example of when I use the | character is when I’m creating a goal, and a step in the goal funnel or conversion can include more than one page:

[Image: Regular Expressions for Google Analytics - Pipe]
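If you want to play with the pipe outside of Google Analytics, here is a minimal sketch in Python. The pattern google|bing|yahoo is an assumption based on the segment described above, and the traffic sources and goal pages are made up for illustration. Python’s re.search stands in for the Matches regular expression condition; it behaves similarly but is not the same engine.

```python
import re

# Assumed pattern: any one of the three search engines.
SEARCH_ENGINES = re.compile(r"google|bing|yahoo")

# Made-up traffic sources, for illustration only.
sources = ["google", "bing", "yahoo", "facebook.com", "twitter.com"]

# re.search looks for the pattern anywhere in the string.
matched = [s for s in sources if SEARCH_ENGINES.search(s)]
print(matched)  # ['google', 'bing', 'yahoo']

# Same idea for a goal step that can be either of two (hypothetical) pages.
GOAL_STEP = re.compile(r"/thank-you|/confirmation")
print(bool(GOAL_STEP.search("/confirmation")))  # True
print(bool(GOAL_STEP.search("/cart")))          # False
```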

Dot (.)

The . is a wildcard character. It means match any one character. It can be a number, letter, or special character (even a white space). By itself, it’s not that amazing, but with the help of the next playa, the asterisk (*), it’s all kindsa bad to the bone.

Asterisk (*)

This is the MVP of all regex characters, in my opinion. It says to match zero or more of the character before it. In other words, it looks at the character before it (most often the . character) and says that character may be missing entirely or may repeat any number of times.
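To see the dot and the asterisk in action, here is a small Python sketch. The helper name whole_match and the sample strings are invented for this example; the helper only returns True when the pattern accounts for the entire string, which makes the "zero or more" behavior easier to see.

```python
import re

def whole_match(pattern, text):
    """True only when the pattern accounts for the entire text."""
    return re.fullmatch(pattern, text) is not None

# The dot matches exactly one character of any kind.
print(whole_match(r"c.t", "cat"))  # True
print(whole_match(r"c.t", "c t"))  # True (a space counts)
print(whole_match(r"c.t", "ct"))   # False (the dot needs one character)

# The asterisk allows zero or more of whatever comes right before it.
print(whole_match(r"go*gle", "ggle"))       # True (zero o's)
print(whole_match(r"go*gle", "gooooogle"))  # True

# Together, .* means any run of characters, including none at all.
print(whole_match(r"/blog/.*", "/blog/"))             # True
print(whole_match(r"/blog/.*", "/blog/regex-basics")) # True
```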

To be honest, the Advanced Segments area was made so that you could easily go without ever using regex to create segments. It may take you longer — like if you use the Or operator to include all of the different sites that you want to include in your social media segment — but you can get away with it. Between and/or operators and the ability to choose options like Contains or Starts with from the Condition field, you can oftentimes avoid using .*, so I’ll use a more advanced example of how I use these wonder twins.

We have several clients who use subdomains. By default, Google Analytics only shows the URI (the part of the URL after the domain). The problem with that is it clumps all of the site’s pages into one repository, and you can’t easily see which pages are from which subdomains. So I created the following filter that combines the Hostname (domain) and the Request URI (URI) and replaces the standard URI with the full URL. Here, the .* means the Hostname and URI can use any characters.

[Image: Regular Expressions for Google Analytics - Asterisk]
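The filter screenshot is not reproduced here, so the following is only a sketch of the logic described above, not the actual filter settings. It assumes each field is captured whole with (.*) and the two captures are glued together. The function name full_url and the example hostnames are hypothetical.

```python
import re

def full_url(hostname, request_uri):
    """Capture each field whole with (.*) and join the two captures."""
    field_a = re.fullmatch(r"(.*)", hostname)     # everything in the hostname
    field_b = re.fullmatch(r"(.*)", request_uri)  # everything in the URI
    return field_a.group(1) + field_b.group(1)

# Two subdomains that share the same URI no longer collapse into one row.
print(full_url("blog.example.com", "/pricing"))  # blog.example.com/pricing
print(full_url("shop.example.com", "/pricing"))  # shop.example.com/pricing
```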

Backslash (\)

This character escapes the character that follows it. In plain English, that simply means it says to treat the following character as a regular ol’ character and NOT a regex character. So if I write out index\.aspx\?query=funky\+boots (shout out to Michelle Robbins), I’m saying treat the ., ?, and + signs as characters and don’t interpret them as regex. (You’ll learn about the ? and + characters soon.)
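Here is a quick Python sketch of what goes wrong when the backslashes are left out. The escaped pattern is the one from the paragraph above; the variable names and the second test string are made up for illustration.

```python
import re

escaped = r"index\.aspx\?query=funky\+boots"
unescaped = r"index.aspx?query=funky+boots"

url = "index.aspx?query=funky+boots"

# With the backslashes, the pattern matches the literal URL.
print(bool(re.search(escaped, url)))    # True

# Without them, . ? and + act as regex characters and the real URL is missed.
print(bool(re.search(unescaped, url)))  # False

# Worse, the unescaped pattern happily matches a string you never intended.
print(bool(re.search(unescaped, "indexZaspquery=funkyyyboots")))  # True
```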

Minor League

Caret (^)

This simply means your selection has to begin with whatever you put after it. I use this both in segments and goals. Let’s say I want to look at just the landing pages in one directory of my website.  I would use something like this:

[Image: Regular Expressions for Google Analytics - Caret]

I’m only putting this character in the minor leagues because you could choose Starts with from the Condition drop-down menu when creating a segment. But Google doesn’t offer you that option elsewhere.
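As a rough illustration of the caret, here is a Python sketch. The /blog/ directory and the page paths are hypothetical; swap in whatever directory you care about.

```python
import re

# Assumed pattern: landing pages that start with the /blog/ directory.
IN_BLOG = re.compile(r"^/blog/")

# Made-up landing pages.
pages = ["/blog/regex-basics", "/products/blog/", "/blog/", "/about"]

landing = [p for p in pages if IN_BLOG.search(p)]
print(landing)  # ['/blog/regex-basics', '/blog/']
```

Note that /products/blog/ is left out: it contains /blog/, but it doesn’t begin with it.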

Dollar Sign ($)

This regex character means that your string ends at that point. For example, health insurance$ matches cheap health insurance but not health insurance rates. Or you could attach a $ to the end of a URL to keep versions of that URL with query strings out of your match. Or put it at the end of a directory to analyze only traffic to your category page and not its subpages.
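The health insurance example can be checked with a few lines of Python. The keyword list and the /shoes/ category directory are made up for this sketch.

```python
import re

ENDS_WITH = re.compile(r"health insurance$")

keywords = ["cheap health insurance", "health insurance rates", "health insurance"]
print([k for k in keywords if ENDS_WITH.search(k)])
# ['cheap health insurance', 'health insurance']

# Hypothetical category page: match the directory itself, not its subpages
# and not versions with a query string tacked on.
CATEGORY_ONLY = re.compile(r"^/shoes/$")
print(bool(CATEGORY_ONLY.search("/shoes/")))             # True
print(bool(CATEGORY_ONLY.search("/shoes/boots")))        # False
print(bool(CATEGORY_ONLY.search("/shoes/?sort=price")))  # False
```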

Now here’s a little mnemonic device I, a non-propeller head, came up with when I first started learning regex, but you have to promise not to laugh.

Promise?

Okay, I thought of how you lead someone with a carrot (I know it’s a different spelling — work with me) by putting it out in front and how at the end of the day it’s all about the money.  So the ^ goes in front in a regex expression and the $ at the end. Go ahead and laugh (promise breaker), but I guarantee you’ll remember next time.

Question Mark (?)

Technically, this character means 0 or 1 of the character before, but I like to think of it as the previous character being optional. Maybe it’s there, maybe it’s not — who knows, really? Hence the ?. See how easy this is when you’re not learning from a text book printed on recycled paper with a monospaced font?

Okay, so let’s say you want to see keywords that include dining room, but some of your searchers passed notes all through third grade and never learned that doubling up the consonants before –ing makes the vowel short. So how do you include these misspellings? You could use the ? this way:

[Image: Regular Expressions for Google Analytics - Question Mark]

It would return keywords that match dining room and dinning room.
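A pattern along the lines of dinn?ing room would do this; that exact pattern is an assumption, since the screenshot is not reproduced here. A short Python sketch with made-up keywords:

```python
import re

# Assumed pattern: the second n is optional.
DINING = re.compile(r"dinn?ing room")

keywords = ["dining room table", "dinning room sets", "living room", "dinnning room"]
print([k for k in keywords if DINING.search(k)])
# ['dining room table', 'dinning room sets']
```

The triple-n misspelling is left out because ? allows at most one extra n.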

Parentheses ( )

Parentheses are used to form groups — just like you learned in algebra. I really don’t use these often in creating garden-variety segments or goals. I use these more when I’m creating rewrite filters. Why would I do that? Because I’m in desperate need of a hobby. But besides that, I use them for sites that, for whatever reason, can’t (or won’t) rewrite their nasty dynamic URLs. It’s very difficult to interpret landing page reports that consist of dynamic URLs. So I give them prettier, more intuitive names. (Hmm … Sounds like another post for another day.)

For one client’s site, I wanted to create a bucket for all the URLs that were generated when someone searched for a property on their site. Believe it or not, this was the regex I had to write to create a net big enough to scoop up all of those pages:

(^/index\.html\?pclass.*)|(/index\.html\?action=search.*)|(/index\.php\?cur_page=.*)|(/index\.html\?searchtext.*)|(realty/index\.html\?pclass.*)

You’ve already met all of the regex characters in there. Each group in parentheses was a different version of the resulting search listings pages, depending on where you initiated your search. Ugly, huh? I mean, the regex I wrote was beautiful; it was the code that necessitated this regex that should be sent to bed without dinner.

Another example: Sep(tember)? would match Sep or September. Or if you wanna get all crazy with it, (S|s)ep(tember)? would match sep, Sep, September, and september. But now I’m just showing off. Sorry.
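Here is a small Python sketch of the month example. The helper whole_match is invented for this illustration; it only returns True when the pattern covers the entire word, which makes the grouping easier to see.

```python
import re

def whole_match(pattern, text):
    """True only when the pattern accounts for the entire text."""
    return re.fullmatch(pattern, text) is not None

MONTH = r"(S|s)ep(tember)?"
for word in ["sep", "Sep", "September", "september"]:
    print(word, whole_match(MONTH, word))  # True for all four

# The group is all or nothing, so a partial "Sept" is not covered.
print(whole_match(MONTH, "Sept"))  # False

# Without the parentheses, the ? only applies to the final "r".
print(whole_match(r"September?", "Septembe"))  # True
print(whole_match(r"September?", "Sep"))       # False
```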

City League

Square Brackets ([ ])

This means match any one of the characters between the brackets. So, c[aou]p would match cap, cop, and cup. But you can only pick one; that’s the key to the brackets. You can throw in a dash to indicate a range of characters to choose from. For example, [0-5] would mean you could pick any one digit between 0 and 5. I have used these when filtering out IP addresses for larger companies that have a span of IPs. So the IP might look something like this:

[Image: Regular Expressions for Google Analytics - Brackets]

This would cover a range of IPs where the last octet spans from 130 to 138.
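Here is a Python sketch of both bracket examples. The IP address is hypothetical (192.0.2.x is a block reserved for documentation), since the real one appeared only in the screenshot.

```python
import re

# Any one of a, o, or u between the c and the p.
CAP_COP_CUP = re.compile(r"^c[aou]p$")
print([w for w in ["cap", "cop", "cup", "cep", "caop"] if CAP_COP_CUP.search(w)])
# ['cap', 'cop', 'cup']

# Hypothetical office range: last octet from 130 to 138.
OFFICE_IPS = re.compile(r"^192\.0\.2\.13[0-8]$")
print(bool(OFFICE_IPS.search("192.0.2.130")))  # True
print(bool(OFFICE_IPS.search("192.0.2.138")))  # True
print(bool(OFFICE_IPS.search("192.0.2.139")))  # False
```

The backslashes keep the dots in the IP from acting as wildcards, and the ^ and $ keep longer addresses from sneaking in.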

Plus Sign (+)

To be honest, I never use this character. Actually, I think I used it once just to get the t-shirt. But it means one or more of the previous character. So it’s a lot like the asterisk, except it requires that at least one character matches. It’s a diva.
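To compare the plus sign with the asterisk side by side, here is a short Python sketch. The helper whole_match and the sample strings are made up for illustration.

```python
import re

def whole_match(pattern, text):
    """True only when the pattern accounts for the entire text."""
    return re.fullmatch(pattern, text) is not None

# The asterisk is fine with zero o's; the plus sign insists on at least one.
print(whole_match(r"lo*l", "ll"))      # True
print(whole_match(r"lo+l", "ll"))      # False
print(whole_match(r"lo+l", "lol"))     # True
print(whole_match(r"lo+l", "looool"))  # True

# Paired with the dot: something has to follow the directory.
print(whole_match(r"/blog/.+", "/blog/"))              # False
print(whole_match(r"/blog/.+", "/blog/regex-basics"))  # True
```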

Curly Braces ({ })

Again, I rarely use these in Google Analytics — usually only with really tricky URL rewrites. But curly braces indicate how many times you may want a character repeated. For simplicity’s sake, I’ll explain how to use it with an example that you probably wouldn’t use in your analytics but would make more sense. (Life is all about compromises.) Let’s say you want to indicate a number that is a US-based five-digit zip code. You would write it as [0-9]{5} because there are five digits in a US zip code.

You could also express a range with curly braces by using the convention {minimum,maximum}, with no space after the comma. For example, let’s say you have a list of product IDs that start with three lowercase letters followed by a hyphen and then three to five digits. You could indicate them this way:

[a-z]{3}-[0-9]{3,5}
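Both curly brace patterns can be tried out in Python. The helper whole_match and the sample values are made up for this sketch; the patterns themselves are the two from the text above.

```python
import re

def whole_match(pattern, text):
    """True only when the pattern accounts for the entire text."""
    return re.fullmatch(pattern, text) is not None

ZIP_CODE = r"[0-9]{5}"
print(whole_match(ZIP_CODE, "02134"))   # True
print(whole_match(ZIP_CODE, "2134"))    # False (only four digits)
print(whole_match(ZIP_CODE, "021345"))  # False (six digits)

PRODUCT_ID = r"[a-z]{3}-[0-9]{3,5}"
print(whole_match(PRODUCT_ID, "abc-123"))     # True
print(whole_match(PRODUCT_ID, "abc-12345"))   # True
print(whole_match(PRODUCT_ID, "abc-12"))      # False (too few digits)
print(whole_match(PRODUCT_ID, "ABC-123"))     # False (uppercase letters)
print(whole_match(PRODUCT_ID, "abc-123456"))  # False (too many digits)
```

The helper checks the whole string. In a tool that matches anywhere in the text, you would wrap the pattern in ^ and $ to get the same strictness; otherwise abc-123456 would still match on its first five digits.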

Testing Your Regex

The best part of Google Analytics is that every report comes with a filter at the bottom, and that filter accepts regex. I tried several different regex testers before discovering that this filter is the best testing ground for regex specific to Google Analytics.

[Image: Regular Expression for Google Analytics - Report Filter]

How would you leverage it? Just go to the report that contains the items you’re writing the regex for: the Keyword report if you’re trying to group keywords, Traffic Sources if you’re trying to identify specific sources, etc.

So if I’m writing regex to capture a group of pages to combine in a segment, I’ll go to the Top Content report and paste my regex into the filter. If all of my pages are present and accounted for, I’m golden. It’s a real time saver.
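The same "present and accounted for" check can be rehearsed offline. This is only a sketch: the function report_filter, the page list, and the pattern are all hypothetical, and Python’s regex engine is a stand-in for the report filter rather than a copy of it.

```python
import re

def report_filter(rows, pattern):
    """Return the rows that contain a match for the pattern."""
    regex = re.compile(pattern)
    return [row for row in rows if regex.search(row)]

# Hypothetical rows from a content report.
pages = ["/blog/regex-basics", "/blog/ga-filters", "/products/boots", "/about"]

# The pages the segment is supposed to capture.
expected = {"/blog/regex-basics", "/blog/ga-filters"}

matched = report_filter(pages, r"^/blog/")
missing = expected - set(matched)  # pages the regex should catch but didn't
extra = set(matched) - expected    # pages the regex caught by accident

print(matched)         # ['/blog/regex-basics', '/blog/ga-filters']
print(missing, extra)  # set() set()
```

Two empty sets means the regex catches everything it should and nothing it shouldn’t.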

If you want to learn more about using regex, I cut my teeth on LunaMetric’s Regular Expressions for Google Analytics guide (PDF). And Robbin Steif personally answered questions I had about the quiz at the end. That was impressive.

So your turn: How do you use regex with your analytics? Any tips you’ve learned in the trenches? Let us know here or connect with us on Facebook or Twitter. Also, anything you want to learn more about with Google Analytics? Let me know below or on Twitter, my cyber home away from home.

UPDATE: See eight practical examples of regex in Google Analytics.

Comments

  1. Most useful post I’ve read in a long time. I dub thee RegEx Royalty.

    • Annie Cushing says:

      Why thank you, Andrew! I will wear that title with pride. :)

    • Absolutely agree with Andrew. Awesomely useful post!

      • Annie Cushing says:

        Thanks, James. Glad to hear that. :)

  2. Nice article!

    Isn’t the spelling ‘Asterisk’ not ‘Asterick’ for the MVP?

    Michael

    • Annie Cushing says:

      Ohh is that what that little red squiggly line I clearly overlooked meant? head –> desk

      Thanks. It’s fixed.

  3. Well done! You are at the edge of that slippery slope to getting a propeller!

    Having been forced into my so-adorned cap too many times, I’ll share one with you so you and your readers don’t trip over it by accident: The caret ^ means something different when INSIDE square brackets.
    [^0-9] means NOT in the range of 0-9. That can actually be really useful at times…

    • Annie Cushing says:

      Heyy, Mike! Good to see you here!

      I actually considered explaining the use of the caret inside square brackets, as well as \d, \w, and \s, but I’ve only needed to use the [^ ] convention once and was afraid of overwhelming newbies. If I blogged more frequently I would have broken it up.

      I can only hope to grow into that propeller you’re sporting one day. :)

  4. Excellent writeup, really brings a simpler eye into regex for those of us who share(d) the phobia. Thanks for such a clear discussion!

    • Annie Cushing says:

      That’s what I was aiming for – to distill it down to the basics. Once you get past the intimidation and learn the essentials, it’s really easy to transition into more complicated applications.

  5. David says:

    Nice roundup. One question: Shouldn’t it be the “following character” versus “previous character” under the description for “Backslash”?

  6. Anna says:

    Hi Annie,

    Great article, thanks very much! Quick question though, in the Asterisk section, when setting it to show the sub domains – will this work if a sub domain is secure but others aren’t? ie https://subdomain.site.com and http://www.site.com.

    Thanks.

    Anna.

  7. Have you thought about teaching? Your style rocks, plus some valuable insights. Time to go make pretty regex and hit a home run.

    Thanks, Annielytics!

    • Annie Cushing says:

      Thanks, Dana! I actually used to teach high school. I really loved it. My students called me The Cushinator. I guess that gravitational pull to teach never quite leaves you. :)

      Now let’s dress our regex up and take it out for a night on the town!

  8. Rob Hammond says:

    Nice post! More people should get into building up their understanding of regexes as they’re essential for GA, and useful for so much more. O’Reilly have a couple of great books on the subject I’d recommend (Regex Cookbook & Mastering Regexes).

    Btw on the ‘+’ explanation – “so it’s a lot like the dot (.)” do you mean asterisk? :)

    • Annie Cushing says:

      Yes, that’s exactly what I meant. Thanks. :)

  9. Grant Miller says:

    this is possibly one of the most useful articles i’ve ever read… I’ve tried to pick up the basics of regex more than once and always failed. your post is such a great, straight forward explanation of the basics that I’m already running at full steam with this info… thanks so much!

    • Annie Cushing says:

      That is music to my ears, Grant! I’m so glad it helped you. :)

      • Grant says:

        absolutely, check out this really cool regex visualization tool i found through HN: http://strfriend.com/

        • Annie Cushing says:

          Hey, that is a cool tool. Bookmarked for future reference. Thanks.

  10. Recently I decided I really, Really needed to get to the next level with GA. So, I simultaneously ordered the book Advanced Metrics With GA and committed myself to passing the Google GA exam. Running into regex, I gulped hard and steadied myself for getting badly injured by a propeller cap I’m not worthy of wearing, and an uphill climb I wasn’t sure I’d be able to survive. I’m still climbing and studying for the exam, and reading that you are a recovering regexaphobe, along with your humorous-yet-highly-informative post about using it, has renewed my hope that perhaps I too will someday wield the power of regex within GA.

    Thanks!
    David

    • You can absolutely wield the power of regex in GA, David. Once you start using it, you’ll get the hang of it quickly. Good luck with the exam!