@khjrtbrg

Regex Crash Course

On Friday, Nell Shamrell guest lectured on Regular Expressions at Ada. She described using regular expressions as finding the block shapes that fit in particular block-shaped holes. While I like that metaphor, I like to think of regular expressions as a cousin of the basic ciphers and secret codes kids come up with when pretending to be spies (maybe it was just me, but Harriet the Spy was popular during a pivotal part of my childhood).

A regular expression uses a system of shorthand symbols to stand in for something else. If you know what those symbols mean, you can use a regular expression to find your hidden string, much in the same way that you might use a cipher to decode a super secret spy code.

Real World Application:

Regular expressions are useful for making your code shorter and more precise. Rather than using a long series of if statements and .include? to see if a string might be a match for whatever you're looking for, you can just use a regular expression, cutting your code down to a short (although not immediately human-readable) line of code.
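To make that contrast concrete, here is a minimal sketch. The method names are made up for illustration, and it borrows the "\d" shorthand (any digit) from the symbol list further down.

```ruby
# Hypothetical helpers: does a string contain at least one digit?

# The long way: one .include? check per digit.
def contains_digit_long_way?(text)
  return true if text.include?("0")
  return true if text.include?("1")
  return true if text.include?("2")
  return true if text.include?("3")
  return true if text.include?("4")
  return true if text.include?("5")
  return true if text.include?("6")
  return true if text.include?("7")
  return true if text.include?("8")
  return true if text.include?("9")
  false
end

# The regex way: \d stands in for "any digit".
def contains_digit_regex?(text)
  text.match?(/\d/)
end

contains_digit_long_way?("agent 007")  # => true
contains_digit_regex?("agent 007")     # => true
contains_digit_regex?("agent zero")    # => false
```

Both methods should give the same answer for any string; the second one is just a lot less typing.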

Among Nell's suggestions: add a link to your regular expression on Rubular in your comments to be less cryptic about what a regular expression does.

Format:

In Ruby, regular expressions are usually written wrapped in forward slashes, like so:

/foo/

That regular expression means: "When you use me, find something that matches the letters f, o, and o, all in a row, all lower case."
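Here is a small sketch of trying that pattern out in Ruby. The constant name FOO and the sample strings are my own, chosen just for illustration.

```ruby
FOO = /foo/

# String#match? answers true or false.
"foo".match?(FOO)      # => true
"seafood".match?(FOO)  # => true, because "foo" appears inside the word
"Foo".match?(FOO)      # => false, because the F is upper case
"f o o".match?(FOO)    # => false, because the letters aren't all in a row

# =~ returns the index where the match starts, or nil if there is none.
"seafood" =~ FOO       # => 3
"Foo" =~ FOO           # => nil
```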

Regular expressions can be a lot more complex, though. Say you wanted to find something more general, for instance any three letters in a row, in either upper or lower case. That's where regular expressions' shorthand symbols come in handy.

Quick Example:

Let's say I want a regular expression that will match both "Bee" and "foo". Starting from the beginning, I'll get my regular expression ready:

//

Exciting, no? Now, to find any word character, I can just use the shorthand symbol for that, "\w":

/\w/

To match three of those in a row, I'll add "{3}" directly next to it:

/\w{3}/

That regular expression will match both "Bee" and "foo", but it will also match the "Bee" in "Bees". To make it match only a whole three-character word, I can add "\b" to each end to indicate a word boundary (with "\b" only at the end, it would still match the "ees" in "Bees"):

/\b\w{3}\b/

Now that will match "Bee" and "foo", but it won't match "Bees". What it will still match, however, is "123" and "_k9". To limit it to only match letters (and not numbers or underscores), I'll just swap in "[a-z]" for "\w", and tack an "i" at the very end, after the closing forward slash, to make it case-insensitive:

/\b[a-z]{3}\b/i
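One way to check each step of an example like this is to ask Ruby which part of a string a pattern actually matches. This is a rough sketch; the constant name THREE_LETTERS is made up for illustration. Note that a trailing "\b" on its own still finds "ees" inside "Bees", so the stricter patterns here put a "\b" on both sides.

```ruby
# String#[] with a regex returns the matched text, or nil if nothing matches.
"Bees"[/\w{3}/]       # => "Bee"  (the first three word characters)
"Bees"[/\w{3}\b/]     # => "ees"  (three word characters ending at a boundary)
"Bees"[/\b\w{3}\b/]   # => nil    (no three-character whole word)
"_k9"[/\b\w{3}\b/]    # => "_k9"  (\w allows digits and underscores)

# Letters only, whole word, case-insensitive.
THREE_LETTERS = /\b[a-z]{3}\b/i

"Bee"[THREE_LETTERS]   # => "Bee"
"foo"[THREE_LETTERS]   # => "foo"
"Bees"[THREE_LETTERS]  # => nil
"123"[THREE_LETTERS]   # => nil
"_k9"[THREE_LETTERS]   # => nil
```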

Extra Useful Symbols:

Type

\w Any letter, number, or underscore
\d Any digit
\s Any whitespace

Capitalizing means the opposite: for instance, \S means any non-whitespace.
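A quick sketch of these in action, using String#scan to collect every match. The sample string is made up for illustration.

```ruby
SPY_NOTE = "k9 spy!"

SPY_NOTE.scan(/\w/)  # => ["k", "9", "s", "p", "y"]
SPY_NOTE.scan(/\d/)  # => ["9"]
SPY_NOTE.scan(/\s/)  # => [" "]

# The capitalized versions match everything the lower-case ones don't.
SPY_NOTE.scan(/\W/)  # => [" ", "!"]
SPY_NOTE.scan(/\D/)  # => ["k", " ", "s", "p", "y", "!"]
SPY_NOTE.scan(/\S/)  # => ["k", "9", "s", "p", "y", "!"]
```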

Quantity

? Zero to one of the previous symbol
* Zero to any number of the previous symbol
+ One to any number of the previous symbol
{2} Two of the previous symbol
{2,} Two or more of the previous symbol
{2,4} Two to four of the previous symbol
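Here is a small sketch of each quantity symbol, with sample strings I picked for illustration. The brace forms are written without spaces inside the braces. Each quantifier grabs as many characters as it is allowed to, which is why the last few results differ.

```ruby
# ? means zero or one "u"
"color"[/colou?r/]   # => "color"
"colour"[/colou?r/]  # => "colour"

# * means zero or more "a"
"ct"[/ca*t/]         # => "ct"
"caaat"[/ca*t/]      # => "caaat"

# + means one or more "a"
"ct"[/ca+t/]         # => nil
"caaat"[/ca+t/]      # => "caaat"

# Braces give exact counts or ranges
"aaaaa"[/a{2}/]      # => "aa"
"aaaaa"[/a{2,}/]     # => "aaaaa"
"aaaaa"[/a{2,4}/]    # => "aaaa"
"a"[/a{2,}/]         # => nil
```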

Position

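\b Word boundary
\< Start of a word (in tools like grep and Vim; Ruby doesn't support this one, so use \b)
\> End of a word (in tools like grep and Vim; Ruby doesn't support this one, so use \b)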
\A Start of a string
\z End of a string
^ Start of a line
$ End of a line
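The difference between the string anchors and the line anchors only shows up when a string has more than one line. A minimal sketch, with a made-up two-line string:

```ruby
SPY_LINES = "spy kids\nspy code"

# ^ matches at the start of every line, \A only at the start of the string.
SPY_LINES.scan(/^spy/).length   # => 2
SPY_LINES.scan(/\Aspy/).length  # => 1

# $ matches at the end of any line, \z only at the very end of the string.
SPY_LINES[/\w+$/]   # => "kids"
SPY_LINES[/\w+\z/]  # => "code"

# \b on both sides matches "spy" as a whole word, not the start of "spyglass".
"I spy a spyglass".scan(/\bspy\b/).length  # => 1
```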

Helpful Regular Expressions Cheatsheets and Resources:

Regular expressions can do a lot more than the above examples. If you're interested in reading up, here are some more resources: