Regular Expressions Basics

Marchete

Open Source Your Knowledge, Become a Contributor

Technology knowledge has to be shared and made accessible for free. Join the movement.

Character Sets

Simple character set

In previous lessons, we learned that a regex made from literal characters, like ain, will search for exactly those 3 letters in that exact order. It's essentially a search for a && i && n. But what if you need an || (OR) instead of an && (AND)?

You can accomplish this by using brackets [ ]. When you create a pattern like [ain], you'll search for a single character that must be either a OR i OR n.
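To make the difference concrete, here is a minimal sketch using Python's re module. The language and the sample text are assumptions chosen for illustration; the lesson itself is not tied to any engine.

```python
import re

text = "rain in spain"  # sample text, chosen only for illustration

# Literal pattern: the three letters must appear together, in this order.
literal_matches = re.findall(r"ain", text)
print(literal_matches)  # ['ain', 'ain']

# Character set: each match is ONE character that is a OR i OR n.
set_matches = re.findall(r"[ain]", text)
print(set_matches)  # ['a', 'i', 'n', 'i', 'n', 'a', 'i', 'n']
```

Note that the order of the characters inside the brackets does not matter: [ain], [nia] and [ina] describe the same set.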

In this first exercise, you'll need to create a pattern to match vowels.

Exercise 1 - Create a pattern to match vowels

Range character set

A simple character set can be bothersome to declare when you need to match the whole alphabet or all digits. For that reason, you can use - in Regular Expressions to declare ranges of consecutive characters. Using the pattern [a-z], you'll match any single character from a to z (a,b,c,d,e....x,y or z). Likewise, [2-5] will match any single digit from 2 to 5. You can also combine several ranges inside the character set: [B-Ga-v] is a valid regex pattern that matches one character from B to G or from a to v. As stated before, regex patterns are case sensitive by default, so [a-z] and [A-Z] match different characters.
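The following sketch tries these ranges in Python's re module (an assumed choice of language; the sample strings are made up for illustration). It also shows that case sensitivity is a default that can be switched off with a flag.

```python
import re

sample = "Regex 2024"  # illustrative input

lowercase = re.findall(r"[a-z]", sample)   # only lowercase letters
uppercase = re.findall(r"[A-Z]", sample)   # only uppercase letters
digits_2_to_5 = re.findall(r"[2-5]", sample)  # 0 is outside the range

# Two ranges combined in one set: B to G, or a to v.
combined = re.findall(r"[B-Ga-v]", "Big Wave")

# With the IGNORECASE flag, [a-z] also matches uppercase letters.
any_case = re.findall(r"[a-z]", "Regex", re.IGNORECASE)

print(lowercase)      # ['e', 'g', 'e', 'x']
print(uppercase)      # ['R']
print(digits_2_to_5)  # ['2', '2', '4']
print(combined)       # ['B', 'i', 'g', 'a', 'v', 'e']
print(any_case)       # ['R', 'e', 'g', 'e', 'x']
```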

The ^ metacharacter is a special case. When used as the first character inside [ ], it creates a negative match. [^2-5] will match any character except 2,3,4 and 5. Be cautious, as that doesn't mean it matches only the remaining digits: 0,1,6,7,8 or 9. It matches any other character, including letters and symbols.
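A short sketch of the negated set, again in Python's re module with a made-up sample string. The second pattern illustrates that ^ is only special when it comes first inside the brackets; anywhere else it is a literal caret.

```python
import re

# Negated set: anything that is NOT 2, 3, 4 or 5, including letters and symbols.
not_2_to_5 = re.findall(r"[^2-5]", "a1-39")
print(not_2_to_5)  # ['a', '1', '-', '9']

# When ^ is not the first character, it is just a member of the set.
with_literal_caret = re.findall(r"[2-5^]", "2^7")
print(with_literal_caret)  # ['2', '^']
```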

Some regex engines (check your language first) support character set subtractions and intersections.

  • Subtractions are usually defined as [range-[subrange_to_remove]], like [0-9-[2-7]] indicating a set that matches only 0,1,8 or 9.
  • Intersections are defined as [range1&&range2]. The character must belong to both ranges to be matched in the search.
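Python's built-in re module does not implement these set operations, so the sketch below uses a different technique, lookaheads, to get the same effect. This is a workaround, not the bracket syntax described above, and the ranges are chosen only for illustration.

```python
import re

# Emulated subtraction: a digit 0-9 that is NOT in 2-7.
# The negative lookahead (?![2-7]) rejects the position before [0-9] consumes it.
subtraction = re.findall(r"(?![2-7])[0-9]", "0123456789")
print(subtraction)  # ['0', '1', '8', '9']

# Emulated intersection: a character that is in a-m AND in g-z.
# The positive lookahead (?=[a-m]) must succeed before [g-z] consumes it.
intersection = re.findall(r"(?=[a-m])[g-z]", "abcghimnz")
print(intersection)  # ['g', 'h', 'i', 'm']
```

For small sets it is often simpler to write the result out by hand, for example [0189] instead of the subtraction.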

Note: Remember \w from the previous lesson? It's shorthand for [a-zA-Z0-9_]. In Unicode-aware engines, \w may also match letters and digits from other alphabets.
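In Python's re module, for instance, \w on text strings is Unicode-aware by default, so it matches exactly [a-zA-Z0-9_] only when the re.ASCII flag is set. A minimal sketch with an illustrative sample string:

```python
import re

sample = "a_1-\u00e9"  # the last character is an accented e

# With re.ASCII, \w behaves like the explicit set [a-zA-Z0-9_].
ascii_word = re.findall(r"\w", sample, re.ASCII)
explicit_set = re.findall(r"[a-zA-Z0-9_]", sample)
print(ascii_word == explicit_set)  # True

# Without the flag, \w also accepts the accented letter.
unicode_word = re.findall(r"\w", sample)
print(len(unicode_word))  # 4
```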

Exercise 2 - Searching years from 2000 to 2199

For the next exercise, you need to create a complex pattern set with the following constraints:

  • Search for any consonant, search for a lowercase vowel, and search for either the letter 'n' or 's'.

Exercise 3 - Complex pattern set

Continue to the next lesson to learn about Repetitions in Regular Expressions.
