Regular Expressions Basics

Marchete

Character Sets

Simple character set

In previous lessons, we learned that a regex made from literal characters, like ain, matches exactly those three letters in that exact order. It's essentially a search for a && i && n. But what if you need an || (OR) instead of an && (AND)?

You can accomplish this by using brackets [ ]. When you create a pattern like [ain], you'll search for a single character that must be either a OR i OR n.
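To make the AND/OR difference concrete, here is a minimal sketch comparing the literal pattern ain with the character set [ain]. The class and method names (SimpleSetDemo, CountMatches) are hypothetical and only serve the illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SimpleSetDemo
{
    // Character set: exactly one character that is 'a', 'i' or 'n'.
    public const string AnyOfAin = @"[ain]";

    // Literal sequence: 'a', then 'i', then 'n', in that order.
    public const string LiteralAin = @"ain";

    // Counts the non-overlapping matches of pattern in input.
    public static int CountMatches(string input, string pattern)
    {
        return Regex.Matches(input, pattern).Count;
    }

    public static void Main()
    {
        // "Spain" contains a, i and n (three single-character matches)
        // and the sequence "ain" once.
        Console.WriteLine(CountMatches("Spain", AnyOfAin));   // 3
        Console.WriteLine(CountMatches("Spain", LiteralAin)); // 1

        // "nia" has the same three letters, but not in the order a-i-n.
        Console.WriteLine(CountMatches("nia", AnyOfAin));     // 3
        Console.WriteLine(CountMatches("nia", LiteralAin));   // 0
    }
}
```

The set matches one character at a time, so the order of the letters inside the brackets doesn't matter: [ain], [nia] and [ina] behave the same.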

In this first exercise, you'll need to create a pattern to match vowels.

Exercise 1 - Create a pattern to match vowels
namespace RegexCourse{
    public static class Exercise1{
        //Write a regex pattern to match any vowel, both lowercase and uppercase
        //"Y" and "y" aren't considered vowels in this exercise
        public static string Pattern_MatchVowels=@"";
    }
}

Range character set

A simple character set can be bothersome to declare when you need to match the whole alphabet or all digits. For that reason, you can use - inside a character set to declare a range of consecutive characters. The pattern [a-z] matches any single character from a to z (a, b, c, d, e ... x, y or z). Likewise, [2-5] matches any single digit from 2 to 5. You can also combine several ranges inside one character set: [B-Ga-v] is a valid regex pattern. As stated before, regex patterns are case sensitive by default, so [a-z] and [A-Z] match different characters.
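The range patterns mentioned above can be tried out with a short sketch like the following. RangeSetDemo and HasMatch are hypothetical names; the only library call is Regex.IsMatch with default options.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RangeSetDemo
{
    // True when input contains at least one character matched by pattern.
    public static bool HasMatch(string input, string pattern)
    {
        return Regex.IsMatch(input, pattern);
    }

    public static void Main()
    {
        // Ranges are case sensitive by default.
        Console.WriteLine(HasMatch("m", "[a-z]"));     // True
        Console.WriteLine(HasMatch("M", "[a-z]"));     // False
        Console.WriteLine(HasMatch("M", "[A-Z]"));     // True

        // Digit range: only 2, 3, 4 and 5 belong to the set.
        Console.WriteLine(HasMatch("4", "[2-5]"));     // True
        Console.WriteLine(HasMatch("7", "[2-5]"));     // False

        // Two ranges in one set: B to G (uppercase) or a to v (lowercase).
        Console.WriteLine(HasMatch("D", "[B-Ga-v]"));  // True
        Console.WriteLine(HasMatch("k", "[B-Ga-v]"));  // True
        Console.WriteLine(HasMatch("A", "[B-Ga-v]"));  // False
        Console.WriteLine(HasMatch("z", "[B-Ga-v]"));  // False
    }
}
```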

The ^ metacharacter is a special case. When it is the first character inside [ ], it negates the set. [^2-5] matches any character except 2, 3, 4 and 5. Be cautious: that doesn't mean it matches only the remaining digits 0, 1, 6, 7, 8 and 9. It matches any other character, including letters and symbols.
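A small sketch of a negated set may help here. It lists what [^2-5] actually matches, and also shows that ^ only negates when it comes first inside the brackets. NegatedSetDemo and MatchedChars are hypothetical names used for the example.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class NegatedSetDemo
{
    // Returns every match of pattern in input, joined with commas.
    public static string MatchedChars(string input, string pattern)
    {
        // Cast<Match>() keeps this working on older .NET Framework versions,
        // where MatchCollection is only a non-generic collection.
        return string.Join(",",
            Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value));
    }

    public static void Main()
    {
        // Everything except 2, 3, 4 and 5 matches: digits, letters and symbols.
        Console.WriteLine(MatchedChars("1a4!", "[^2-5]")); // 1,a,!

        // A string made only of excluded digits produces no match at all.
        Console.WriteLine(MatchedChars("2345", "[^2-5]")); // (empty line)

        // When ^ is not the first character, it is a literal '^' in the set.
        Console.WriteLine(MatchedChars("2^5x", "[2^5]"));  // 2,^,5
    }
}
```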

Some regex engines (check your language first) support character set subtractions and intersections.

  • Subtractions are usually defined as [range-[subrange_to_remove]], like [0-9-[2-7]] indicating a set that matches only 0,1,8 or 9.
  • Intersections are defined as [range1&&range2]. The character must belong to both ranges to be matched in the search.
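Since this course uses C#, here is a minimal sketch of the subtraction syntax with the .NET engine. The && intersection syntax is not demonstrated because, to the best of my knowledge, .NET does not support it (Java's engine does). SubtractionDemo and KeepMatches are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SubtractionDemo
{
    // Digits 0-9 with the subrange 2-7 removed: only 0, 1, 8 and 9 remain.
    public const string DigitsWithout2To7 = @"[0-9-[2-7]]";

    // The same set written out by hand, for engines without subtraction.
    public const string EquivalentPlainSet = @"[0189]";

    // Concatenates every character of input that pattern matches.
    public static string KeepMatches(string input, string pattern)
    {
        return string.Concat(
            Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value));
    }

    public static void Main()
    {
        Console.WriteLine(KeepMatches("0123456789", DigitsWithout2To7));  // 0189
        Console.WriteLine(KeepMatches("0123456789", EquivalentPlainSet)); // 0189
    }
}
```

Subtraction is mostly a readability aid: any subtracted set can also be spelled out as a plain set, which is the portable option when you are unsure what your engine supports.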

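Note: Remember \w from the previous lesson? It's shorthand for [a-zA-Z0-9_]. In .NET, that exact equivalence holds only with RegexOptions.ECMAScript; by default, \w also matches letters and digits from other alphabets.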
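The following sketch compares \w with the explicit set [a-zA-Z0-9_] in .NET. It assumes the default Unicode-aware behavior of \w in .NET, which is narrowed to the ASCII set by RegexOptions.ECMAScript. WordShorthandDemo and its members are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordShorthandDemo
{
    // Anchored so the whole input must be one character from the set.
    public const string ExplicitSet = @"^[a-zA-Z0-9_]$";
    public const string Shorthand = @"^\w$";

    // Tests a one-character string against \w with the given options.
    public static bool IsWordChar(string s, RegexOptions options)
    {
        return Regex.IsMatch(s, Shorthand, options);
    }

    // Tests a one-character string against the explicit ASCII set.
    public static bool IsInExplicitSet(string s)
    {
        return Regex.IsMatch(s, ExplicitSet);
    }

    public static void Main()
    {
        // ASCII letters, digits and the underscore match either way.
        Console.WriteLine(IsWordChar("_", RegexOptions.None));            // True
        Console.WriteLine(IsInExplicitSet("_"));                          // True
        Console.WriteLine(IsWordChar("-", RegexOptions.None));            // False

        // "\u00E9" is the accented letter e: a word character by default,
        // but outside the ASCII set.
        Console.WriteLine(IsWordChar("\u00E9", RegexOptions.None));       // True
        Console.WriteLine(IsWordChar("\u00E9", RegexOptions.ECMAScript)); // False
        Console.WriteLine(IsInExplicitSet("\u00E9"));                     // False
    }
}
```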

Exercise 2 - Searching years from 2000 to 2199
namespace RegexCourse{
    public static class Exercise2{
        //Write a regex pattern to match years between 2000 and 2199, inclusive.
        //Remember that 20000 or 02000 won't be valid years. You need to limit
        // the size of the search to avoid more than 4 digits.
        public static string Pattern_Exercise2=@"";
    }
}

For the next exercise, you need to create a more complex pattern that combines three character sets in sequence:

  • Any consonant, followed by a lowercase vowel, followed by either the letter 'n' or 's'.

Exercise 3 - Complex pattern set
namespace RegexCourse{
    public static class Exercise3{
        //Write a regex pattern to match the following:
        // Any consonant, then a lowercase vowel, then either the letter 'n' or 's'.
        //In this exercise "y" and "Y" are considered consonants.
        //Note: C# allows character set subtractions
        public static string Pattern_Exercise3=@"";
    }
}

Continue to the next lesson to learn about Repetitions in Regular Expressions.
