Regular Expressions Basics

Marchete

Repetitions

Repetitions simplify using the same pattern several consecutive times. They also allow for flexible length searches, so you can match 'aaZ' and 'aaaaaaaaaaaaaaaaZ' with the same pattern.

{ } Ranges

You can use the following syntax for defined ranges:

Pattern      Description
{n}          Repeat the previous symbol exactly n times
{n,}         Repeat the previous symbol n or more times
{min,max}    Repeat the previous symbol between min and max times, both included

So a{6} is the same as aaaaaa, and [a-z]{1,3} matches a run of 1 to 3 consecutive lowercase letters.

Note: In repetitions, each symbol match is independent. If [a-z]{1,3} first matches with 'a', on the next letter it can match with anything in the [a-z] range, not only 'a'.
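The range syntax above can be tried out with a short script. This is only a sketch, written with Python's re module for illustration (the course exercises themselves are in C#); the sample strings are made up for the example.

```python
import re

# a{6} is shorthand for six consecutive 'a' characters.
exact = re.fullmatch(r"a{6}", "aaaaaa") is not None
too_short = re.fullmatch(r"a{6}", "aaaaa") is not None

# {n,} has no upper bound: two or more 'a' characters followed by 'Z'.
open_ended = [
    bool(re.fullmatch(r"a{2,}Z", s))
    for s in ("aZ", "aaZ", "aaaaaaaaaaaaaaaaZ")
]

# Each repetition is matched independently, so [a-z]{1,3} accepts mixed
# letters. A run longer than three letters is split into several matches.
chunks = re.findall(r"[a-z]{1,3}", "abc defgh")

print(exact, too_short)   # True False
print(open_ended)         # [False, True, True]
print(chunks)             # ['abc', 'def', 'gh']
```

Note that 'defgh' produces two matches, 'def' and 'gh', because {1,3} never takes more than three symbols in a single match.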

Other Ranges

You can use the following syntax for other types of ranges:

Pattern      Description
*            Repeat the previous symbol 0 or more times
+            Repeat the previous symbol 1 or more times
?            Repeat the previous symbol 0 or 1 times

Note: * is the same as {0,}, + is the same as {1,}, and ? is the same as {0,1}.

A common use for ? is to allow both singular and plural words: cats? will match either cat or cats.
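As a quick check of these equivalences, the sketch below compares each shorthand with its { } form on a few made-up sample strings. It uses Python's re module for illustration only; the course exercises are in C#.

```python
import re

samples = ["Z", "aZ", "aaaZ"]


def accepts(pattern):
    """Return, for each sample, whether the whole string matches the pattern."""
    return [bool(re.fullmatch(pattern, s)) for s in samples]


star, star_range = accepts(r"a*Z"), accepts(r"a{0,}Z")
plus, plus_range = accepts(r"a+Z"), accepts(r"a{1,}Z")
optional, optional_range = accepts(r"a?Z"), accepts(r"a{0,1}Z")

print(star, star == star_range)              # [True, True, True] True
print(plus, plus == plus_range)              # [False, True, True] True
print(optional, optional == optional_range)  # [True, True, False] True

# The optional 's' lets one pattern find both the singular and the plural.
plurals = re.findall(r"cats?", "one cat, two cats")
print(plurals)  # ['cat', 'cats']
```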

Repetitions are greedy by default: they try to get the largest match possible. Sometimes that's undesired, so you can force a lazy search by adding ? after the repetition (*?, +?, ?? or {min,max}?). The extra ? instructs the regex engine to make a lazy search, which stops at the smallest match possible from the position where the match starts.

Greedy Search: a.*a matches from the first a up to the last a on the same line.
Lazy Search: a.*?a matches from the first a only up to the next a.
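The difference between greedy and lazy search is easiest to see side by side. The following is a minimal sketch using Python's re module for illustration (the course exercises are in C#); the sample texts are invented for the example.

```python
import re

text = "a1a2a"

# Greedy: .* grabs as much as it can, so the match ends at the last 'a'.
greedy = re.search(r"a.*a", text).group()

# Lazy: .*? grabs as little as it can, so the match ends at the next 'a'.
lazy = re.search(r"a.*?a", text).group()

print(greedy)  # a1a2a
print(lazy)    # a1a

# The same effect on tag-like text: greedy swallows everything between the
# first '<' and the last '>', while lazy stops at the first closing '>'.
markup = "<b>bold</b> and <i>italic</i>"
greedy_tags = re.findall(r"<.+>", markup)
lazy_tags = re.findall(r"<.+?>", markup)

print(greedy_tags)  # ['<b>bold</b> and <i>italic</i>']
print(lazy_tags)    # ['<b>', '</b>', '<i>', '</i>']

# A { } range can be made lazy too: it then prefers the minimum count.
lazy_range = re.search(r"a{2,4}?", "aaaa").group()
print(lazy_range)  # aa
```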

Exercise 4 - Simplified XML Tags
namespace RegexCourse
{
    public static class Exercise4
    {
        // Write a regex pattern to match simplified XML tags.
        // Simplified XML tags are defined as <text> or </text>.
        // text has length >= 1 and can contain these characters:
        //   - Any letter
        //   - Any digit
        //   - These symbols: = \s " - _ :
        // Your regex pattern should not match the characters between XML tags.
        // E.g.: in the XML <text>ZZZZZ</text>, the text ZZZZZ shouldn't be matched.
        // Tags starting with <? should NOT match, as ? is not in the set of allowed symbols.
        // Note: In verbatim C# strings (@"..."), the " symbol must be escaped as double "".
        //       Escaping / as \/ is optional in .NET regex; both forms match a literal /.
        public static string Pattern_Exercise4 = @"";
    }
}

Note: Regex is not recommended for parsing XML or HTML. See: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags
However, it can still find what you need in simpler cases.

In the next lesson, you'll learn about Alternations.
