Regular Expressions Basics

Marchete

Groups & Capturing Groups

Groups use the ( ) symbols (like alternations, but without the | symbol). They let you build blocks of patterns, so that repetitions or other modifiers apply to the block as a whole. In the pattern ([a-x]{3}[0-9])+, the + metacharacter applies to the whole group, not just to [0-9].
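A minimal sketch of the difference, assuming C# with .NET's System.Text.RegularExpressions and written as a top-level program (C# 9 or later). The input string "abc1def2ghi3" is a made-up example. It compares the grouped pattern from the paragraph with the same tokens written without a group.

```csharp
using System;
using System.Text.RegularExpressions;

// The + applies to the whole group: "three letters a-x, then one digit", repeated.
var groupRepeat = new Regex(@"([a-x]{3}[0-9])+");
// Without the group, + only applies to the last token, [0-9].
var tokenRepeat = new Regex(@"[a-x]{3}[0-9]+");

string input = "abc1def2ghi3"; // hypothetical sample input

Match whole = groupRepeat.Match(input);
Match partial = tokenRepeat.Match(input);

Console.WriteLine(whole.Value);   // abc1def2ghi3
Console.WriteLine(partial.Value); // abc1
```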

Another main use of groups is to process parts of a match, for example to extract data or to replace it.
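As a sketch of the replacing case, assuming C# and .NET regex (top-level program, C# 9 or later): in .NET replacement strings, $1 and $2 refer to the text captured by groups 1 and 2. The "Surname, Name" input is a hypothetical example.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical input in "Surname, Name" order.
string input = "Doe, John";

// Group 1 captures the surname, group 2 the name.
// The replacement string reorders them using $2 and $1.
string swapped = Regex.Replace(input, @"(\w+), (\w+)", "$2 $1");

Console.WriteLine(swapped); // John Doe
```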

( ) Unnamed Groups

With pattern1(pattern2)pattern3, you capture the text matched by pattern2 for later use, but not the parts matched by pattern1 or pattern3. This is useful when you want to extract only a portion of the match. Imagine that you are reading some text files formatted as forms. They could contain data like this:

Name:"John" Surname:"Doe" Email:"john@example.com"

If you need to extract the value of the Name field (John in the example), you can use a pattern like Name:"([\w]+?)" to capture just the useful data. Name:" serves as a reference to locate the data within the text, and the closing " marks where the value ends.
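Here is a minimal sketch of this extraction, assuming C# and .NET regex (top-level program, C# 9 or later). In .NET, Groups[0] holds the whole match and Groups[1] the first capturing group. In verbatim strings (@"..."), "" stands for a single quote character.

```csharp
using System;
using System.Text.RegularExpressions;

string form = @"Name:""John"" Surname:""Doe"" Email:""john@example.com""";

// Pattern: Name:"([\w]+?)"  (only the part inside the parentheses is captured)
Match m = Regex.Match(form, @"Name:""([\w]+?)""");

if (m.Success)
{
    Console.WriteLine(m.Value);           // Name:"John"
    Console.WriteLine(m.Groups[1].Value); // John
}
```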

Note: If you apply a repetition to a group, only the last iteration of the repetition is stored. In Name:"([\w])+?", the group only gives you the last matched character (n). The ([\w]+?) group, however, has the repetition inside, so it captures all the matched characters (John).
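A sketch comparing both placements of the repetition, assuming C# and .NET regex (top-level program, C# 9 or later). It also shows a .NET-specific detail: Group.Value keeps only the last iteration, but the group's Captures collection still records every iteration. Other engines may not offer an equivalent.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string form = @"Name:""John""";

// Repetition applied to the group: Groups[1].Value keeps only the last iteration.
Match outside = Regex.Match(form, @"Name:""([\w])+?""");
// Repetition inside the group: a single capture containing every matched character.
Match inside = Regex.Match(form, @"Name:""([\w]+?)""");

Console.WriteLine(outside.Groups[1].Value); // n
Console.WriteLine(inside.Groups[1].Value);  // John

// .NET specific: Captures lists the text of every iteration of the group.
string all = string.Join(",", outside.Groups[1].Captures.Cast<Capture>().Select(c => c.Value));
Console.WriteLine(all); // J,o,h,n
```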

(?: ) Non-capturing Groups

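Use (?: ) for non-capturing groups. If you need a group as a block but won't process its result later, make it non-capturing. The group still works for repetitions and alternations, but its text is not stored and it does not take a group number.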
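A minimal sketch, assuming C# and .NET regex (top-level program, C# 9 or later). It uses a non-capturing group to hold an alternation, so the value stays in group 1. A capturing version is included for comparison, where the same value moves to group 2. The form text is a hypothetical example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

string form = @"Name:""John"" Surname:""Doe""";

// (?:Name|Surname) groups the alternation as a block but is not captured,
// so the value is still group number 1.
var values = new List<string>();
foreach (Match m in Regex.Matches(form, @"(?:Name|Surname):""(\w+?)"""))
{
    values.Add(m.Groups[1].Value);
    // Groups.Count includes group 0 (the whole match) plus 1 capturing group.
    Console.WriteLine($"{m.Groups[1].Value} (groups: {m.Groups.Count})");
}
// Prints:
// John (groups: 2)
// Doe (groups: 2)

// For comparison: a capturing group around the alternation shifts the value to group 2.
Match captured = Regex.Match(form, @"(Name|Surname):""(\w+?)""");
Console.WriteLine(captured.Groups[1].Value); // Name
Console.WriteLine(captured.Groups[2].Value); // John
```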

(?<name> ) Named Groups

Use (?<groupname> ) to capture a group named groupname. Named groups are useful for later processing when the input data may appear in a different order than desired, because you access each value by its name instead of by its position.

Name:"John" Surname:"Doe" Email:"john@example.com"

Consider the following regex pattern:

Name:"(?<Name>[\w]+?)".*?Surname:"(?<Surname>[\w]+?)".*?Email:"(?<Email>\b[\w.%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b)"

This pattern matches each piece of data and creates three named groups: group 'Name' with the data John, group 'Surname' with Doe, and group 'Email' with john@example.com. Each language and regex engine defines how to access matched groups; check your language's documentation to learn how to iterate over and process them.
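In C#, for example, a named group can be read with Match.Groups["name"]. The following is a minimal sketch assuming .NET regex, written as a top-level program (C# 9 or later). It uses the pattern above unchanged, split over two verbatim strings for readability.

```csharp
using System;
using System.Text.RegularExpressions;

string form = @"Name:""John"" Surname:""Doe"" Email:""john@example.com""";

// Same pattern as in the text; "" inside a verbatim string is a literal quote.
string pattern =
    @"Name:""(?<Name>[\w]+?)"".*?Surname:""(?<Surname>[\w]+?)"".*?" +
    @"Email:""(?<Email>\b[\w.%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b)""";

Match m = Regex.Match(form, pattern);
if (m.Success)
{
    // Access each value by group name rather than by position.
    Console.WriteLine(m.Groups["Name"].Value);    // John
    Console.WriteLine(m.Groups["Surname"].Value); // Doe
    Console.WriteLine(m.Groups["Email"].Value);   // john@example.com
}
```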

Exercise 6 - Image Files with Path
namespace RegexCourse{
public static class Exercise6{
//Match all images with full paths, and create three named groups for capturing the following:
//Named Group 'Drive': <Drive> Letter only
//Named Group 'Path': <directories>
//Named Group 'Name': <filename>.<extension>
//Files are in NTFS, and are composed as follows:
//<Drive>:\<directories><filename>.<extension>
//<Drive> is a letter from a to z
//<directories> is an optional section, formed by zero or more <directory>\ blocks
// <directory> is formed as one or more of the following characters:
// -Any alphanumeric character
// -Any of these symbols: +-_=()
// <filename> is formed as one or more of the following characters:
// -Any alphanumeric character
// -Any of these symbols: .+-_=()
//Valid <extension> for images are:
// jpg,jpeg,png,bmp,gif
public static string Pattern_Exercise6=@"";
/*
//Another option, as a combination of subpatterns
public static string DrivePattern = @""; //(?<Drive>...
public static string DirPattern = @""; //To match text of each directory
public static string DirsPattern = @"" + DirPattern + @""; //(?<Path>...
public static string TextPattern = @""; //To match <filename>, similar to Exercise5
public static string FilePattern = @"" + TextPattern + @""; //(?<Name>...
public static string Pattern_Exercise6 = DrivePattern+@":\\"+DirsPattern+FilePattern;
*/
}
}

Note: There are many other kinds of groups, such as lookahead, lookbehind, atomic groups, conditionals and recursion. All of these are outside the scope of this course. However, feel free to contribute to this course by adding them and filing a PR.
