C# Professional - Processing Text

talent-agile

Regular Expressions - Groups

When building a regular expression pattern, you can use a group to match one of several alternatives. Wrap the alternatives in parentheses and separate them with the | character: (banana|ananas|apple).

This example will match any text containing banana, ananas or apple.
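As a minimal sketch of this alternation group in C#, the snippet below uses Regex.IsMatch and Regex.Match from System.Text.RegularExpressions. The sample sentences and variable names are invented for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// The alternation group from the text: matches any one of the three words.
var fruit = new Regex(@"(banana|ananas|apple)");

// Sample sentences, invented for this example.
bool hasFruit = fruit.IsMatch("I would like an apple, please.");
bool noFruit = fruit.IsMatch("I would like a pear, please.");

// Match.Value holds the alternative that was actually found first in the text.
string firstFruit = fruit.Match("ananas or banana?").Value;

Console.WriteLine($"{hasFruit} {noFruit} {firstFruit}");
```

Here hasFruit is true, noFruit is false, and firstFruit is "ananas", because the engine reports the leftmost match in the input.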

Groups are very handy when a regular expression needs to allow several options for a specific word.

Quantifiers

As with individual characters, a quantifier can follow a group to specify how many times the whole group must occur.
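The following sketch shows a quantifier applied to a whole group rather than to a single character. The patterns and inputs are illustrative choices, not taken from the original text.

```csharp
using System;
using System.Text.RegularExpressions;

// (na){2} requires the group "na" exactly twice in a row.
bool banana = Regex.IsMatch("banana", @"^ba(na){2}$");
bool bana = Regex.IsMatch("bana", @"^ba(na){2}$");

// (na)+ accepts one or more repetitions of the whole group and is greedy,
// so it consumes as many "na" as it can.
string longest = Regex.Match("bananana!", @"ba(na)+").Value;

Console.WriteLine($"{banana} {bana} {longest}");
```

With these inputs, banana is true, bana is false (only one "na"), and longest is "bananana".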

Capture & Backreference

When using groups in the pattern, by default, the regular expression will capture the value corresponding to that group. This is often used when using regular expressions to extract a specific substring from a larger text.

In .NET, the captured value can be retrieved through the Groups property of a Match returned by a regular expression.

Note: the first element of the Groups collection (index 0) is the whole match; captured groups start at index 1.
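A short sketch of reading captured values through Match.Groups follows. The input string is invented for the example, and the pattern reuses the user_id: (\d+) fragment that appears later in this section.

```csharp
using System;
using System.Text.RegularExpressions;

// Sample input, invented for this example.
Match match = Regex.Match("user_id: 42 - welcome", @"user_id: (\d+)");

bool found = match.Success;
string whole = match.Groups[0].Value;   // index 0: the whole match
string userId = match.Groups[1].Value;  // index 1: the first captured group
int groupCount = match.Groups.Count;    // whole match plus one capturing group

Console.WriteLine($"{whole} -> {userId}");
```

Here whole is "user_id: 42", userId is "42", and groupCount is 2.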

A captured value can also be used as a backreference within the pattern itself, which ensures that another part of the text repeats exactly what the group captured.

The backreference is done with the \N syntax, where N is the number of the referenced group in the pattern.

Example: user_id: (\d+) - validating email for user \1

This matches only when the number at the end of the text is the same as the one captured after user_id:.
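The sketch below runs the pattern from the example against a few invented inputs. The variant ending in \b is an addition for illustration: because the pattern is not anchored, the plain \1 also accepts a longer number that merely starts with the captured digits.

```csharp
using System;
using System.Text.RegularExpressions;

// Pattern from the example above; \1 must repeat what (\d+) captured.
string pattern = @"user_id: (\d+) - validating email for user \1";

bool sameId = Regex.IsMatch("user_id: 42 - validating email for user 42", pattern);
bool otherId = Regex.IsMatch("user_id: 42 - validating email for user 17", pattern);

// Without an anchor, "421" still matches because it starts with "42".
bool longerId = Regex.IsMatch("user_id: 42 - validating email for user 421", pattern);

// Illustrative variant: a trailing \b (word boundary) rejects the longer number.
bool longerIdStrict = Regex.IsMatch(
    "user_id: 42 - validating email for user 421", pattern + @"\b");

Console.WriteLine($"{sameId} {otherId} {longerId} {longerIdStrict}");
```

With these inputs, sameId and longerId are true, while otherId and longerIdStrict are false.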

Naming groups

Groups can be given a name with a specific syntax in the pattern.

user_name: (?<username>\w+)

Here, the capturing group is named username. This name can be used for backreferences with the \k<username> syntax, and to retrieve the group from a Match object in .NET.
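As a sketch, the named group can be read by name through the Groups indexer and referenced inside the pattern with \k<username>. The input strings and the "goodbye" pattern are invented for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Retrieve the named group by its name instead of its index.
Match named = Regex.Match("user_name: alice", @"user_name: (?<username>\w+)");
string username = named.Groups["username"].Value;

// \k<username> must repeat the text captured by the named group.
string farewell = @"user_name: (?<username>\w+) - goodbye \k<username>";
bool sameName = Regex.IsMatch("user_name: alice - goodbye alice", farewell);
bool otherName = Regex.IsMatch("user_name: alice - goodbye bob", farewell);

Console.WriteLine($"{username} {sameName} {otherName}");
```

Here username is "alice", sameName is true, and otherName is false.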
