.NETics: Little RegEx Examples in .NET

Monday, July 23, 2007

If you want to use any of the special characters (viz. [, \, ^, $, ., |, ?, *, +, (, )) as a literal in a regex, you need to escape it with a backslash.

e.g. the dot before the top-level domain in the email pattern is escaped as \.

\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b

Most regular expression flavors treat the brace { as a literal character, unless it is part of a repetition operator like {2,6}.
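The escaping rules above can be tried out with a short sketch. It is written in Python rather than C#, on the assumption that these particular rules behave the same in both flavors; the names UNESCAPED, ESCAPED, LITERAL and the sample strings are made up for illustration.

```python
import re

# Unescaped, the dot is a metacharacter: it matches any character except a newline.
UNESCAPED = re.compile(r"a.c")
# Escaped with a backslash, it matches only a literal dot.
ESCAPED = re.compile(r"a\.c")

print(bool(UNESCAPED.fullmatch("abc")))  # True
print(bool(ESCAPED.fullmatch("abc")))    # False
print(bool(ESCAPED.fullmatch("a.c")))    # True

# re.escape adds the backslashes for you when the text is meant literally.
LITERAL = re.compile(re.escape("1+1=2 (really?)"))
print(bool(LITERAL.fullmatch("1+1=2 (really?)")))  # True

# A brace is literal unless it forms a repetition operator such as {2}.
print(bool(re.fullmatch(r"x{", "x{")))    # True
print(bool(re.fullmatch(r"x{2}", "xx")))  # True
```

In .NET the counterpart of re.escape is Regex.Escape.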

Using character classes, you can match text made up of a particular set of characters.

e.g. To allow alphanumeric characters and underscores, use [A-Za-z_0-9]*

The only special characters or metacharacters inside a character class are the closing bracket (]), the backslash (\), the caret (^) and the hyphen (-).
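A small sketch of character classes in action, again in Python as a stand-in for .NET. The names IDENTIFIER, is_identifier_like and NOT_DIGIT_OR_HYPHEN are hypothetical and exist only for this example.

```python
import re

# Zero or more letters, digits, or underscores.
IDENTIFIER = re.compile(r"[A-Za-z_0-9]*")

def is_identifier_like(text):
    # fullmatch requires the whole string to consist of allowed characters.
    return IDENTIFIER.fullmatch(text) is not None

print(is_identifier_like("user_name42"))  # True
print(is_identifier_like("user-name"))    # False, the hyphen is not in the class
print(is_identifier_like(""))             # True, because * also allows zero characters

# Inside a class, a leading caret negates the class and a trailing hyphen is literal.
NOT_DIGIT_OR_HYPHEN = re.compile(r"[^0-9-]")
print(NOT_DIGIT_OR_HYPHEN.findall("12-ab-3"))  # ['a', 'b']

# A dot inside a class needs no escaping; it is just a dot there.
print(bool(re.fullmatch(r"[.]", "x")))  # False
```

If an empty string should be rejected, replace the * with + so that at least one character is required.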

Examples

1) Email Regular Expression

\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b
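Note that the pattern lists only the upper-case range A-Z, so it has to be run case-insensitively to match ordinary lower-case addresses (RegexOptions.IgnoreCase in .NET). A minimal sketch in Python, where the equivalent flag is re.IGNORECASE; the function name find_emails and the sample addresses are invented for the example.

```python
import re

EMAIL_PATTERN = r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b"

# The case-insensitive flag lets A-Z cover lower-case letters as well.
EMAIL = re.compile(EMAIL_PATTERN, re.IGNORECASE)

def find_emails(text):
    """Return every substring of text that the email pattern matches."""
    return EMAIL.findall(text)

print(find_emails("Contact john.doe@example.com or SALES@EXAMPLE.ORG today."))
# ['john.doe@example.com', 'SALES@EXAMPLE.ORG']

# Without the flag, a lower-case address is not matched at all.
print(re.findall(EMAIL_PATTERN, "john.doe@example.com"))  # []

# {2,4} limits the top-level domain to four letters, so longer ones are missed.
print(find_emails("user@example.museum"))  # []
```

This is a pragmatic filter rather than a full validator: it accepts the common address shapes and rejects some valid ones.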

2) HTML Tag matching - Regular Expression

<([A-Z][A-Z0-9]*)[^>]*>(.*?)</\1>

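This is a very simple example and can't be used in real scenarios, as it can't correctly pair nested HTML tags of the same name. Because the pattern lists only A-Z, it also has to be run case-insensitively to match lower-case tags.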

Match the character "<" literally (this marks the start of the opening tag).
Match the regular expression below and capture its match into backreference number 1 (this is used later to find the matching end tag):
- Match a single character in the range between A and Z (the tag name must start with a letter, not a digit).
- Match a single character in the range between A and Z, or in the range between 0 and 9 (the rest of the tag name may be any alphanumeric characters).
  - Between zero and unlimited times, as many times as possible, giving back as needed (greedy).
Match a single character that is not ">" (this allows any amount of whitespace and any attributes inside the opening tag).
- Between zero and unlimited times, as many times as possible, giving back as needed (greedy).
Match the character ">" literally (the opening tag ends here).
Match the regular expression below and capture its match into backreference number 2 (the (.*?) part):
- Match any single character (by default . does not match a newline; in .NET it does so only when RegexOptions.Singleline is set).
  - Between zero and unlimited times, as few times as possible, expanding as needed (lazy). The * allows unlimited repetitions and the ? makes the search non-greedy, i.e. it stops as soon as the rest of the pattern can match.
Match the characters "</" literally (the lazy match above keeps expanding until the end tag can be matched; the end tag starts here).
Match the same text as most recently matched by backreference number 1 (the end tag must repeat the tag name that was saved as backreference number 1).
Match the character ">" literally (the closing character of the end tag).
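The walkthrough above can be checked with a short sketch. It uses Python in place of .NET, with re.IGNORECASE and re.DOTALL standing in for RegexOptions.IgnoreCase and RegexOptions.Singleline; the helper name tag_pairs and the sample markup are assumptions made for the example.

```python
import re

# IGNORECASE lets A-Z match lower-case tag names; DOTALL lets . match newlines.
TAG_PAIR = re.compile(r"<([A-Z][A-Z0-9]*)[^>]*>(.*?)</\1>", re.IGNORECASE | re.DOTALL)

def tag_pairs(html):
    """Return (tag name, inner text) for each start/end tag pair found."""
    return [(m.group(1), m.group(2)) for m in TAG_PAIR.finditer(html)]

print(tag_pairs('<p class="x">Hello</p><b>bold\ntext</b>'))
# [('p', 'Hello'), ('b', 'bold\ntext')]

# The lazy (.*?) stops at the first matching end tag, so same-name nesting is cut short.
print(tag_pairs("<div><div>a</div></div>"))
# [('div', '<div>a')]
```

The second call shows the limitation with nested tags: the outer div is paired with the inner end tag. For real HTML an actual parser is the safer choice.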

3) Do you want to write a RegEx for parsing code written in Classic ASP 3.0 style?

The pattern below can help with the matching. It is also a beautiful example of using named captures in regular expressions.

(?<tbefore>.*?)(?<allcode><%(?<equals>=)*\s*(?<code>.*?)%>)(?<endtext>.*)

Here, the tbefore group captures all text before the opening code tag <%.

allcode captures the whole code block, including the <% and %> delimiters.

equals captures the = sign when the block is an output expression (<%= ... %>).

code captures the code between <% and %>.

endtext captures all text after this code block.
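A rough sketch of the named captures in use. Python spells a named group as (?P<name>...) where .NET uses (?<name>...), so the pattern is translated accordingly; re.DOTALL is assumed so that code blocks can span lines (RegexOptions.Singleline in .NET). The function name split_first_block and the sample input are hypothetical.

```python
import re

# Same pattern as above, with .NET's (?<name>...) rewritten as Python's (?P<name>...).
ASP_BLOCK = re.compile(
    r"(?P<tbefore>.*?)"
    r"(?P<allcode><%(?P<equals>=)*\s*(?P<code>.*?)%>)"
    r"(?P<endtext>.*)",
    re.DOTALL,
)

def split_first_block(text):
    """Split text around its first <% ... %> block, or return None if there is none."""
    m = ASP_BLOCK.match(text)
    return m.groupdict() if m else None

parts = split_first_block("Hello <%= name %>, bye")
print(parts["tbefore"])  # 'Hello '
print(parts["allcode"])  # '<%= name %>'
print(parts["equals"])   # '='
print(parts["code"])     # 'name ' (the space before %> is kept)
print(parts["endtext"])  # ', bye'
```

Only the first block is split off. To walk through a whole page, one option is to apply the pattern again to endtext until it returns no match.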
