If you want to use any of the special characters (viz. [, \, ^, $, ., |, ?, *, +, (, )) as a literal in a regex, you need to escape it with a backslash.
e.g. the dot before the top-level domain in the email pattern is written as \.
\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b
Most regular expression flavors treat the brace { as a literal character, unless it is part of a repetition operator like {2,6}.
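To see what escaping changes in practice, here is a minimal sketch using Python's built-in re module. The choice of Python and the sample strings are assumptions made for illustration; other flavors behave similarly for these cases.

```python
import re

# Unescaped dot matches any character, so "aXb" is accepted.
unescaped = re.compile(r"a.b")
# Escaped dot matches only a literal ".".
escaped = re.compile(r"a\.b")

loose_hit = unescaped.fullmatch("aXb") is not None
strict_hit = escaped.fullmatch("aXb") is not None
literal_hit = escaped.fullmatch("a.b") is not None

# re.escape adds the backslashes for you when the literal text comes from a variable.
escaped_text = re.escape("1+1=2?")
auto_hit = re.fullmatch(escaped_text, "1+1=2?") is not None

# A brace that is not part of a repetition operator is treated as a literal here.
brace_hit = re.search(r"{x}", "a{x}b") is not None

print(loose_hit, strict_hit, literal_hit, auto_hit, brace_hit)
```

If the literal text comes from user input, re.escape is usually safer than adding backslashes by hand.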
Using character classes, you can match any one character out of a particular set of characters.
e.g. To allow alphanumeric characters and underscores, use [A-Za-z_0-9]*
The only special characters or metacharacters inside a character class are the closing bracket (]), the backslash (\), the caret (^) and the hyphen (-).
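The character class rules above can be tried out with a short sketch in Python's re module. The sample strings are made up for illustration, and + is used instead of * so that an empty string does not count as a match.

```python
import re

# The class from the text: letters, underscore and digits.
identifier = re.compile(r"[A-Za-z_0-9]+")

ok = identifier.fullmatch("user_name42") is not None
bad = identifier.fullmatch("user-name") is not None  # "-" is not in the class

# Inside a class, "." and "+" are plain literals; "-" is literal when placed last.
symbols = re.compile(r"[.+-]")
found = symbols.findall("a.b+c-d*e")

# A caret right after the opening bracket negates the class.
non_digits = re.findall(r"[^0-9]", "a1b2")

print(ok, bad, found, non_digits)
```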
Examples
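1) Email Regular Expression (the classes list only A-Z, so use it with the case-insensitive option to match lowercase addresses)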
\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b
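As a quick check of the email pattern, here is a sketch in Python. The sample text and addresses are invented for illustration, and re.IGNORECASE is passed because the pattern itself lists only uppercase letters.

```python
import re

PATTERN = r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b"
EMAIL = re.compile(PATTERN, re.IGNORECASE)

text = "Contact jane.doe@example.com or ADMIN@MAIL.EXAMPLE.ORG for help."
addresses = EMAIL.findall(text)

# Without the flag, an all-lowercase address is not matched.
strict_hit = re.search(PATTERN, "jane.doe@example.com") is not None

print(addresses, strict_hit)
```

Note that {2,4} limits the last part of the domain to four letters, so longer top-level domains would need a larger upper bound.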
2) HTML Tag matching - Regular Expression
<([A-Z][A-Z0-9]*)[^>]*>(.*?)</\1>
- Match the character "<" literally (this denotes the start of the HTML tag)
- Match the regular expression below and capture its match into backreference number 1 (this is used later to find the matching end tag)
- Match a single character in the range between A and Z (the first character of the tag name has to be a letter, not a digit)
- Match a single character in the range between A and Z or between 0 and 9, between zero and unlimited times (the rest of the tag name can be any alphanumeric characters)
- Match a single character that is not ">", between zero and unlimited times (this skips over any attributes)
- Match the character ">" literally (the start tag ends here)
- Match the regular expression below and capture its match into backreference number 2 ((.*?))
- Match any single character (by default . does not match a newline; turn on the single-line/dot-all option if the tag content can span several lines)
- Between zero and unlimited times, as few times as possible, expanding as needed (lazy) (* allows unlimited repetition and ? makes it non-greedy, i.e. it stops as soon as the rest of the pattern can match)
- Match the characters "</" literally (the lazy group above grows one character at a time until "</" and the rest of the pattern match; the end tag starts here)
- Match the same text as most recently matched by backreference number 1 (the end tag name has to be the same text that was saved as backreference number 1)
- Match the character ">" literally (this is the closing character of the end tag)
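The walkthrough above can be followed with a small sketch in Python. The HTML snippet is invented for illustration; re.IGNORECASE is assumed because the pattern lists only A-Z, and re.DOTALL is assumed so that . also matches newlines.

```python
import re

TAG = re.compile(r"<([A-Z][A-Z0-9]*)[^>]*>(.*?)</\1>", re.IGNORECASE | re.DOTALL)

html = '<p class="intro">Hello <b>world</b></p>\n<div>line one\nline two</div>'

# group(1) is the tag name (backreference 1), group(2) is the content.
pairs = [(m.group(1), m.group(2)) for m in TAG.finditer(html)]

for name, content in pairs:
    print(name, repr(content))
```

This is fine for simple, well-formed snippets. Nested tags with the same name will not pair up correctly, so a real HTML parser is the better tool for arbitrary pages.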
3) Want a regex for parsing code written in Classic ASP (ASP 3.0) style?
The following pattern can help with matching, and it is a nice example of using named capture groups in regular expressions.
(?<tbefore>.*?)(?<allcode><%(?<equals>=)*\s*(?<code>.*?)%>)(?<endtext>.*)
Here, tbefore captures all text before the code start tag <%.
allcode captures the whole code block, including the <% and %> delimiters.
equals captures the = sign when the block is an output expression (<%= ... %>).
code captures only the code inside <% %>, without the delimiters.
endtext captures all text after this code block.
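Here is a sketch of the named groups in action, written for Python. This is an adaptation, not the pattern exactly as given: Python spells named groups as (?P<name>...) rather than (?<name>...), and re.DOTALL is assumed so that code blocks may span lines. The sample input is invented for illustration.

```python
import re

ASP = re.compile(
    r"(?P<tbefore>.*?)"
    r"(?P<allcode><%(?P<equals>=)*\s*(?P<code>.*?)%>)"
    r"(?P<endtext>.*)",
    re.DOTALL,
)

source = "Hello <%= name %> bye"
m = ASP.match(source)

# Each named group can be read by name instead of by position.
parts = {
    "tbefore": m.group("tbefore"),
    "allcode": m.group("allcode"),
    "equals": m.group("equals"),
    "code": m.group("code").strip(),  # the lazy group keeps the trailing space
    "endtext": m.group("endtext"),
}
print(parts)
```

Only the first code block is split out this way. The rest of the input lands in endtext, so the pattern would have to be applied to endtext again to walk through a whole file.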