This section describes the basic rules for creating regular expressions.
A pattern enclosed in square brackets defines a character set that matches a single character. For example, the regular expression " [A-Za-z] " matches any single uppercase or lowercase letter enclosed by spaces. Within a character set, a hyphen indicates a range of characters.
The regular expression " B[IAU]G " matches the strings " BIG ", " BAG ", and " BUG ", but does not match the string " BOG ".
You can concatenate character sets to match the corresponding sequence of characters in the search string. For example, the regular expression " B[IA][GN] " matches a space, followed by "B", followed by an "I" or "A", followed by a "G" or "N", followed by a trailing space. The regular expression matches " BIG ", " BAG ", " BIN ", and " BAN ".
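The character-set rules above can be checked with a short test. The following is a minimal sketch that assumes the standard REFind function, which returns the position of the first match in the search string, or 0 if there is no match:

```cfml
<cfoutput>
<!--- " BIG " is in the set, so REFind reports a nonzero position --->
REFind(" B[IAU]G ", " BIG ") - #REFind(" B[IAU]G ", " BIG ")#<BR>
<!--- "O" is not in [IAU], so REFind reports 0 --->
REFind(" B[IAU]G ", " BOG ") - #REFind(" B[IAU]G ", " BOG ")#<BR>
<!--- Two concatenated sets: "A" from [IA], then "N" from [GN] --->
REFind(" B[IA][GN] ", " BAN ") - #REFind(" B[IA][GN] ", " BAN ")#<BR>
</cfoutput>
```

The first and third lines should report a nonzero position, and the second should report 0.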
The regular expression [A-Z][a-z]* matches any word that starts with an uppercase letter and is followed by zero or more lowercase letters. The special character * after the closing square bracket matches zero or more occurrences of the character set.
A + after the closing square bracket matches one or more occurrences of the character set. The regular expression " [A-Z]+ " matches one or more uppercase letters enclosed by spaces. Therefore, this regular expression matches " BIG " and also matches " LARGE ", " HUGE ", " ENORMOUS ", and any other string of uppercase letters surrounded by spaces.
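The difference between * and + can be illustrated in the same way. This sketch again assumes the standard REFind function, which is case sensitive and returns 0 when nothing matches; the test strings are chosen only for illustration:

```cfml
<cfoutput>
<!--- An uppercase letter followed by zero or more lowercase letters --->
REFind("[A-Z][a-z]*", "Hello") - #REFind("[A-Z][a-z]*", "Hello")#<BR>
<!--- One or more uppercase letters enclosed by spaces --->
REFind(" [A-Z]+ ", " LARGE ") - #REFind(" [A-Z]+ ", " LARGE ")#<BR>
<!--- Lowercase letters are not in [A-Z], so there is no match --->
REFind(" [A-Z]+ ", " Large ") - #REFind(" [A-Z]+ ", " Large ")#<BR>
</cfoutput>
```

The first two lines should report a nonzero position. The third should report 0, because "arge" is not made up of uppercase letters.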
Considerations when using special characters
Since a regular expression followed by an * can match zero instances of the regular expression, it can also match the empty string. For example,
<cfoutput> REReplace("Hello","[T]*","7","ALL") - #REReplace("Hello","[T]*","7","ALL")#<BR> </cfoutput>
results in the following output:
REReplace("Hello","[T]*","7","ALL") - 7H7e7l7l7o
The regular expression [T]* can match empty strings, and because "Hello" contains no "T", every match in this example is empty. The "ALL" argument tells REReplace to replace all instances of the expression. It first matches the empty string before "H" in "Hello", then the empty string before "e", and so on until the empty string before "o" is matched. Each of these empty matches is replaced with "7".
This result might be unexpected. The workarounds for these types of problems are specific to each case. In some cases you can use [T]+, which requires at least one "T", instead of [T]*. Alternatively, you can specify an additional pattern after [T]*.
In the following example, the regular expression has a "W" at the end:
<cfoutput> REReplace("Hello World","[T]*W","7","ALL") - #REReplace("Hello World","[T]*W","7","ALL")#<BR> </cfoutput>
This expression results in the following more predictable output:
REReplace("Hello World","[T]*W","7","ALL") - Hello 7orld
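The other workaround described above, using [T]+ in place of [T]*, can be sketched the same way. The second search string, "Total", is an assumption added only to show a case where a "T" is present:

```cfml
<cfoutput>
<!--- [T]+ needs at least one "T"; "Hello" has none, so nothing is replaced --->
REReplace("Hello","[T]+","7","ALL") - #REReplace("Hello","[T]+","7","ALL")#<BR>
<!--- REReplace is case sensitive: only the uppercase "T" is replaced --->
REReplace("Total","[T]+","7","ALL") - #REReplace("Total","[T]+","7","ALL")#<BR>
</cfoutput>
```

Because [T]+ cannot match the empty string, the first call should return "Hello" unchanged, and the second should return "7otal".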