Adobe ColdFusion 8

Using special characters

Regular expressions define the following list of special characters:

+ * ? . [ ^ $ ( ) { | \ 

In some cases, you use a special character as a literal character. For example, if you want to search for the plus sign in a string, you have to escape the plus sign by preceding it with a backslash:

"\+"

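As a minimal sketch of the escaping rule above, the following compares an escaped and an unescaped plus sign. The variable names and the sample string are illustrative assumptions, not part of the original example:

```cfml
<!--- Hypothetical sample string and variable names, for illustration only. --->
<cfset text = "1+1=2">
<!--- "\+" matches a literal plus sign; reFind returns the 1-based position of the first match. --->
<cfset plusPos = reFind("\+", text)>
<!--- Unescaped, + is a quantifier: "1+" matches one or more "1" characters. --->
<cfset onesPos = reFind("1+", text)>
<!--- Displays: 2 1 --->
<cfoutput>#plusPos# #onesPos#</cfoutput>
```

The escaped pattern finds the plus sign at position 2, while the unescaped pattern treats the plus sign as a quantifier and matches the leading "1" at position 1.
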
The following table describes the special characters for regular expressions:

Special Character

Description

\

A backslash followed by any special character matches the literal character itself, that is, the backslash escapes the special character.

For example, "\+" matches the plus sign, and "\\" matches a backslash.

.

A period matches any character, including newline.

To match any character except a newline, use [^#chr(13)##chr(10)#], which excludes the ASCII carriage return and line feed codes. The corresponding escape codes are \r and \n.

[ ]

A one-character character set that matches any of the characters in that set.

For example, "[akm]" matches an "a", "k", or "m". A hyphen in a character set indicates a range of characters; for example, [a-z] matches any single lowercase letter.

If the first character of a character set is the caret (^), the regular expression matches any character except those in the set. It does not match the empty string.

For example, [^akm] matches any character except "a", "k", or "m". The caret loses its special meaning if it is not the first character of the set.

^

If the caret is at the beginning of a regular expression, the matched string must be at the beginning of the string being searched.

For example, the regular expression "^ColdFusion" matches the string "ColdFusion lets you use regular expressions" but not the string "In ColdFusion, you can use regular expressions."

$

If the dollar sign is at the end of a regular expression, the matched string must be at the end of the string being searched.

For example, the regular expression "ColdFusion$" matches the string "I like ColdFusion" but not the string "ColdFusion is fun."

?

A character, character set, or subexpression followed by a question mark matches zero or one occurrence of that character, character set, or subexpression.

For example, xy?z matches either "xyz" or "xz".

|

The OR character allows a choice between two regular expressions.

For example, jell(y|ies) matches either "jelly" or "jellies".

+

A character set or subexpression followed by a plus sign matches one or more occurrences of the character set or subexpression.

For example, [a-z]+ matches one or more lowercase characters.

*

A character set or subexpression followed by an asterisk matches zero or more occurrences of the character set or subexpression.

For example, [a-z]* matches zero or more lowercase characters.

()

Parentheses group parts of a regular expression into subexpressions that you can treat as a single unit.

For example, (ha)+ matches one or more instances of "ha".

(?x)

If at the beginning of a regular expression, it specifies to ignore whitespace in the regular expression and lets you use ## for end-of-line comments. You can match a space by escaping it with a backslash.

For example, the following regular expression includes comments, preceded by ##, that are ignored by ColdFusion:

reFind("(?x)
    one                  ##first option
    |two                 ##second option
    |three\ point\ five  ## note escaped spaces
    ", "three point five")

(?m)

If at the beginning of a regular expression, it specifies the multiline mode for the special characters ^ and $.

When used with ^, the matched string can be at the start of the entire search string or at the start of new lines, denoted by a linefeed character or chr(10), within the search string. For $, the matched string can be at the end of the search string or at the end of new lines.

Multiline mode does not recognize a carriage return, or chr(13), as a new line character.

The following example searches for the string "two" across multiple lines:

#reFind("(?m)^two", "one#chr(10)#two")#

This example returns 5, the position of "two", which starts immediately after the chr(10) linefeed at position 4. Without (?m), the regular expression would not match anything, because ^ only matches the start of the string.

The (?m) modifier does not affect \A or \Z, which always match the start or end of the string, respectively. For information on \A and \Z, see Using escape sequences.

(?i)

If at the beginning of a regular expression for REFind(), it specifies to perform a case-insensitive compare.

For example, the following line would return an index of 1:

    #reFind("(?i)hi", "HI")#

If you omit the (?i), the line would return an index of zero to signify that it did not find the regular expression.

(?=...)

Specifies positive lookahead when searching for the regular expression. The construct can appear anywhere in a regular expression, not only at the beginning.

Positive lookahead tests for the parenthesized subexpression like regular parentheses, but does not include the contents in the match; it only tests whether the subexpression is present immediately after the preceding part of the expression.

For example, consider the expression to extract the protocol from a URL:

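<cfset regex = "http(?=://)">
<cfset string = "http://">
<!--- The fourth argument makes reFind return a structure with pos and len arrays. --->
<cfset result = reFind(regex, string, 1, "yes")>
<cfoutput>#mid(string, result.pos[1], result.len[1])#</cfoutput>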

This example results in the string "http". The lookahead parentheses ensure that the "://" is there, but do not include it in the result. If you did not use lookahead, the result would include the extraneous "://".

Lookahead parentheses do not capture text, so backreference numbering will skip over these groups. For more information on backreferencing, see Using backreferences.

(?!...)

Specifies negative lookahead. Negative lookahead is just like positive lookahead, as specified by (?=...), except that it tests for the absence of a match.

Lookahead parentheses do not capture text, so backreference numbering will skip over these groups. For more information on backreferencing, see Using backreferences.

(?:...)

If you prefix a subexpression with "?:", ColdFusion performs all operations on the subexpression except that it does not capture the corresponding text for use with a backreference.
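The table gives no example for negative lookahead or for non-capturing subexpressions. The following is a minimal sketch of both; the sample strings and variable names are assumptions chosen for illustration:

```cfml
<!--- Negative lookahead: match "http" only when it is NOT followed by "s". --->
<!--- The sample strings and variable names are hypothetical. --->
<cfset urls = "https://a http://b">
<cfset plainPos = reFind("http(?!s)", urls)>

<!--- Non-capturing group: (?:jell) is grouped but not numbered, so \1 refers to (y|ies). --->
<cfset suffix = reReplace("jellies", "(?:jell)(y|ies)", "\1")>
<!--- With ordinary parentheses, \1 refers to (jell) instead. --->
<cfset stem = reReplace("jellies", "(jell)(y|ies)", "\1")>

<!--- Displays: 11 ies jell --->
<cfoutput>#plainPos# #suffix# #stem#</cfoutput>
```

The first "http" is followed by "s", so the lookahead rejects it and the match starts at position 11. In the two reReplace calls, the only difference is whether the first group is captured, which changes the text that \1 refers to.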

You must be aware of the following considerations when using special characters in character sets, such as [a-z]:

  • To include a hyphen (-) in the square brackets of a character set as a literal character, you cannot escape it as you can other special characters because ColdFusion always interprets a hyphen as a range indicator. Therefore, if you use a literal hyphen in a character set, make it the last character in the set.
  • To include a closing square bracket (]) in the character set, escape it with a backslash, as in [1-3\]A-z]. You do not have to escape the ] character outside of the character set designator.
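
The two considerations above can be sketched as follows. This is an illustrative example with assumed sample strings and variable names, not a complete treatment of character sets:

```cfml
<!--- A literal hyphen placed last in the set: matches digits and hyphens. --->
<!--- The sample strings and variable names are hypothetical. --->
<cfset hyphenPos = reFind("[0-9-]+", "tel 555-1234")>
<!--- An escaped closing bracket inside the set: matches "]" or "x". --->
<cfset bracketPos = reFind("[\]x]", "a]b")>
<!--- Displays: 5 2 --->
<cfoutput>#hyphenPos# #bracketPos#</cfoutput>
```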