
Regular Expression Patterns

A regular expression pattern is a sequence of elements which matches successive portions of a character string. For example, ordinary letters are elements which match the same letters in the string, and an asterisk indicates that the previous element should be matched 0 or more times. So the pattern abcd matches only those four characters in that order, while the pattern ab*cd matches the letter a followed by 0 or more occurrences of the letter b, followed by the letters cd. The particular elements of a pattern are described below.
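The two patterns above can be tried out in a small sketch. It uses Python's re module purely as a stand-in (an assumption of this example, not the library this page describes), and the helper name matches_whole is hypothetical.

```python
import re

# Python's re is a stand-in here; the two patterns come from the text above.
exact = re.compile(r"abcd")
starred = re.compile(r"ab*cd")

def matches_whole(pattern, text):
    """Return True when the pattern matches the entire string."""
    return pattern.fullmatch(text) is not None

print(matches_whole(exact, "abcd"))      # True
print(matches_whole(exact, "abbcd"))     # False
print(matches_whole(starred, "acd"))     # True  (zero b's)
print(matches_whole(starred, "abbbcd"))  # True  (three b's)
```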

Characters
Non-special characters match exactly. Non-special characters are anything other than:

   [ ] ( ) { } $ ^ . * + ? | \

A special character is included as simple text by preceding it with a backslash.
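A short sketch of the backslash rule, again assuming Python's re module as a stand-in for illustration. The re.escape call is a Python convenience and is not part of the pattern language described here.

```python
import re

# "." is special, so it must be escaped to match a literal period.
literal_dot = re.compile(r"3\.14")
any_char = re.compile(r"3.14")

print(literal_dot.fullmatch("3.14") is not None)  # True
print(literal_dot.fullmatch("3x14") is not None)  # False
print(any_char.fullmatch("3x14") is not None)     # True

# re.escape inserts the backslashes for every special character in a string.
escaped = re.escape("a+b")
print(escaped)                                    # a\+b
print(re.fullmatch(escaped, "a+b") is not None)   # True
```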

Character sets
The special character . matches any character (except the null character, 0{a.).

The special characters ^ and $ match the start and end of a line, respectively.
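The following sketch shows the three special characters on single-line strings. It assumes Python's re module as a stand-in; details such as how . treats newlines are specific to Python and may differ from the library described here.

```python
import re

starts_with_cat = re.compile(r"^cat")
ends_with_cat = re.compile(r"cat$")
three_chars = re.compile(r"^c.t$")   # c, any one character, t

print(starts_with_cat.search("cat food") is not None)  # True
print(starts_with_cat.search("a cat") is not None)     # False
print(ends_with_cat.search("a cat") is not None)       # True
print(three_chars.search("cut") is not None)           # True
print(three_chars.search("coat") is not None)          # False
```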

Sets of characters are defined by enclosing the list of characters in brackets:
[aeiou] matches a single vowel character

Ranges can also be included within the brackets:
[a-z] matches any lower case letter

Combinations of the above are acceptable:
[a-zA-Z13579] matches any lowercase letter, uppercase letter, or odd digit
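As a quick check of the set examples above, this sketch lists every character of a string that falls in each set. Python's re module is assumed as a stand-in.

```python
import re

vowel = re.compile(r"[aeiou]")
mixed = re.compile(r"[a-zA-Z13579]")

# findall returns each non-overlapping match, here one character at a time.
print(vowel.findall("regular"))   # ['e', 'u', 'a']
print(mixed.findall("a1 B2 c3"))  # ['a', '1', 'B', 'c', '3']
```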

Fixed sets (classes) of characters can be included in the list, as a name within bracket-colon pairs:
[#[:digit:]abc] matches the character #, a digit, or any of the letters a, b, or c

The character classes defined are:

   alnum   alphanumeric        alpha    alphabetic
   blank   tab and space       cntrl    control chars
   digit   digits              graph    printable (except space)
   lower   lowercase           print    printable
   punct   punctuation         space    whitespace
   upper   uppercase           xdigit   hex digits
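Python's re module does not accept the bracket-colon class names, so a sketch that uses it as a stand-in has to spell the classes out. The table POSIX_CLASSES and the function expand_classes below are hypothetical helpers written for this illustration; they cover only a few classes and only ASCII characters.

```python
import re

# Hypothetical, ASCII-only expansions for a few of the class names.
POSIX_CLASSES = {
    "alnum": "a-zA-Z0-9",
    "alpha": "a-zA-Z",
    "blank": r" \t",
    "digit": "0-9",
    "lower": "a-z",
    "upper": "A-Z",
    "xdigit": "0-9A-Fa-f",
}

def expand_classes(pattern):
    """Rewrite each [:name:] inside a set as an explicit range."""
    return re.sub(r"\[:(\w+):\]", lambda m: POSIX_CLASSES[m.group(1)], pattern)

print(expand_classes("[#[:digit:]abc]"))   # [#0-9abc]
print(expand_classes("[[:alpha:]]+"))      # [a-zA-Z]+
print(re.findall(expand_classes("[#[:digit:]abc]"), "x#7b"))  # ['#', '7', 'b']
```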

If the list within the brackets begins with ^, the set matches any character not in the list.
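A brief sketch of a negated set, assuming Python's re module as a stand-in:

```python
import re

non_vowel = re.compile(r"[^aeiou]")
non_digit_run = re.compile(r"[^0-9]+")

print(non_vowel.findall("audio"))         # ['d']
print(non_digit_run.findall("12ab34cd"))  # ['ab', 'cd']
```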

Subexpressions
A series of elements may be combined into a subexpression by enclosing them in parentheses. Subexpressions are affected by closures such as * just as simple characters are:
([a-z][0-9])* matches any number of occurrences of a letter followed by a digit

A search for a pattern returns a match for the overall pattern and a separate match for each subexpression.
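The sketch below shows a closure applied to a subexpression, and then the separate matches reported for each subexpression. It assumes Python's re module as a stand-in; the pattern and sample string in the second half are made up for illustration.

```python
import re

# A closure applies to the whole parenthesized subexpression.
pairs = re.compile(r"([a-z][0-9])*")
print(pairs.fullmatch("a1b2c3") is not None)  # True
print(pairs.fullmatch("a1b") is not None)     # False

# Group 0 is the overall match; groups 1 and 2 are the subexpressions.
setting = re.compile(r"([a-z]+)=([0-9]+)")
m = setting.search("set width=42 now")
print(m.group(0))  # width=42
print(m.group(1))  # width
print(m.group(2))  # 42
```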

A \ followed by a digit, N, matches the same substring which occurred in the Nth subexpression:
([[:digit:]]+)#\1 matches one or more digits, followed by a #, followed by the same string of digits
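The back-reference example can be sketched as follows. Python's re module is assumed as a stand-in, and because it does not accept [:digit:], the class is written as the range 0-9.

```python
import re

# \1 must repeat exactly what the first subexpression matched.
repeated = re.compile(r"([0-9]+)#\1")

m = repeated.search("id 42#42 ok")
print(m.group(0))                 # 42#42
print(repeated.search("42#43"))   # None
```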

Closures
A * following an element matches 0 or more occurrences of that element:
[aeiou]* matches 0 or more vowels

A + following an element matches 1 or more occurrences of that element:
[[:alpha:]]+ matches 1 or more alphabetic characters

A ? following an element matches 0 or 1 occurrences of that element:
-?[[:digit:]]+ matches an optional hyphen, followed by 1 or more digits

An interval expression, {m,n}, follows an element to allow it to match at least m, and no more than n, occurrences of that element:
[[:digit:]]{3,5} matches 3, 4, or 5 digits
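The four closures above can be compared side by side in a sketch. It assumes Python's re module as a stand-in, with the bracket-colon classes written as explicit ranges; the helper name whole is hypothetical.

```python
import re

def whole(pattern, text):
    """True when the pattern matches the entire string."""
    return re.fullmatch(pattern, text) is not None

print(whole(r"[aeiou]*", ""))          # True:  * allows zero occurrences
print(whole(r"[A-Za-z]+", ""))         # False: + needs at least one
print(whole(r"-?[0-9]+", "-17"))       # True:  optional hyphen present
print(whole(r"-?[0-9]+", "17"))        # True:  optional hyphen absent
print(whole(r"-?[0-9]+", "--17"))      # False: at most one hyphen
print(whole(r"[0-9]{3,5}", "12"))      # False: fewer than 3 digits
print(whole(r"[0-9]{3,5}", "12345"))   # True
print(whole(r"[0-9]{3,5}", "123456"))  # False: more than 5 digits
```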

Alternation
Multiple regular expressions can be separated with a vertical bar | to match any of them:
print|list|exit matches any of the strings print, list, or exit
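A small sketch of the alternation example, assuming Python's re module as a stand-in:

```python
import re

command = re.compile(r"print|list|exit")

print(command.fullmatch("list") is not None)  # True
print(command.fullmatch("quit") is not None)  # False
print(command.findall("list then exit"))      # ['list', 'exit']
```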

Matches
When searching for a pattern in a string, it is possible to find multiple substrings which match the pattern. The one that is returned is the one which starts earliest in the string. If more than one match starts at the same place, the longest one is returned.
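The sketch below illustrates the earliest-start rule and also flags one place where the stand-in used here behaves differently. It assumes Python's re module, which picks the earliest start but, for alternation, takes the first alternative that succeeds and not necessarily the longest one described above.

```python
import re

# Earliest start wins, even though a longer run of b's appears later.
m = re.search(r"b+", "abbb abbbbb")
first = (m.start(), m.group(0))
print(first)  # (1, 'bbb')

# Same start, several candidates: the rule in the text picks the longest ("ab").
# Python's re instead takes the first alternative that succeeds.
python_choice = re.search(r"a|ab", "ab").group(0)
print(python_choice)  # a

# Listing the longer alternative first makes Python agree with that rule here.
reordered = re.search(r"ab|a", "ab").group(0)
print(reordered)  # ab
```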

Even once a particular match is located, there may be several ways to divide it among the subexpressions which make it up. As a rule, subexpressions which begin earlier in the string are made as long as possible.

The result of a match is a table which describes the match. The first row covers the whole match, and subsequent rows describe where the subexpressions in the pattern match in the string. Each row has two elements: the index of the first character of the match, and the length of the match. Any row for a subexpression which doesn't participate in the match is filled with _1 0.
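The shape of such a table can be imitated in a sketch. It assumes Python's re module as a stand-in, and match_table is a hypothetical helper, not a function of the library described here. Python writes the value _1 as -1; returning a row of -1 0 for every row when nothing matches at all is an assumption of this sketch.

```python
import re

def match_table(pattern, text):
    """Rows of (start, length): row 0 is the whole match, later rows the subexpressions."""
    rx = re.compile(pattern)
    m = rx.search(text)
    if m is None:
        # Assumption: with no match, every row is reported as (-1, 0).
        return [(-1, 0)] * (rx.groups + 1)
    rows = []
    for i in range(rx.groups + 1):
        start = m.start(i)  # -1 when group i took no part in the match
        length = m.end(i) - start if start >= 0 else 0
        rows.append((start, length))
    return rows

print(match_table(r"(x+)([0-9]+)", "just one xxx1234 match here"))
# [(9, 7), (9, 3), (12, 4)]

print(match_table(r"(a)|(b)", "b"))
# [(0, 1), (-1, 0), (0, 1)]
```

In the second call the first subexpression does not participate, so its row is -1 0, which corresponds to the _1 0 described above.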

