
Regular Expression Verbs

The standard regex verbs are defined in system\main\regex.ijs. The main verbs are rxmatch and rxmatches. The former locates the first occurrence of a match in the string; the latter locates all occurrences. Four other verbs create, list, display, and free compiled patterns: rxcomp, rxhandles, rxinfo, and rxfree.

Most of the remaining definitions either use rxmatch or rxmatches, or take their results as arguments.

match=. pattern rxmatch string

Find first match

The result of rxmatch is a table, each row being an index/length pair. The first row describes the entire match; it is followed by one row per subexpression, describing where that subexpression was found in the string. Where a match does not occur, _1 0 is returned.
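
A minimal sketch of this result layout. The pattern and strings are invented for illustration, and it assumes the library can be loaded under the short name 'regex', which may differ between J releases. The values in the comments follow from the description above:

```j
   load 'regex'                        NB. assumption: 'regex' resolves to regex.ijs
   m=. '(a+)(b+)' rxmatch 'xxaabbb'    NB. entire match, then one row per subexpression
   $ m                                 NB. 3 2: three index/length pairs
   {. m                                NB. 2 5: the entire match 'aabbb'
   , 'z+' rxmatch 'xxaabbb'            NB. _1 0: no match
```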

matches =. pattern rxmatches string

Find all matches

rxmatches returns a list of tables, with one item per match in the string. The shape of the result is #matches by #rows by 2, where each match has one row for the entire match plus one row per subexpression.

phandle =. rxcomp pattern

Compile pattern

rxfree phandle

Release compiled pattern

phandles =. rxhandles ''

Return all pattern handles

'nsub pat' =. rxinfo phandle

Return (1+#subexprs);pattern


The verbs rxcomp, rxhandles, rxinfo, and rxfree let you work with pattern handles: simple integers that represent compiled patterns. A handle can be used anywhere a pattern can be and, if used repeatedly, avoids recompiling the pattern on each call.

rxcomp compiles a pattern and returns a handle.

rxhandles returns a list of all existing handles.

rxinfo returns information about a handle. It currently returns a boxed list of 1 + the number of subexpressions and the original pattern. The length of the result may be extended (on the right) in the future.

rxfree releases all resources associated with a compiled pattern.

errtext =. rxerror ''

Error text

The result of rxerror is a text string describing the last error from a regular expression verb.

ismatch =. pattern rxeq string

1 if entire string matches

rxeq returns 1 if the pattern matches the entire string. (Similar to = verb).

index =. pattern rxindex string

index of match or #string

The result of rxindex is the index of the first match, or #string if none. (Similar to i. verb).

mask =. pattern rxE string

mask: 1's start matches

rxE returns a boolean mask of length #string, with 1's to mark the start of a match. (Similar to E. verb).
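
As a sketch of how rxindex and rxE relate: the pattern and string below are invented for illustration, and the short script name 'regex' is assumed. The index of the first 1 in the rxE mask should agree with rxindex, including the no-match case, where both give #string:

```j
   load 'regex'                   NB. assumption: 'regex' resolves to regex.ijs
   pat=. '[[:digit:]]+'
   str=. 'ab12cd345'
   pat rxindex str                NB. 2: index of the first match
   (pat rxE str) i. 1             NB. 2: the same index, read off the mask
   I. pat rxE str                 NB. 2 6: start index of every match
   pat rxindex 'abc'              NB. 3: no match, so #string
```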

sub =. pattern rxfirst string

first substring match

rxfirst returns the substring in the right argument which matches the pattern.

subs =. pattern rxall string

all substring matches

The result of rxall is a boxed list of all substrings in the right argument which match the pattern.

subs =. matches rxfrom string

select substrings matched

rxfrom returns the substrings described by the index/length pairs on the left, one box per pair.

subs =. matches rxcut string

cut into alternating non-match/match

rxcut returns a boxed list which will match the original string if razed. The items alternate between non-matches and matches, always starting with a non-match.
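
The raze property can be checked directly. This is a sketch only: it reuses the sample string from the Examples section with a pattern for J names, and assumes the short script name 'regex':

```j
   load 'regex'                   NB. assumption: 'regex' resolves to regex.ijs
   str=. '3,foo3=.23,j42=.123,123'
   pat=. '[[:alpha:]][[:alnum:]_]*'
   m=. (pat;,0) rxmatches str     NB. entire matches only
   c=. m rxcut str                NB. boxes alternate non-match, match, ...
   str -: ; c                     NB. 1: razing the boxes restores the string
```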

newstr =. string rxrplc (pat;rplcstr)

replace pat with rplcstr

rxrplc replaces substrings in the left argument. The right argument is a boxed list of the pattern and the replacement text.

newstr =. rplcstrs matches rxmerge string

merge rplcstrs into string

rxmerge takes a table of matches as an argument, and returns a verb which merges the boxed strings in the left argument into those positions on the right. (Similar to } adverb).

newstr =. pattern f rxapply string

apply f to each match

rxapply applies its verb argument to each of the substrings in the right argument which match the pattern in the left argument.
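
A short sketch of rxapply with a verb that changes the length of each match. The bracketing verb and the strings are hypothetical, chosen only for illustration, and the short script name 'regex' is assumed:

```j
   load 'regex'                                     NB. assumption: 'regex' resolves to regex.ijs
   '[[:digit:]]+' ('<' , ,&'>') rxapply 'a12b345'   NB. expected: a<12>b<345>
   '[[:alpha:]]+' |. rxapply 'abc 123 def'          NB. expected: cba 123 fed
```

The verb sees only the matched substring, so text outside the matches is carried through unchanged.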

All verbs which take a pattern as an argument can be called with either a character list containing a pattern or a pattern handle (an integer resulting from rxcomp). For example,

   '[[:alpha:]]+' rxmatches str     NB. match all sets of letters in str
   handle=. rxcomp '[[:alpha:]]+'   NB. compile pattern into handle
   handle rxmatches str             NB. do the match
   rxfree handle                    NB. (once handle is no longer required)

Notes
1. The rxmatch and rxmatches verbs return a single match or a list of matches, respectively, with each match being a table of index/length pairs for the match and each subexpression. Other verbs which use the result of rxmatch or rxmatches tend to use only the first row of each match, which represents the entire match.

2. If you're interested in one or more of the subexpressions, you can specify which rows of each match are to be returned by rxmatch and rxmatches. If a boxed array is passed rather than a character or numeric pattern, it must be a 2-element list consisting of a pattern and a list of the indices of the rows to return from each match.

For example, the pattern '(x+)([[:digit:]]+)' matches one or more letters 'x', followed by a string of digits, with both the 'x's and the digits being subexpressions of the pattern. Each match will be returned as a three-row table, describing the entire match, just the 'x's, and just the digits.

   pat=. rxcomp '(x+)([[:digit:]]+)'
   str=. 'just one xxx1234 match here'
   pat rxmatches str
 9 7
 9 3
12 4
   (pat;1 2) rxmatches str   NB. just the 'x's and digits
 9 3
12 4

   pat |. rxapply str        NB. reverse the whole match
just one 4321xxx match here
   (pat;,2) |. rxapply str   NB.  reverse just the digits
just one xxx4321 match here

Examples

   pat=. '[[:alpha:]][[:alnum:]_]*'  NB. pattern for J name
   str=. '3,foo3=.23,j42=.123,123'   NB. a sample string
   pat rxmatch str                   NB. find at index 2, length 4
2 4

   pat=. '([[:alpha:]][[:alnum:]_]*) *=[.;]'   NB. subexp is name in assign
   pat rxmatch str         NB. pattern at 2/6; name at 2/4
2 6
2 4
   pat rxmatches str       NB. find all matches
 2 6
 2 4
 
11 5
11 3

   ]phandle=. rxcomp pat   NB. compile
1
   rxcomp '[wrong'         NB. a bad pattern
|domain error: rxcomp
|       rxcomp'[wrong'
   rxerror ''
Unmatched [ or [^
   rxhandles ''            NB. just handle 1 defined
1

   rxinfo phandle          NB.  return (1+#subexp);pattern
+-+---------------------------------+
|2|([[:alpha:]][[:alnum:]_]*) *=[.;]|
+-+---------------------------------+

   phandle rxmatches str   NB. use phandle like pattern
 2 6
 2 4
 
11 5
11 3

   phandle rxfirst str     NB. first matching substring
foo3=.

   phandle rxall str       NB. all matching substrings
+------+-----+
|foo3=.|j42=.|
+------+-----+

   phandle rxindex&> '  foo=.10';'nothing at all'   NB. index of match
2 14

   phandle rxE str                 NB. mask over matches
0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0

   '[[:digit:]]*' rxeq '2342342'   NB. test for exact match
1
   '[[:digit:]]*' rxeq '2342 342'
0

   NB. rxfrom selects substring using index/length pairs
   phandle rxmatch str             NB. entire and subexpression match
2 6
2 4

   m=. phandle rxmatch str
   m rxfrom str
+------+----+
|foo3=.|foo3|
+------+----+

   phandle rxmatches str            NB. all matches
 2 6
 2 4
 
11 5
11 3

   ]m=.(phandle;,0) rxmatches str   NB. entire matches only
 2 6
 
11 5

   m rxcut str                      NB. return alternating non-match/match boxes
+--+------+---+-----+-------+
|3,|foo3=.|23,|j42=.|123,123|
+--+------+---+-----+-------+

   phandle |. rxapply str           NB. reverse each match
3,.=3oof23,.=24j123,123
   (phandle;,1) |. rxapply str      NB. reverse just name part of match
3,3oof=.23,24j=.123,123
