The standard regex verbs are defined in system\main\regex.ijs. The main verbs are rxmatch and rxmatches. The former locates the first occurrence of a match in the string; the latter locates all occurrences. Four other verbs create, list, display, and free up compiled patterns: rxcomp, rxhandles, rxinfo, and rxfree.
Most of the rest of the definitions either use the rxmatch or rxmatches verbs, or take the result of them as arguments.
match=. pattern rxmatch string | Find first match |
rxmatch returns a table, each row being an index/length pair. The first row describes the entire match; it is followed by one row per subexpression, describing where each subexpression was found in the string. Where a match does not occur, _1 0 is returned.
matches =. pattern rxmatches string | Find all matches |
rxmatches returns a list of tables, with one item per match in the string. The shape of the result is #matches by #subexpr by 2.
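The two result shapes can be checked directly in a session. The following is a minimal sketch, assuming the script loads with require 'regex'; the sample string and pattern are hypothetical and chosen only for illustration.

```j
require 'regex'                 NB. assumption: the script is available under this name

str =. 'one two'                NB. hypothetical sample string
pat =. '[[:alpha:]]+'           NB. no subexpressions, so each match has a single row

$ pat rxmatch str               NB. 1 2 : one row (entire match) of index, length
$ pat rxmatches str             NB. 2 1 2 : two matches, one row each, index and length
, 'z' rxmatch str               NB. _1 0 : the row returned when nothing matches
```

Because the pattern has no subexpressions, the middle dimension of the rxmatches result is 1; each subexpression in a pattern would add one more row per match.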
phandle =. rxcomp pattern | Compile pattern |
rxfree phandle | Release compiled pattern |
phandles =. rxhandles '' | Return all pattern handles |
'nsub pat' =. rxinfo phandle | Return #subexprs;pattern |
rxcomp, rxhandles, rxinfo, and rxfree let you create and manage pattern handles: simple integers that represent compiled patterns. A handle can be used anywhere a pattern can be and, if used repeatedly, avoids recompiling the pattern on each call.
rxcomp compiles a pattern and returns a handle.
rxhandles returns a list of all existing handles.
rxinfo returns information about a handle. It currently returns a boxed list of 1 + the number of subexpressions and the original pattern. The length of the result may be extended (on the right) in the future.
rxfree releases all resources associated with a compiled pattern.
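The life cycle of a handle might look like the sketch below. The pattern and the sample string are hypothetical, and the two-name assignment from rxinfo assumes the result still has exactly two items, as currently described.

```j
require 'regex'                 NB. assumption: the script is available under this name

src =. '([[:alpha:]]+)=([[:digit:]]+)'   NB. hypothetical pattern with 2 subexpressions
h =. rxcomp src                 NB. compile once, keep the integer handle
known =. h e. rxhandles ''      NB. 1 : the new handle is among the existing handles
'nsub pat' =. rxinfo h          NB. nsub is 1 + number of subexpressions; pat is the source
r =. h rxmatches 'a=1, bb=22'   NB. the handle stands in for the pattern
rxfree h                        NB. release the handle once it is no longer required
```

Here nsub is 3 (the entire match plus two subexpressions), so each match in r is a three-row table.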
errtext =. rxerror '' | Error text |
rxerror returns a text string describing the last error from a regular expression verb.
ismatch =. pattern rxeq string | 1 if entire string matches |
rxeq returns 1 if the pattern fully describes the string. (Similar to the = verb.)
index =. pattern rxindex string | index of match or #string |
rxindex returns the index of the first match, or #string if none. (Similar to the i. verb.)
mask =. pattern rxE string | mask: 1 at start of each match |
rxE returns a boolean mask of length #string, with 1's to mark the start of a match. (Similar to the E. verb.)
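The relationship between rxindex and rxE can be sketched as follows. The sample string and pattern are hypothetical; the point is only that the first 1 in the rxE mask sits at the position rxindex reports.

```j
require 'regex'                 NB. assumption: the script is available under this name

str =. 'ab12cd345'              NB. hypothetical sample string
pat =. '[[:digit:]]+'

pat rxindex str                 NB. 2 : index of the first match
pat rxE str                     NB. 0 0 1 0 0 0 1 0 0 : 1 at the start of each match
(pat rxE str) i. 1              NB. 2 : position of the first 1 agrees with rxindex
pat rxindex 'abc'               NB. 3 : no match, so the result is # 'abc'
```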
sub =. pattern rxfirst string | first substring match |
rxfirst returns the first substring in the right argument which matches the pattern.
subs =. pattern rxall string | all substring matches |
rxall returns a boxed list of all substrings in the right argument which match the pattern.
subs =. matches rxfrom string | select substrings matched |
rxfrom returns the boxed substrings described by each index/length pair on the left.
subs =. matches rxcut string | cut into alternating non-match/match |
rxcut returns a boxed list which will match the original string if razed. The items alternate between non-matches and matches, always starting with a non-match.
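The round-trip property of rxcut can be illustrated with a short sketch. The string and pattern are hypothetical, and the match table is restricted to the entire-match row of each match before it is passed to rxcut.

```j
require 'regex'                 NB. assumption: the script is available under this name

str =. 'ab12cd345ef'            NB. hypothetical sample string
pat =. '[[:digit:]]+'

m =. (pat;,0) rxmatches str     NB. keep only the entire-match row of each match
c =. m rxcut str                NB. boxes alternate: non-match, match, non-match, ...
str -: ; c                      NB. 1 : razing the pieces gives back the string
(pat rxall str) -: 1 3 { c      NB. 1 : the odd-indexed boxes are the matches
```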
newstr =. (pattern;rplcstr) rxrplc string | replace pat with rplcstr |
rxrplc replaces substrings in the right argument. The left argument is a boxed list of the pattern and the replacement text.
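A minimal sketch of a replacement call, using a hypothetical pattern and input string; the pattern and the replacement text are boxed together and the string to edit is the other argument:

```j
require 'regex'                 NB. assumption: the script is available under this name

NB. every run of digits in the string is replaced by a single '#'
new =. ('[[:digit:]]+';'#') rxrplc 'a1b22c'    NB. hypothetical input
new                             NB. a#b#c
```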
newstr =. rplcstrs matches rxmerge string | merge rplcstrs into string |
rxmerge takes a table of matches as an argument, and returns a verb which merges the boxed strings in the left argument into those positions on the right. (Similar to the } adverb.)
newstr =. pattern f rxapply string | apply f to each match |
rxapply applies its verb argument to each of the substrings in the right argument which match the pattern in the left argument.
All verbs which take a pattern as an argument can be called with either a character list containing a pattern or a pattern handle (an integer resulting from rxcomp). For example,
   '[[:alpha:]]+' rxmatches str
   handle=. rxcomp '[[:alpha:]]+'
   handle rxmatches str
   rxfree handle   NB. (once handle is no longer required)
Notes
1. The rxmatch and rxmatches verbs return either a single match or a list of matches, respectively, with each match being a table of index/length pairs for the match and each subexpression. Other verbs which use the result of rxmatch or rxmatches tend to use only the first row of each match, which represents the entire match.
2. If you're interested in one or more of the subexpressions, you can specify which rows of the match are to be returned by rxmatch and rxmatches. If a boxed array is passed rather than a character or numeric pattern, it must be a 2-element list consisting of a pattern and a list of the indices of the rows of interest in a match.
For example, the pattern '(x+)([[:digit:]]+)' matches one or more letters 'x', followed by a string of digits, with both the 'x's and the digits being subexpressions of the pattern. Each match will be returned as a three-row table, describing the entire match, just the 'x's, and just the digits.
   pat=. rxcomp '(x+)([[:digit:]]+)'
   str=. 'just one xxx1234 match here'
   pat rxmatches str
 9 7
 9 3
12 4
   (pat;1 2) rxmatches str   NB. just the 'x's and digits
 9 3
12 4
   pat |. rxapply str   NB. reverse the whole match
just one 4321xxx match here
   (pat;,2) |. rxapply str   NB. reverse just the digits
just one xxx4321 match here
Examples
   pat=. '[[:alpha:]][[:alnum:]_]*'   NB. pattern for J name
   str=. '3,foo3=.23,j42=.123,123'   NB. a sample string
   pat rxmatch str   NB. find at index 2, length 4
2 4
   pat=. '([[:alpha:]][[:alnum:]_]*) *=[.;]'   NB. subexp is name in assign
   pat rxmatch str   NB. pattern at 2/6; name at 2/4
2 6
2 4
   pat rxmatches str   NB. find all matches
 2 6
 2 4

11 5
11 3
   ]phandle=. rxcomp pat   NB. compile
1
   rxcomp '[wrong'   NB. a bad pattern
|domain error: rxcomp
|   rxcomp'[wrong'
   rxerror ''
Unmatched [ or [^
   rxhandles ''   NB. just handle 1 defined
1
   rxinfo phandle   NB. return (1+#subexp);pattern
+-+---------------------------------+
|2|([[:alpha:]][[:alnum:]_]*) *=[.;]|
+-+---------------------------------+
   phandle rxmatches str   NB. use phandle like pattern
 2 6
 2 4

11 5
11 3
   phandle rxfirst str   NB. first matching substring
foo3=.
   phandle rxall str   NB. all matching substrings
+------+-----+
|foo3=.|j42=.|
+------+-----+
   phandle rxindex&> '  foo=.10';'nothing at all'   NB. index of match
2 14
   phandle rxE str   NB. mask over matches
0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
   '[[:digit:]]*' rxeq '2342342'   NB. test for exact match
1
   '[[:digit:]]*' rxeq '2342 342'
0
   NB. rxfrom selects substring using index/length pairs
   phandle rxmatch str   NB. entire and subexpression match
2 6
2 4
   m=. phandle rxmatch str
   m rxfrom str
+------+----+
|foo3=.|foo3|
+------+----+
   phandle rxmatches str   NB. all matches
 2 6
 2 4

11 5
11 3
   ]m=.(phandle;,0) rxmatches str   NB. entire matches only
 2 6

11 5
   m rxcut str   NB. return alternating non-match/match boxes
+--+------+---+-----+-------+
|3,|foo3=.|23,|j42=.|123,123|
+--+------+---+-----+-------+
   phandle |. rxapply str   NB. reverse each match
3,.=3oof23,.=24j123,123
   (phandle;,1) |. rxapply str   NB. reverse just name part of match
3,3oof=.23,24j=.123,123