REX is a highly optimized pattern recognition tool modeled after
the Unix tools grep and lex. Wherever possible, REX's syntax has
been held consistent with these tools, but there are several major
departures that may bite those who are used to the grep family.
REX uses a combination of techniques that allow it to operate at a
much faster rate than similar expression matching tools. Unlike
grep, REX is both deterministic and non-directional. This may cause
some initial problems for users familiar with grep's way of thinking.
REX always applies repetition operators to the longest preceding expression. It does this so that it can maximize the benefits of using its rapid state skipping pattern matcher.
If you were to give grep the expression "ab*de+", it would interpret it as: an "a", then 0 or more "b"s, then a "d", then 1 or more "e"s.
REX will interpret this as: 0 or more occurrences of "ab" followed by 1 or more occurrences of "de".
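The difference in binding can be illustrated with a small sketch. It assumes Python's re module as a stand-in for both tools: Python binds repetition operators to a single character the way grep does, so the REX reading is written out with explicit groups. The names GREP_STYLE, REX_STYLE, and matches_whole are hypothetical and are not part of REX.

```python
import re

# Assumption: Python's `re` module stands in for both tools; this is not REX itself.
GREP_STYLE = re.compile(r"ab*de+")          # operators bind to the single preceding character
REX_STYLE = re.compile(r"(?:ab)*(?:de)+")   # operators bind to the whole preceding literal


def matches_whole(pattern, text):
    """Return True when the pattern matches the entire string."""
    return pattern.fullmatch(text) is not None


# Each sample is accepted by exactly one of the two readings.
for sample in ["abbbdeee", "ade", "ababdede", "de"]:
    print(sample, matches_whole(GREP_STYLE, sample), matches_whole(REX_STYLE, sample))
```

The first two samples fit only the grep reading, and the last two fit only the REX reading.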
The second technique that provides REX with a speed advantage is its ability to locate patterns both forwards and backwards indiscriminately.
Given the expression "abc*def", the pattern matcher is looking for
"zero to N occurrences of `abc' followed by a `def'".
The following text examples would be matched by this expression:
abcabcabcabcdef
def
abcdef
But consider these patterns if they were embedded within a body of text:
My country 'tis of abcabcabcabcdef sweet land of def, abcdef.
A normal pattern matching scheme would begin looking for "abc*".
Since "abc*" is matched by every position within the text, the
normal pattern matcher would plod along checking for "abc*" and
then, whether it's there or not, it would try to match "def".
REX examines the expression in search of the most efficient
fixed-length subpattern and uses it as the root of the search rather
than the first subexpression. So, in the example above, REX would
not begin searching for "abc*" until it has located a "def".
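The idea of rooting the search at the fixed-length subpattern can be sketched as follows. This is only an illustration of the strategy described above, not REX's actual implementation: the function name find_anchor_first and its parameters are hypothetical, and plain string search is assumed in place of REX's state-skipping matcher.

```python
def find_anchor_first(text, repeated="abc", anchor="def"):
    """Find zero or more `repeated` followed by `anchor`, locating `anchor` first."""
    matches = []
    pos = text.find(anchor)
    while pos != -1:
        start = pos
        # Walk backwards over as many copies of `repeated` as precede the anchor.
        while start >= len(repeated) and text[start - len(repeated):start] == repeated:
            start -= len(repeated)
        matches.append(text[start:pos + len(anchor)])
        # Resume the search for the next anchor after the current one.
        pos = text.find(anchor, pos + len(anchor))
    return matches


text = "My country 'tis of abcabcabcabcdef sweet land of def, abcdef."
print(find_anchor_first(text))
# prints ['abcabcabcabcdef', 'def', 'abcdef']
```

No work is spent on "abc" at positions where no "def" follows; the repeated part is examined only after an anchor has been found.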
There are many other techniques used in REX to improve the rate at which it searches for patterns, but these should have no effect on the way in which you specify an expression.
The three rules that will cause the most problems to experienced
grep users are:
"abc=def*" means one "abc" followed by 0 or more "def"s.
"abc*def*" cannot be located because it matches every position
within the text.
"a+ab" is idiosyncratic because "a+" is a subpart of "ab".
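Why an expression such as "abc*def*" matches every position can be seen in a short sketch. It again assumes Python's re module as a stand-in, with the REX reading written out using explicit groups; the name EVERYWHERE is hypothetical. Python accepts such a pattern and reports an empty match at each position, which is the behavior that makes the expression useless as a search.

```python
import re

# Assumption: REX's "abc*def*" rendered as a Python pattern that repeats whole literals.
EVERYWHERE = re.compile(r"(?:abc)*(?:def)*")

text = "xyz"
# Every subexpression may repeat zero times, so an empty match succeeds at each position.
spans = [m.span() for m in EVERYWHERE.finditer(text)]
print(spans)
# prints [(0, 0), (1, 1), (2, 2), (3, 3)]
```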