# regex -- evaluate a regular expression search

## Synopsis

• Usage:
  regex(re, str)
  regex(re, start, str)
  regex(re, start, range, str)
• Inputs:
  • re, a string, a regular expression describing a pattern
  • start, an integer, nonnegative, the position in str at which to begin the search. When omitted, the search starts at the beginning of the string.
  • range, an integer, restricts matches to those beginning at a position between start and start + range. When 0, the pattern is matched only at the starting position; when negative, only positions to the left of the starting position are examined for matches; when omitted, the search extends to the end of the string.
  • str, a string, the subject string to be searched
• Optional inputs:
  • POSIX => a Boolean value, default value false; if true, interpret re using the POSIX Extended flavor, otherwise the Perl flavor
• Outputs:
  • a list of pairs of integers; each pair denotes the beginning position and the length of a substring. Only the leftmost matching substring of str and the capturing groups within it are returned. If no match is found, the output is null.

## Description

The value returned is a list of pairs of integers: the first pair describes the entire match, and the remaining pairs correspond to the parenthesized subexpressions successfully matched. Each pair is suitable for use as the first argument of substring; its first member is the offset within str of the substring matched, and its second is the length.

See regular expressions for a brief introduction to the topic.

    i1 : s = "The cat is black.";

    i2 : m = regex("(\\w+) (\\w+) (\\w+)",s)
    o2 = {(0, 10), (0, 3), (4, 3), (8, 2)}
    o2 : List

    i3 : substring(m#0, s)
    o3 = The cat is

    i4 : substring(m#1, s)
    o4 = The

    i5 : substring(m#2, s)
    o5 = cat

    i6 : substring(m#3, s)
    o6 = is

    i7 : s = "aa     aaaa";

    i8 : m = regex("a+", 0, s)
    o8 = {(0, 2)}
    o8 : List

    i9 : substring(m#0, s)
    o9 = aa

    i10 : m = regex("a+", 2, s)
    o10 = {(7, 4)}
    o10 : List

    i11 : substring(m#0, s)
    o11 = aaaa

    i12 : m = regex("a+", 2, 3, s)

    i13 : s = "line 1\nline 2\r\nline 3";

    i14 : m = regex("^.*$", 8, -8, s)
    o14 = {(7, 6)}
    o14 : List

    i15 : substring(m#0, s)
    o15 = line 2

    i16 : m = regex("^", 10, -10, s)
    o16 = {(7, 0)}
    o16 : List

    i17 : substring(0, m#0#0, s)
    o17 = line 1

    i18 : substring(m#0#0, s)
    o18 = line 2
          line 3

    i19 : m = regex("^.*$", 4, -10, s)
    o19 = {(0, 6)}
    o19 : List

    i20 : substring(m#0, s)
    o20 = line 1

    i21 : m = regex("a.*$", 4, -10, s)
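Some of the searches above find nothing, in which case regex returns null and indexing into the result would fail. The following is a minimal sketch of one way to guard against that; the name captures is a hypothetical helper introduced only for illustration and is not part of Macaulay2.

```macaulay2
-- captures is a hypothetical helper, not a built-in function.
-- It returns the matched substrings (whole match first, then each group),
-- or the empty list when regex finds no match.
captures = (re, str) -> (
    m := regex(re, str);
    if m === null then {} else apply(m, p -> substring(p, str)))

captures("(\\w+) (\\w+)", "The cat is black.")  -- the strings The cat, The, cat
captures("x+", "The cat is black.")             -- no match, so {}
```

Returning an empty list rather than null lets callers use apply or # on the result without a separate check.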

By default, regular expressions are interpreted using the Perl flavor, which supports features such as lookaheads and lookbehinds for fine-tuning the matches. The same syntax is used by Perl and JavaScript.

    i22 : regex("A(?!C)", "AC AB")
    o22 = {(3, 1)}
    o22 : List

    i23 : regex("A(?=B)", "AC AB")
    o23 = {(3, 1)}
    o23 : List
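Lookbehinds work the same way but constrain what precedes the match. The sketch below assumes the usual Perl syntax (?<=...) and (?<!...) for positive and negative lookbehind; the subject string is made up for illustration.

```macaulay2
-- Positive lookbehind: match a B only when it is immediately preceded by an A.
m = regex("(?<=A)B", "CB AB")
-- Negative lookbehind: match a B only when it is not preceded by an A.
n = regex("(?<!A)B", "CB AB")
```

Under that assumption, m should point at the B in position 4 and n at the B in position 1, each with length 1, since the lookbehind itself consumes no characters.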

Alternatively, one can choose the POSIX Extended flavor of regex using POSIX => true. This syntax is similar to the one used by the Unix utilities egrep and awk and enforces the leftmost, longest rule for finding matches. If there's a tie, the rule is applied to the first subexpression.

    i24 : s = "<b>bold</b> and <b>strong</b>";

    i25 : m = regex("<b>(.*)</b>", s, POSIX => true);

    i26 : substring(m#1, s)
    o26 = bold</b> and <b>strong

In the Perl flavor, one can specify whether repetitions should be possessive or non-greedy.

    i27 : m = regex("<b>(.*?)</b>", s);

    i28 : substring(m#1, s)
    o28 = bold
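For comparison, a possessive repetition never gives back characters it has consumed. The sketch below assumes the usual Perl syntax a++ for a possessive repeat; the string t is made up for illustration.

```macaulay2
t = "aaa";
-- Greedy: a+ first takes all three characters, then gives one back
-- so that the final a can match.
g = regex("a+a", t)
-- Possessive: a++ keeps all three characters, leaving nothing for the
-- final a, so the search is expected to fail and return null.
p = regex("a++a", t)
```

Possessive repeats are mainly useful for cutting off backtracking when a failed match should be reported quickly.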