in reply to Regular expression pattern matching question

I initially just ran perl -MRegexp::List -le 'print Regexp::List->new->list2re( 4333 .. 9999 )' but that made a larger regex than I liked. Here's what I did by hand. It should be easy to follow how it's constructed.

$rx_4333 = qr/ (?: 4 3 3 [3-9] | 4 3 [4-9] \d | 4 [4-9] \d \d | [5-9] \d \d \d ) $/x

This next regex does the same thing but has less work to do. In the previous example, each alternative has to re-match a prefix that an earlier alternative already checked (the leading 4 of the first 4??? branch vs. the second 4??? branch). The nested form below matches each shared prefix only once.

$rx_4333 = qr/ (?: 4 (?: 3 (?: 3 [3-9] | [4-9] \d ) | [4-9] \d \d ) | [5-9] \d \d \d ) $/x
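
A quick way to check that the flat and nested forms really accept the same strings is to run both against every zero-padded four-digit string and compare with a plain numeric test. This is only a sketch: the variable names $rx_flat and $rx_nested are made up for the example, and it assumes the input is always exactly four digits.

```perl
use strict;
use warnings;

# Flat alternation: every branch starts over at the first digit.
my $rx_flat = qr/ (?: 4 3 3 [3-9] | 4 3 [4-9] \d | 4 [4-9] \d \d | [5-9] \d \d \d ) $/x;

# Nested form: each shared prefix is matched only once.
my $rx_nested = qr/ (?: 4 (?: 3 (?: 3 [3-9] | [4-9] \d ) | [4-9] \d \d ) | [5-9] \d \d \d ) $/x;

# Compare both patterns with a numeric comparison for 0000 .. 9999.
my $mismatches = 0;
for my $n (0 .. 9999) {
    my $s        = sprintf '%04d', $n;
    my $expected = $n >= 4333 ? 1 : 0;
    my $flat     = $s =~ $rx_flat   ? 1 : 0;
    my $nested   = $s =~ $rx_nested ? 1 : 0;
    $mismatches++ if $flat != $expected || $nested != $expected;
}
print "mismatches: $mismatches\n";
```

Both patterns agree with the numeric test on all 10,000 inputs, so the script reports zero mismatches.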

⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊

Re^2: Regular expression pattern matching question
by ikegami (Patriarch) on Jan 19, 2006 at 20:03 UTC

    You can simplify #2 a bit:

    $rx_4333 = qr/(?=\d{4}$) (?: 4 (?: 3 (?: 3 [3-9] | [4-9] ) | [4-9] ) | [5-9] ) /x

    That rx can be built dynamically as follows:
    my @digits = $ProdBuild =~ /([0-9])([0-9])([0-9])([0-9])$/;
    my $rx = '(?=\\d{4}$)';
    for (@digits) {
        $rx .= "(?:$_";
    }
    for (reverse @digits) {
        local $_ = $_+1;
        $rx .= '|' . ($_ == 9 ? 9 : "[$_-9]") if $_ != 10;
        $rx .= ')';
    }
    # 4333 gives (?=\d{4}$)(?:4(?:3(?:3(?:3|[4-9])|[4-9])|[4-9])|[5-9])

    foreach (@pattern) {
        print("Match: $_\n") if /$rx/;
    }

    Update: I cleaned up the regexp building code a bit, at the cost of a little redundancy in the regexp. For example, a last digit of 3 results in 3|[4-9].
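
The same construction can be wrapped in a function and checked against a numeric comparison for several lower bounds. This is a sketch, not the code from the post: build_min_rx is a hypothetical name, it anchors the input with ^ to insist on exactly four digits, and it uses a lexical variable instead of local $_.

```perl
use strict;
use warnings;

# Hypothetical helper: build a pattern that matches four-digit strings >= $min.
sub build_min_rx {
    my ($min) = @_;
    my @digits = $min =~ /^([0-9])([0-9])([0-9])([0-9])$/
        or die "expected exactly four digits, got '$min'";

    # The lookahead fixes the length; the nested groups only compare prefixes.
    my $rx = '(?=\d{4}$)';
    $rx .= "(?:$_" for @digits;

    # Close the groups from the innermost digit outward, adding the
    # "any larger digit here" alternative at each level (none after a 9).
    for my $d (reverse @digits) {
        my $next = $d + 1;
        $rx .= '|' . ($next == 9 ? 9 : "[$next-9]") if $next != 10;
        $rx .= ')';
    }
    return qr/$rx/;
}

# Check several lower bounds against a plain numeric comparison.
my $mismatches = 0;
for my $min (qw(0000 0999 4333 8999 9999)) {
    my $rx = build_min_rx($min);
    for my $n (0 .. 9999) {
        my $s        = sprintf '%04d', $n;
        my $expected = $n >= $min ? 1 : 0;
        my $got      = $s =~ $rx ? 1 : 0;
        $mismatches++ if $got != $expected;
    }
}
print "mismatches: $mismatches\n";
```

Keeping the redundant last-digit alternative (3|[4-9] rather than [3-9]) is what lets every level be generated by the same two lines.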

      Ah. Just so. I like the regex and how you eliminated my wildcards.

      The generation code was kind of ugly.

      ⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊

Re^2: Regular expression pattern matching question
by grinder (Bishop) on Jan 19, 2006 at 21:14 UTC

    For what it's worth, the next version of Regexp::Assemble (v0.24) will be able to do the following:

    $ perl -le 'print $_ for 4333 .. 9999' | \
      PERL5LIB=blib/lib assemble
    (?:4(?:3(?:3[3456789]|[456789]\d)|[456789]\d\d)|[56789]\d\d\d)

    ... in 1.3 seconds on hardware a couple of years old.

    If I get the warnings to stop, I'll throw in japhy's mind-bendingly marvellous list-to-range regexp which will allow it to get that down to:

    (?:4(?:3(?:3[3-9]|[4-9]\d)|[4-9]\d\d)|[5-9]\d\d\d)

    ... which, interestingly enough, looks as if it arrives at the same conclusion as you, which is a nice validation, thanks :)
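
The list-to-range step can be approximated with a small helper that collapses runs of consecutive digits inside a character class. This is not japhy's regexp and not Regexp::Assemble code, only a sketch under the assumption that the classes contain nothing but digits; collapse_digits is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: collapse runs of consecutive digits, e.g. "3456789" -> "3-9".
sub collapse_digits {
    my ($digits) = @_;
    my @d = sort { $a <=> $b } split //, $digits;
    my @out;
    while (@d) {
        my $start = shift @d;
        my $end   = $start;
        # Extend the run while the next digit is exactly one larger.
        $end = shift @d while @d && $d[0] == $end + 1;
        push @out, $end - $start >= 2 ? "$start-$end"   # run of three or more
                 : $end > $start      ? "$start$end"    # pair: a range saves nothing
                 :                      $start;         # single digit
    }
    return join '', @out;
}

# Apply the helper to every digits-only character class in the long pattern.
my $long = '(?:4(?:3(?:3[3456789]|[456789]\d)|[456789]\d\d)|[56789]\d\d\d)';
(my $short = $long) =~ s/\[(\d+)\]/'[' . collapse_digits($1) . ']'/ge;
print "$short\n";
```

Run on the longer pattern above, this yields the same string as the shortened form.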

    • another intruder with the mooring in the heart of the Perl

      The statement "too large" was ambiguous. Regexp::List->new->list2re(4333..9999) makes an 8K regexp.

      ⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊