ramya2005 has asked for the wisdom of the Perl Monks concerning the following question:

I have some products with pre defined sequence numbers. I want to write a pattern which will look for sequence numbers Greater than predefined.

For example in the below mentioned code:
$ProdBuild = 'K7000AKNBJQ4333';
My regular expression should match if for any product the last 4 digits are greater than or equal to 4333.

Matching Patterns:
$ProdBuild = 'K7000AKNBKQ4333';
$ProdBuild = 'K7000A-KNBKQ-4333';
$ProdBuild = 'K7000A-KNBKQ-6801';
$ProdBuild = 'K7000TKNBKQ7610';

Non-matching Patterns:
$ProdBuild = 'K7000AKNBKQ4233';
$ProdBuild = 'K7000AKNB4333';
$ProdBuild = 'K7000ANBKQ6845';
$ProdBuild = 'K7000TKNBKQ-1845';
The following code is good except that it can't look for the ones with last 4 digits greater than a specified number. Could some one help me in this part?
use strict; use warnings; no warnings qw(once); { my $ProdBuild = 'K7000AKNBJQ4333'; if ($ProdBuild =~ m/K7000[A-Z]\-?[A-Z][A-Z][A-Z][A-Z][A-Z]\-?[4-9] +[3-9]\d+/i) { print "Product match $ProdBuild \n" } else { print "No match found \n" } }

Replies are listed 'Best First'.
Re: Regular expression patter matching question
by ikegami (Patriarch) on Jan 19, 2006 at 19:13 UTC

    Regular expressions are not needed or particularly useful here. How about the following instead:

    my $ProdBuild = 'K7000AKNBJQ4333'; my $ProdBuildNum = substr($ProdBuild, -4); foreach (@pattern) { if (substr($ProdBuild, -4) >= $ProdBuildNum) { print("$_\n"); } }

    Or if you can have dashes in between the digits, how about the following:

    sub GetProdBuildNum { local $_ = @_ ? $_[0] : $_; s/\D//g; return substr($_, -4); } my $ProdBuild = 'K7000AKNBJQ4333'; my $ProdBuildNum = GetProdBuildNum($ProdBuild); foreach (@pattern) { if (GetProdBuildNum >= $ProdBuildNum) { print("$_\n"); } }
Re: Regular expression pattern matching question
by thundergnat (Deacon) on Jan 19, 2006 at 19:35 UTC

    Why make it complicated? If your regex is already testing for the presence of the ending digits, just capture them and test for the proper condition.

    use strict; use warnings; my $threshold = 4333; while (my $ProdBuild = <DATA>) { chomp $ProdBuild; if ($ProdBuild =~ m/K7000[A-Z]-?[A-Z]{5}-?(\d+)/i and $1 >= $thres +hold) { print "Product match $ProdBuild\n"; } else { print "No match found \n" } } __DATA__ K7000AKNBKQ4333 K7000A-KNBKQ-4333 K7000A-KNBKQ-6801 K7000TKNBKQ7610 K7000AKNBKQ4233 K7000AKNB4333 K7000ANBKQ6845 K7000TKNBKQ-1845

    yields:

    Product match K7000AKNBKQ4333
    Product match K7000A-KNBKQ-4333
    Product match K7000A-KNBKQ-6801
    Product match K7000TKNBKQ7610
    No match found
    No match found
    No match found
    No match found
    
Re: Regular expression patter matching question
by diotalevi (Canon) on Jan 19, 2006 at 19:16 UTC

    I intially just ran perl -MRegexp::List -le 'print Regexp::List->new->list2re( 4333 .. 9999 )' but that made a larger regex than I liked. Here's what I did by hand. It should be easy to follow how it's constructed.

    $rx_4333 = qr/ (?: 4 3 3 [3-9] | 4 3 [4-9] \d | 4 [4-9] \d \d | [5-9] \d \d \d ) $/x

    This next regex does the same thing but has less work to do. In the previous example, the different paths would have to retry matching stuff that was already known to be true (the first 4??? vs the second 4???). This does the minimum amount of work.

    $rx_4333 = / (?: 4 (?: 3 (?: 3 [3-9] | [4-9] \d ) | [4-9] \d \d ) | [5-9] \d \d \d ) $/x

    ⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊

      You can simply #2 a a bit:

      $rx_4333 = /(?=\d{4}$) (?: 4 (?: 3 (?: 3 [3-9] | [4-9] ) | [4-9] ) | [5-9] ) $/x
      That rx can be built dynamically as follows:
      my @digits = $ProdBuild =~ /([0-9])([0-9])([0-9])([0-9])$/; my $rx = '(?=\\d{4}$)'; for (@digits) { $rx .= "(?:$_"; } for (reverse @digits) { local $_ = $_+1; $rx .= '|' . ($_ == 9 ? 9 : "[$_-9]") if $_ != 10; $rx .= ')'; } # 4333 gives (?=\d{4}$)(?:4(?:3(?:3(?:3|[4-9])|[4-9])|[4-9])|[5-9]) foreach (@pattern) { print("Match: $_\n") if /$rx/; }

      Update: I cleaned up the regexp building code a bit, at the cost of a little redundancy in the regexp. For example, a last digit of 3 results in 3|[4-9].

        Ah. Just so. I like the regex and how you eliminated my wildcards.

        The generation code was kind of ugly.

        ⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊

      For what it's worth, the next version of Regexp::Assemble (v0.24) will be able to do the following

      $ perl -le 'print $_ for 4333 .. 9999' | \PERL5LIB=blib/lib assemble (?:4(?:3(?:3[3456789]|[456789]\d)|[456789]\d\d)|[56789]\d\d\d)

      ... in 1.3 seconds on hardware a couple of years old.

      If I get the warnings to stop, I'll throw in japhy's mind-bendingly marvellous list-to-range regexp which will allow it to get that down to:

      (?:4(?:3(?:3[3-9]|[4-9]\d)|[4-9]\d\d)|[5-9]\d\d\d)

      ... which, interestingly enough, looks as if it arrives at the same conclusion as you, which is a nice validation, thanks :)

      • another intruder with the mooring in the heart of the Perl

        The statement "too large" was ambiguous. Regexp::List->new->list2re(4333..9999) makes an 8K regexp.

        ⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊