cyclone has asked for the wisdom of the Perl Monks concerning the following question:

Hi,
I'm trying to match some strings and not others within the same regex but it's not working. I was hoping someone with far more experience and wisdom could take a look and point me in the right direction.

MATCH: r10,rect12x50,rect20.5x30.5,dounut_s12x30
NOMATCH: ddil3,construct+10,is.i274x

I need the integer or float values extracted.
i.e., r10 should set only $1, but rect12x50 should set both $1 and $2. For "r10", however, I sometimes get $1 = 1 and $2 = 0. Strings with two distinct number values seem to work fine.

The regex:
(($string !~ /^i/)&& ($string !~ /^ddi/)&& ($string =~ /^[a-z_]+([\d]+[\.]?[\d]*)[a-z]?([\d]+[\.]?[\d]*)/))
Thanks

Replies are listed 'Best First'.
Re: Regex trouble
by ZZamboni (Curate) on May 17, 2000 at 00:12 UTC
    You could use something like this:
    $float = '\d+(?:\.\d+)?';
    $int   = '\d+';
    @pats  = ("r($int)", "rect($float)x($float)", "dounut_s($int)x($int)");
    $pat   = join("|", @pats);
    @numbers = grep { defined($_) } ($str =~ /^(?:$pat)$/);
    Add your MATCH patterns to @pats, enclosing in parentheses the parts you want captured. The grep eliminates the undef elements in the result of the match, which correspond to the alternatives that did not match, and leaves only the matching parts.
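A minimal runnable sketch of the approach above, tried against the MATCH and NOMATCH strings from the question. The helper name extract_numbers is hypothetical and only wraps the match so it can be called per string; the patterns themselves are the ones from this reply.

```perl
use strict;
use warnings;

my $float = '\d+(?:\.\d+)?';
my $int   = '\d+';
my @pats  = ("r($int)", "rect($float)x($float)", "dounut_s($int)x($int)");
my $pat   = join '|', @pats;

# Hypothetical helper: returns the captured numbers, or an empty list
# when none of the alternatives matches the whole string.
sub extract_numbers {
    my ($str) = @_;
    return grep { defined($_) } ($str =~ /^(?:$pat)$/);
}

for my $str (qw(r10 rect12x50 rect20.5x30.5 dounut_s12x30 ddil3 construct+10 is.i274x)) {
    my @numbers = extract_numbers($str);
    printf "%-14s => %s\n", $str, @numbers ? join(', ', @numbers) : '(no match)';
}
```

Because every alternative contributes its own capture groups, the groups belonging to alternatives that did not match come back as undef, which is why the grep is needed.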

    --ZZamboni

      Would that $float regex match an integer? I will have to find the (?:\.\d+) example in my Perl books to see what it does. If it matches a non-float number then I think it will solve my problem.
        Yes it does:
        (?:   - group, but don't generate a back reference
        \.    - a dot
        \d+   - followed by one or more digits
        )?    - and the whole thing is optional
        So \d+(?:\.\d+)? matches a group of digits, optionally followed by a dot and more digits. So, it matches an integer or a float.

        If you want to allow things like "20." (no numbers after the dot) you would change it to \d+(?:\.\d*)? (asterisk instead of plus sign after the second \d, keeping the trailing question mark so that plain integers still match).
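A quick way to compare the two number patterns discussed here is to anchor each one and try a few inputs. This is only a sketch: the variable names $strict and $loose and the helper matches are made up for illustration, and both patterns keep the group optional so that integers are accepted.

```perl
use strict;
use warnings;

my $strict = qr/^\d+(?:\.\d+)?$/;   # digits are required after the dot
my $loose  = qr/^\d+(?:\.\d*)?$/;   # digits after the dot are optional

# Hypothetical helper: 1 if the string matches the pattern, else 0.
sub matches {
    my ($re, $s) = @_;
    return $s =~ $re ? 1 : 0;
}

for my $s ('10', '20.5', '20.', '.5') {
    printf "%-5s strict=%d loose=%d\n",
        $s, matches($strict, $s), matches($loose, $s);
}
# prints:
# 10    strict=1 loose=1
# 20.5  strict=1 loose=1
# 20.   strict=0 loose=1
# .5    strict=0 loose=0
```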

        --ZZamboni

RE: Regex trouble
by Russ (Deacon) on May 17, 2000 at 00:18 UTC
    It appears that you just forgot a ? after the second "half" of the regex. That's why it would match with two numbers but not with one.

    Here's how I modified it:

    my $R = 'dounut_s12x30';
    print "$1, $2\n"
        if $R !~ /^i|^ddi/
        and $R =~ /^[a-z_]+(\d+\.?\d*)[a-z_]?(\d+\.?\d*)?/;
    Outputs:
    12, 30

    Is this what you are asking?

    Russ

      Hey Russ,
      That's close but I have to be able to match "r10" , "r10.5" , "rect20x50" or "rect20.5x50.5".
      And in reality the word-character part of the string can be almost anything, and there may or may not be a second set of number characters. That's why I left the trailing "?" off the regex.
        Hmmm,

        The trailing ? is what allows you to have (or not have) a second set of numbers.

        This regex works for all sample strings you provided. Feel free to post more information, and I'll help you however I can.
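To see the effect of the trailing ?, the sketch below runs the same pattern with the second number group required and then optional. The names $required, $optional and captures are hypothetical, and the separator is written as [a-z]? as in the original question. When the second group is required, the engine backtracks on "r10" and splits the digits so that both groups get something, which would explain the $1 = 1, $2 = 0 result reported in the question.

```perl
use strict;
use warnings;

my $required = qr/^[a-z_]+(\d+\.?\d*)[a-z]?(\d+\.?\d*)/;   # second number mandatory
my $optional = qr/^[a-z_]+(\d+\.?\d*)[a-z]?(\d+\.?\d*)?/;  # second number optional

# Hypothetical helper: returns the two captures, with 'undef' spelled out,
# or an empty list when the pattern does not match.
sub captures {
    my ($re, $str) = @_;
    my @c = ($str =~ $re);
    return unless @c;
    return map { defined($_) ? $_ : 'undef' } @c;
}

for my $str (qw(r10 rect20x50)) {
    printf "%-10s required: %-10s optional: %s\n",
        $str,
        join(', ', captures($required, $str)),
        join(', ', captures($optional, $str));
}
# prints:
# r10        required: 1, 0       optional: 10, undef
# rect20x50  required: 20, 50     optional: 20, 50
```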

        Russ

Re: Regex trouble
by cyclone (Novice) on May 17, 2000 at 03:00 UTC
    Thanks for all the replies folks!!
    Here is the current sub I was working on.
    I basically pass a size to test against ($feat_size) and return true (1) or false (0) depending on the following...
    sub has_smaller {
        my($f,$job,$step,$layer,$feat_size) = @_;
        my $size_1;
        my $size_2;
        my $tmp_size;
        my %seen = ();
        my $feature_list = ();
        my @unique_list = ();

        $f->DO_INFO("-t layer -e $job/$step/$layer -d SYMS_HIST"); # database query initializes "$f->{doinfo}"
        $feature_list = $f->{doinfo}{gSYMS_HISTsymbol}; # reference to a list of strings

        undef %seen;
        @seen{@$feature_list} = ();
        @unique_list = sort keys %seen;

        foreach my $sym ( @unique_list ){
            if ($sym =~ /^(?!i|ddi)[a-z_]+(\d+(?:\.\d+)?)(?:[a-z](\d+(?:\.\d+)?))?/){
                $size_1 = $1;
                $size_2 = $2;
                $tmp_size = $size_1;
                if (defined $size_2){
                    $tmp_size = $size_2 if $size_2 < $size_1;
                }
                if ($tmp_size < $feat_size){
                    return 1;
                }
            }
        }
        return 0;
    }

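For trying the size comparison without the database query, here is a sketch that takes the symbol names as a plain list. The function name has_smaller_in_list and its argument order are assumptions made for this example; the regex is the one used in the sub above, and min comes from the core List::Util module.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Hypothetical stand-alone variant: returns 1 if any matching symbol has a
# dimension smaller than $feat_size, otherwise 0.
sub has_smaller_in_list {
    my ($feat_size, @symbols) = @_;
    my %seen;
    for my $sym (grep { !$seen{$_}++ } @symbols) {   # skip duplicates
        next unless $sym =~ /^(?!i|ddi)[a-z_]+(\d+(?:\.\d+)?)(?:[a-z](\d+(?:\.\d+)?))?/;
        my @dims = grep { defined($_) } ($1, $2);     # $2 is undef for one-number symbols
        return 1 if min(@dims) < $feat_size;
    }
    return 0;
}

print has_smaller_in_list(11, qw(r10 rect12x50)), "\n";   # prints 1 (10 < 11)
print has_smaller_in_list(10, qw(r10 rect12x50)), "\n";   # prints 0
```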
Re: Regex trouble
by mdillon (Priest) on May 17, 2000 at 00:51 UTC
    i think this is the correct regexp for the job:
    ^(?!i|ddi)[a-z_]+(\d+(?:\.\d+)?)(?:[a-z](\d+(?:\.\d+)?))?
    $1 will contain the first dimension, and $2 will contain either the second dimension, or undef if it is missing.
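As a sanity check, this sketch runs the regexp above over the MATCH and NOMATCH strings from the question. The helper name dimensions is hypothetical; it returns ($1, $2) on a match and an empty list otherwise.

```perl
use strict;
use warnings;

# Hypothetical helper around the regexp from this reply.
sub dimensions {
    my ($sym) = @_;
    return unless $sym =~ /^(?!i|ddi)[a-z_]+(\d+(?:\.\d+)?)(?:[a-z](\d+(?:\.\d+)?))?/;
    return ($1, $2);
}

for my $sym (qw(r10 rect12x50 rect20.5x30.5 dounut_s12x30 ddil3 construct+10 is.i274x)) {
    my @d = dimensions($sym);
    if (@d) {
        printf "%-14s \$1=%s \$2=%s\n", $sym, $d[0], defined $d[1] ? $d[1] : 'undef';
    }
    else {
        printf "%-14s no match\n", $sym;
    }
}
```

Note that the pattern is not anchored at the end, so trailing characters after the numbers are ignored rather than rejected.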

    however, it may be better to check for ^i|^ddi separately, as others have done.

Re: Regex trouble
by turnstep (Parson) on May 17, 2000 at 01:05 UTC
    $string =~ /^\D+([0-9.]*)\D*([0-9.]*)/