in reply to Extraction number from Text

The (?:.....) construct allows parenthetical grouping without capturing.

my @qty = 'GARNIER DEODORANT MINERALS - DRY CARE (50OZ)' =~ m/ \( ([-+\d.eE]*\d+) (?:KG|OZ|CL|LT|LTR|M)\b \) /igx;

I also made a couple of functional changes to your RE, which may or may not be appropriate, but which I suspect are in keeping with what you're after:

I allowed more than one \d digit, with \d+.

Have a look at perlre for a description of both (?:....) and the /x modifier.
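To make the difference concrete, here is a minimal sketch comparing a capturing group with (?:...) around the unit list. The variable names are only for illustration, and the character class is written as [-+\d.eE] so the hyphen is unambiguously literal.

```perl
use strict;
use warnings;

my $text = 'GARNIER DEODORANT MINERALS - DRY CARE (50OZ)';

# Capturing group around the unit: the match returns quantity AND unit.
my @with_capture = $text =~ m/ \( ([-+\d.eE]*\d+) (KG|OZ|CL|LT|LTR|M)\b \) /ix;

# Non-capturing (?:...) around the unit: only the quantity is returned.
my @without_capture = $text =~ m/ \( ([-+\d.eE]*\d+) (?:KG|OZ|CL|LT|LTR|M)\b \) /ix;

print "with capture:    @with_capture\n";      # 50 OZ
print "without capture: @without_capture\n";   # 50
```

With /x the whitespace inside the pattern is ignored, which is what lets the pieces be spaced out for readability.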


Dave

Re^2: Extraction number from Text
by moritz (Cardinal) on Jun 14, 2010 at 08:44 UTC
    I allowed more than one \d digit, with \d+.

    It took me a while to grok it, but the original regex did allow more than one digit, and also captured them. That's because there is a \d in the character class, and the character class is quantified with a *. However, this also allows more than one dot or more than one e, so it recognizes Ee.3 as a number.
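A quick way to see this is to anchor a pattern of that shape and throw a few strings at it. This is only a sketch: the pattern below is an assumed reconstruction of the character-class approach (a starred class followed by one digit), not necessarily the exact original.

```perl
use strict;
use warnings;

# A quantified character class checks each character independently,
# so it cannot enforce the order or count of signs, dots, or exponents.
my $loose = qr/^[-+\d.eE]*\d$/;

my %verdict;
for my $candidate ('50', '1.5e3', 'Ee.3', '--eeeeeeeeeee1', '1.2.3') {
    $verdict{$candidate} = $candidate =~ $loose ? 'accepted' : 'rejected';
    print "$candidate: $verdict{$candidate}\n";
}
```

All five strings are accepted, including the three that are clearly not numbers.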

    I agree that your regex is much better to read, but it doesn't allow numbers before the exponential (I guess that's what the e in the regex is supposed to mean).

    Further refinements could use Regexp::Common's number regex, or this regex, which parses numbers according to the JSON number specification:

    my $number = qr{ -? (?: 0 | [1-9] [0-9]* ) (?: \. [0-9]+ )? (?: [eE] [+-]? [0-9]+ )? }x;

    (might be a bit too restrictive in some cases for parsing numbers "in the wild", but still a good inspiration).
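As a rough sketch of how a JSON-style number pattern could be plugged into the extraction, assuming the exponent takes one or more digits as the JSON grammar requires. The helper name extract_quantity and the $unit pattern are made up for this example.

```perl
use strict;
use warnings;

# JSON-style number: optional minus, integer part without leading zeros,
# optional fraction, optional exponent with one or more digits.
my $number = qr{ -? (?: 0 | [1-9] [0-9]* ) (?: \. [0-9]+ )? (?: [eE] [+-]? [0-9]+ )? }x;

# Units from the thread; LTR is listed before LT so the longer one is tried first.
my $unit = qr{ (?: KG | OZ | CL | LTR | LT | M ) }xi;

# Hypothetical helper: returns the quantity inside "(<number><unit>)", or undef.
sub extract_quantity {
    my ($text) = @_;
    my ($qty) = $text =~ m/ \( ($number) $unit \) /x;
    return $qty;
}

for my $text ('DRY CARE (50OZ)', 'FLOUR (1.5KG)', 'ROPE (2e3M)',
              'BOGUS (Ee.3KG)', 'PADDED (007OZ)') {
    my $qty = extract_quantity($text);
    printf "%-16s => %s\n", $text, defined $qty ? $qty : 'no match';
}
```

The last case shows the "too restrictive" side: a zero-padded quantity such as 007 is rejected because the JSON grammar forbids leading zeros.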


      Great point with respect to the '*' quantifier for the character class.

      The OP's example, which uses the '*' quantifier, would of course allow non-numbers to be parsed as numbers. For example: "--eeeeeeeeeee1" would be accepted as a number when it's definitely not (although perl could evaluate that string in numeric context giving it a value of 1). I didn't attempt to address that issue, but it underscores your next point:

      Regex::Common is a nice resource too. If there's a resource that knows how to parse numbers, why write one's own number parser when it (a) takes more time, and (b) possibly introduces bugs? Regex::Common is the answer to both 'a' and 'b'.


      Dave

        For example: "--eeeeeeeeeee1" would be accepted as a number when it's definitely not (although perl could evaluate that string in numeric context giving it a value of 1).
        Eh, no. "--eeeeeeeeeee1" in numerical context is 0. When casting a string to a numeric value, Perl will always look at the beginning of a string. Something that starts with "--" cannot be a number; hence, it'll get the value 0.
        $ perl -wE 'say 0 + "--eeeeeeeeeee1"'
        Argument "--eeeeeeeeeee1" isn't numeric in addition (+) at -e line 1.
        0
        $
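For comparing several strings at once, a small sketch along these lines may help. It uses looks_like_number from the core Scalar::Util module next to plain numeric conversion; the sample strings and the %report hash are just for illustration.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

my %report;
for my $string ('50', '1e3', '3abc', '--eeeeeeeeeee1', 'Ee.3') {
    # Numeric conversion only uses a leading numeric prefix, if there is one.
    # The "isn't numeric" warning is silenced here to keep the output tidy.
    my $value = do { no warnings 'numeric'; 0 + $string };

    $report{$string} = {
        numeric => looks_like_number($string) ? 'yes' : 'no',
        value   => $value,
    };
    print "'$string': looks_like_number=$report{$string}{numeric}, value=$value\n";
}
```

Strings with a numeric prefix such as '3abc' convert to that prefix, while anything starting with "--" or a letter converts to 0, and looks_like_number rejects all three.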
        Regex::Common is a nice resource too.
        s/Regex/Regexp/... ;-)
Re^2: Extraction number from Text
by Anonymous Monk on Jun 14, 2010 at 08:41 UTC
    Appreciate your inputs, Dave. Thanks.