in reply to Re: Extraction number from Text
in thread Extraction number from Text

I allowed more than one \d digit, with \d+.

It took me a while to grok it, but the original regex did allow more than one digit, and also captured them. That's because there is a \d in the character class, and the character class is quantified with a *. However, this also allows more than one dot or more than one e, so it recognizes Ee.3 as a number.
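To make the effect of a quantified character class concrete, here is a minimal sketch. The original pattern is not quoted in this post, so $loose below is only a hypothetical stand-in with the same shape (one character class, quantified with *), not the OP's actual regex.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the original pattern (an assumption, not the
# OP's exact regex): a single character class quantified with '*'.
my $loose = qr/^[-+\d.eE]*$/;

# Every string below consists only of characters from the class,
# so each one is accepted, whether or not it is a sensible number.
for my $s ('42', '3.14', 'Ee.3', '--eeeeeeeeeee1', '1.2.3') {
    printf "%-16s %s\n", $s, $s =~ $loose ? 'accepted' : 'rejected';
}
```

Because the class says nothing about order or repetition, any mix of signs, dots, digits and e's passes, and so does the empty string.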

I agree that your regex is much easier to read, but it doesn't allow numbers before the exponent (I guess that's what the e in the regex is supposed to mean).

Further refinements could use Regexp::Common's number regex, or this regex, which parses numbers according to the JSON number specification:

my $number = qr{ -? (?: 0 | [1-9] [0-9]* ) (?: \. [0-9]+ )? (?: [eE] [+-]? [0-9]+ )? }x;

(might be a bit too restrictive in some cases for parsing numbers "in the wild", but still a good inspiration).
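As a rough illustration of how strict a JSON-style number pattern is, the sketch below anchors such a pattern and tries it on a few strings. It is written with a one-or-more-digit exponent, as the JSON grammar requires; is_json_number is a hypothetical helper name introduced only for this example.

```perl
use strict;
use warnings;

# JSON-style number pattern; the exponent takes one or more digits.
my $number = qr{ -? (?: 0 | [1-9] [0-9]* ) (?: \. [0-9]+ )? (?: [eE] [+-]? [0-9]+ )? }x;

# Hypothetical helper: true only if the whole string is a number.
sub is_json_number {
    my ($s) = @_;
    return $s =~ /\A $number \z/x ? 1 : 0;
}

for my $s ('0', '-12.5', '1e10', '6.02E+23', '01', '.5', '+1', 'Ee.3') {
    printf "%-10s %s\n", $s, is_json_number($s) ? 'number' : 'not a number';
}
```

The anchors matter: without \A and \z the pattern would happily match a numeric substring inside a longer string. Leading zeros, a bare leading dot and an explicit plus sign are all rejected, which is the "too restrictive in the wild" part.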

Perl 6 - links to (nearly) everything that is Perl 6.

Re^3: Extraction number from Text
by davido (Cardinal) on Jun 14, 2010 at 08:50 UTC

    Great point with respect to the '*' quantifier for the character class.

    The OP's example, which uses the '*' quantifier, would of course allow strings that are not numbers to be parsed as numbers. For example: "--eeeeeeeeeee1" would be accepted as a number when it's definitely not (although perl could evaluate that string in numeric context giving it a value of 1). I didn't attempt to address that issue, but it underscores your next point, which is:

    Regex::Common is a nice resource too. If there's a resource that knows how to parse numbers, why write one's own number parser when it (a) takes more time, and (b) possibly introduces bugs? Regex::Common is the answer to both 'a' and 'b'.


    Dave

      For example: "--eeeeeeeeeee1" would be accepted as a number when it's definitely not (although perl could evaluate that string in numeric context giving it a value of 1).
      Eh, no. "--eeeeeeeeeee1" in numerical context is 0. When casting a string to a numeric value, Perl will always look at the beginning of a string. Something that starts with "--" cannot be a number; hence, it'll get the value 0.
      $ perl -wE 'say 0 + "--eeeeeeeeeee1"'
      Argument "--eeeeeeeeeee1" isn't numeric in addition (+) at -e line 1.
      0
      $
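The "looks at the beginning of the string" rule can be checked on a few more inputs with a small sketch. The helper name numify is hypothetical, and the "isn't numeric" warning is silenced inside it purely to keep the demonstration output readable.

```perl
use strict;
use warnings;

# Hypothetical helper: numeric value of a string, with the
# "isn't numeric" warning switched off for this demonstration only.
sub numify {
    my ($s) = @_;
    no warnings 'numeric';
    return 0 + $s;
}

# Strings with a numeric prefix keep that prefix's value;
# strings without one evaluate to 0.
for my $s ('42abc', '1e3xyz', '-5x', '--eeeeeeeeeee1', 'e1') {
    printf "%-18s => %s\n", "'$s'", numify($s);
}
```

Only a leading numeric prefix contributes to the value, so a digit at the end of the string has no effect on the result.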
      Regex::Common is a nice resource too.
      s/Regex/Regexp/... ;-)