in reply to Identifying and parsing numbers

The regexp I used was:

/^([+\-]?\d*(?:\.\d+)?)([a-z]{1,2})$/

Which is basically just a condensed version of Gilimanjaro's version, except instead of using:

([+\-]?(?:\d+(?:\.\d+)?)|(?:\.\d+))

to match the first portion of the input, I converted it to this:

([+\-]?\d*(?:\.\d+)?)

And my breakdown:

[+\-]?
We want to match a single plus or minus sign either once or not at all

\d*
This matches the whole-number portion of the input. If the whole-number portion is absent, that's accounted for by using * instead of + (zero or more instead of one or more).

(?:\.\d+)?
This matches the decimal portion of the input. The (?:)? grouping allows the regexp to match a decimal point followed by digits either once or not at all.

This regexp allows both the whole-number portion and the decimal portion to be optional components, and avoids the need for alternation.
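
As a rough illustration of the breakdown above, here is a minimal sketch that applies the regexp to a few inputs. The helper name split_measure and the sample inputs are made up for this example and are not from the original question.

```perl
use strict;
use warnings;

# The condensed pattern from this post, unchanged.
my $re = qr/^([+\-]?\d*(?:\.\d+)?)([a-z]{1,2})$/;

# Hypothetical helper: returns (number, unit) on a match, or an empty list.
sub split_measure {
    my ($input) = @_;
    return $input =~ $re ? ($1, $2) : ();
}

# Illustrative inputs only; each should split into a number and a unit.
for my $input ('12cm', '-3.5in', '.75m', '+10px') {
    my ($num, $unit) = split_measure($input);
    if (defined $num) {
        print "$input => number '$num', unit '$unit'\n";
    }
    else {
        print "$input => no match\n";
    }
}
```

Only inputs that are expected to match are shown here.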

Re: Re: Identifying and parsing numbers
by Anonymous Monk on Jan 16, 2003 at 23:34 UTC
    I think this regex will also give false positives, because it would now also match something like '+cm'.

    There is no way around the '|' I think, because the decimal part is what you might call optionally optional; it is only optional if there is a whole part of the number. Because of this, you can't always use the ? modifier to make the decimal part optional.

    The only other ways I could think of would use experimental regex features. Even zero-width look-behind or look-ahead assertions can't be applied here I think...

    But I've finally been referenced by name on Perlmonks! Made my day!

    :)

      Though I was dumb enough to actually post while not being logged in... There go my votes... ;)

      Ah, yes, you are absolutely correct, I overlooked the fact that doing:

      (\d*(?:\.\d+)?)

      Would allow both the whole number and the decimal portions to match 'nothing at all' as I so aptly described in my breakdown.

      One more case of testing too many things that should work and not enough things that shouldn't.
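
To make the false positive concrete, here is a small sketch that runs the condensed pattern against inputs that should be rejected, next to one possible alternation-based variant. The $strict pattern is an assumption written for this comparison, not the exact regexp from the earlier post, and the inputs are illustrative.

```perl
use strict;
use warnings;

# Condensed pattern from the thread: both number parts are optional.
my $condensed = qr/^([+\-]?\d*(?:\.\d+)?)([a-z]{1,2})$/;

# Assumed alternation-based variant: the sign stays optional, but either
# digits (with an optional fraction) or a bare fraction must be present.
my $strict = qr/^([+\-]?(?:\d+(?:\.\d+)?|\.\d+))([a-z]{1,2})$/;

# The first two inputs should match; the last three should not.
for my $input ('12cm', '.5m', '+cm', 'cm', '-m') {
    printf "%-7s condensed: %-3s  strict: %s\n",
        "'$input'",
        ($input =~ $condensed ? 'yes' : 'no'),
        ($input =~ $strict    ? 'yes' : 'no');
}
```

The condensed pattern accepts '+cm', 'cm' and '-m' because every part of the number can match the empty string, while the alternation requires at least one digit.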