in reply to Re^3: Using Regexp::Common
in thread Using Regexp::Common

Hi Bill

In my requirement, the example is:

10,101,110.11010110
The numbers can be separated by a comma, a space, a tab, or a semicolon.

A period (.) is the decimal point, and any number containing one should be treated as a float. I do not need to extract the exponent or the mantissa.
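To make the stated rules concrete, here is a minimal sketch that splits on the four separators and labels each token. It assumes a comma is only ever a separator between numbers (never a digit-group mark); the sample input and the names `@classified` and `$kind` are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical input mixing all four separators: comma, semicolon, tab, space.
my $input = "10,101;110.11010110\t11 0.1";

# Assumption: separators only ever sit between numbers.
my @tokens = grep { length } split /[,;\s]+/, $input;

my @classified;
for my $tok (@tokens) {
    my $kind = $tok =~ /\A[0-9]+\.[0-9]+\z/ ? 'float'
             : $tok =~ /\A[0-9]+\z/         ? 'integer'
             :                                'unrecognized';
    push @classified, [ $tok, $kind ];
}

print "$_->[0] => $_->[1]\n" for @classified;
```

Under that assumption, a token is a float exactly when it contains a decimal point between digits; anything else that is all digits is an integer.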

Re^5: Using Regexp::Common
by AnomalousMonk (Archbishop) on Sep 20, 2015 at 16:52 UTC

    The problem I, and apparently others, had with your example was that if a comma is used both as a number separator and as a three-digit, whole-number group separator, the string '10,101,110.11010110' is ambiguous. Is it a single binary real 10,101,110.11010110, the whole-number portion of which has three groups? Is it a binary integer 10, followed by a binary real 101,110.11010110? Or is it a binary integer 10, followed by a binary integer 101, followed by a binary real 110.11010110? Or how about a grouped binary integer 10,101 followed by a binary real 110.11010110?

    Trying to use a comma as both a number separator and a group separator seems like a really bad idea. Trying to parse data in which a comma is used in this way seems like a nightmare.
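The ambiguity can be shown with two hand-written patterns, both hypothetical and using only core Perl: one treats commas as group marks inside a single binary number, the other treats them purely as separators. The pattern names `$grouped` and `$ungrouped` are illustrative assumptions, not anything from Regexp::Common.

```perl
use strict;
use warnings;

my $input = '10,101,110.11010110';

# Reading A (assumed): commas group digits in threes inside one binary number.
my $grouped = qr/[01]{1,3}(?:,[01]{3})*(?:\.[01]+)?/;

# Reading B (assumed): commas only separate one number from the next.
my $ungrouped = qr/[01]+(?:\.[01]+)?/;

# With /g in list context and no capture groups, each full match is returned.
my @as_grouped   = $input =~ /$grouped/g;
my @as_ungrouped = $input =~ /$ungrouped/g;

print "grouped:   ", scalar(@as_grouped),   " match(es): @as_grouped\n";
print "ungrouped: ", scalar(@as_ungrouped), " match(es): @as_ungrouped\n";
```

The same string yields one number under the first reading and three under the second, and nothing in the data itself says which reading is intended.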


    Give a man a fish:  <%-{-{-{-<

Re^5: Using Regexp::Common
by BillKSmith (Monsignor) on Sep 20, 2015 at 20:00 UTC
    The following program will match every base-10 number in your string. The number may be in any format known to perl.
    use strict;
    use warnings;
    use Regexp::Common;

    $_ = '10,101,110.11010110 ';
    local $, = "\n";
    print /($RE{num}{real})/g;
    OUTPUT:
    10
    101
    110.11010110
    Note that the group and sep options are not specified. They only apply to separations within a single number. The absence of a keep option tells the module not to capture anything. We use our own parentheses to capture exactly what we want. We use the /g match option to find (and capture) all the matches.
    Bill
      We use our own parenthesis ...

      With use of the  /g regex modifier, capturing parentheses are not needed (although they do no harm): in list context, all matched sub-strings are returned.

      c:\@Work\Perl>perl -wMstrict -MRegexp::Common -le "$_ = '10,101,110.11010110'; ;; print qq{'$_'} for /$RE{num}{real}/g; "
      '10'
      '101'
      '110.11010110'

      The absence of a keep option tells the module not to capture anything.

      But I thought that the idea behind the use of  -keep was that justrajdeep wanted to "... divide [the extracted numbers] into floats/integer etc.", i.e., classify them, and for this the capture of number components (sign, whole number, fractional part, etc.) by  -keep could be made to work nicely.
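A rough sketch of how -keep might support that kind of classification, assuming base-10 input and commas acting only as separators. It relies on the capture layout described in the Regexp::Common::number documentation for $RE{num}{real}, where $1 is the entire match and $5 is the radix point; the variable names are illustrative.

```perl
use strict;
use warnings;
use Regexp::Common;

my $input = '10,101,110.11010110';

# Build the pattern once; -keep turns on the module's capture groups.
my $real = $RE{num}{real}{-keep};

my @classified;
while ($input =~ /$real/g) {
    # Per the module docs: $1 = entire match, $5 = radix point (if present).
    my ($number, $point) = ($1, $5);
    my $kind = (defined $point && length $point) ? 'float' : 'integer';
    push @classified, [ $number, $kind ];
}

print "$_->[0] => $_->[1]\n" for @classified;
```

A while loop is used rather than a list-context match because, with -keep, a list-context /g match would return every capture group of every match in one flat list.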

      ... group and sep options are not specified. They only apply to separations within a single number.

      justrajdeep seems to want to handle comma-separated, grouped whole numbers (and binary ones at that), but I agree that his or her requirements are a bit confusing, at least to me.


      Give a man a fish:  <%-{-{-{-<

        My intention was to help justrajdeep clarify his requirements by providing a base line solution to critique.

        Your use of 'for' neatly solves the problem I had with keep.

        I was not aware that my parentheses were not needed. I probably still would have specified them.

        Bill