
I do not know what you mean by 'number' in your example. Is the period a decimal point or a separator? Do the commas separate numbers, or do they separate groups of digits within a number to make it easier to read? Are your numbers binary numbers, or are they decimal numbers that just happen to consist of only ones and zeros? Do you want to parse the string more than once, using different criteria? Let's get your example working first; we can generalize later. Please tell us exactly what results you expect from your single example.
Bill

Re^4: Using Regexp::Common
by justrajdeep (Novice) on Sep 20, 2015 at 11:31 UTC

    Hi Bill

    In my requirement, for the example:

    10,101,110.11010110

    the numbers can be separated by a comma, space, tab, or semicolon.

    The . is a decimal point, and a number containing it should be considered a float. I do not have any requirement to extract the exponent and the mantissa.
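One way to read this requirement, sketched in plain Perl without Regexp::Common: treat every comma, semicolon, space, or tab as a separator between numbers, then label each token as an integer or a float. The sub name classify_numbers and the 'unknown' label are made up for illustration, and the sketch assumes a comma never groups digits inside a number.

```perl
use strict;
use warnings;

# Hypothetical helper: split on the stated separators and classify each token.
sub classify_numbers {
    my ($text) = @_;
    my @out;
    # grep { length } drops the empty field a leading separator would produce.
    for my $tok (grep { length } split /[,;\s]+/, $text) {
        my $kind = $tok =~ /\A[+-]?[0-9]+\z/                     ? 'integer'
                 : $tok =~ /\A[+-]?(?:[0-9]+\.[0-9]*|\.[0-9]+)\z/ ? 'float'
                 :                                                  'unknown';
        push @out, [ $tok, $kind ];
    }
    return @out;
}

my @parsed = classify_numbers('10,101,110.11010110');
print "$_->[0]\t$_->[1]\n" for @parsed;
```

Under that assumption the example string breaks into three tokens, the last of which is the only float.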

      The problem I and apparently others had with your example was that if , (comma) is used as both a number separator and as a three-digit, whole-number group separator, the string '10,101,110.11010110' is ambiguous. Is it a single binary real 10,101,110.11010110, the whole-number portion of which has three groups? Is it a binary integer 10, followed by a binary real 101,110.11010110? Or is it a binary integer 10, followed by a binary integer 101, followed by a binary real 110.11010110? Or how about a grouped binary integer 10,101 followed by a binary real 110.11010110?

      Trying to use a comma as both a number separator and a group separator seems like a really bad idea. Trying to parse data in which a comma is used in this way seems like a nightmare.
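To make the ambiguity concrete, here is a small sketch that runs the same string through $RE{num}{real} twice: once with no separator allowed, and once with -sep => ',' and -group => 3. It uses base 10 rather than binary to keep things simple, and the variable names are mine. If I read the module's pattern correctly, the first form yields three matches and the second swallows the whole string as one grouped number.

```perl
use strict;
use warnings;
use Regexp::Common qw(number);

my $str = '10,101,110.11010110';

# Default pattern: no separator is allowed inside a number.
my $plain = $RE{num}{real};

# Same pattern, but ',' is accepted between three-digit groups.
my $grouped = $RE{num}{real}{-sep => ','}{-group => 3};

# In list context, /g returns every match.
my @as_separate = $str =~ /$plain/g;
my @as_grouped  = $str =~ /$grouped/g;

print 'without -sep: ', scalar(@as_separate), " match(es): @as_separate\n";
print 'with -sep:    ', scalar(@as_grouped),  " match(es): @as_grouped\n";
```

Neither reading is wrong as far as the regex is concerned, which is why the expected result has to come from the requirements.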


      Give a man a fish:  <%-{-{-{-<

      The following program will match every base-10 number in your string. The number may be in any format known to perl.
      use strict;
      use warnings;
      use Regexp::Common;
      $_ = '10,101,110.11010110 ';
      local $, = "\n";
      print /($RE{num}{real})/g;

      OUTPUT:
      10
      101
      110.11010110
      Note that the group and sep options are not specified. They only apply to separations within a single number. The absence of a keep option tells the module not to capture anything. We use our own parentheses to capture exactly what we want. We use the /g match option to find (and capture) all the matches.
      Bill
        We use our own parentheses ...

        With use of the /g regex modifier, capturing parentheses are not needed (although they do no harm): in list context, all matched sub-strings are returned.

        c:\@Work\Perl>perl -wMstrict -MRegexp::Common -le
        "$_ = '10,101,110.11010110';
         ;;
         print qq{'$_'} for /$RE{num}{real}/g;
        "
        '10'
        '101'
        '110.11010110'

        The absence of a keep option tells the module not to capture anything.

        But I thought that the idea behind the use of -keep was that justrajdeep wanted to "... divide [the extracted numbers] into floats/integer etc.", i.e., classify them, and for this the capture of number components (sign, whole number, fractional part, etc.) by -keep could be made to work nicely.
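A rough sketch of how -keep might be used for that classification. It assumes the capture numbering I recall from the Regexp::Common::number documentation for $RE{num}{real}: $1 is the entire number, $5 the decimal point, $7 the exponent marker. Please check that against your installed version. The 'float' and 'integer' labels and the variable names are my own.

```perl
use strict;
use warnings;
use Regexp::Common qw(number);

my $str = '10,101,110.11010110';

# Stringify the pattern once; -keep turns on the module's capture groups.
my $real_keep = $RE{num}{real}{-keep};

my @classified;
# Scalar-context /g steps through the matches one at a time, so the
# capture variables describe the current number on each pass.
while ($str =~ /$real_keep/g) {
    my ($number, $point, $expon) = ($1, $5, $7);
    # Assumption: a decimal point or an exponent marker makes it a float.
    my $is_float = (defined $point && length $point)
                || (defined $expon && length $expon);
    push @classified, [ $number, $is_float ? 'float' : 'integer' ];
}

print "$_->[0] is $_->[1]\n" for @classified;
```

The while loop matters here: in list context, /g with capture groups returns every group of every match in one flat list, which is much harder to pick apart.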

        ... group and sep options are not specified. They only apply to separations within a single number.

        justrajdeep seems to want to handle comma-separated, grouped whole numbers (and binary ones at that), but I agree that his or her requirements are a bit confusing, at least to me.


        Give a man a fish:  <%-{-{-{-<