
Hi Bill

I want to extract all the numbers from a string in which they may be separated by any delimiter, and then divide them into floats, integers, etc. I thought Regexp::Common would be simple to use, but it looks like it is not :(

Re^3: Using Regexp::Common
by BillKSmith (Monsignor) on Sep 19, 2015 at 16:50 UTC
    I do not know what you mean by 'number' in your example. Is the period a decimal point or a separator? Do the commas separate numbers, or do they separate groups of digits within a number to make it easier to read? Are your numbers binary numbers, or are they decimal numbers that just happen to consist only of ones and zeros? Do you want to parse the string more than once, using different criteria? Let's get your example working; we can generalize later. Please tell us exactly what results you expect from your single example.
    Bill

      Hi Bill

      In my requirement, for example:

      10,101,110.11010110
      the numbers can be separated by a comma, a space, a tab, or a semicolon.

      The . is a decimal point, and that particular number should be considered a float. I do not need to extract the exponent and mantissa parts.
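Taken at face value, these rules (comma, space, tab, or semicolon separate numbers; '.' is always a decimal point) mean the string can be split on the delimiters and each piece classified. Below is a minimal core-Perl sketch, assuming commas are never used to group digits inside a number and that only plain decimal notation without exponents has to be recognized. The sample string and the array names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical input: the example from above plus a semicolon, tab and space.
my $str = "10,101,110.11010110; -7\t3.5";

my (@integers, @floats);
for my $tok (split /[,;\s]+/, $str) {
    next if $tok eq '';    # a leading delimiter would produce an empty first field
    if    ($tok =~ /\A[+-]?\d+\z/)                { push @integers, $tok }
    elsif ($tok =~ /\A[+-]?(?:\d+\.\d*|\.\d+)\z/) { push @floats,   $tok }
    else                                          { warn "not a number: '$tok'\n" }
}

print "integers: @integers\n";    # integers: 10 101 -7
print "floats:   @floats\n";      # floats:   110.11010110 3.5
```

Splitting only works because the delimiters are fixed; if commas could also group digits, the split step itself would give wrong answers.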

        The problem that I, and apparently others, had with your example is that if , (comma) is used both as a number separator and as a three-digit whole-number group separator, the string '10,101,110.11010110' is ambiguous. Is it a single binary real 10,101,110.11010110 whose whole-number portion has three groups? Is it a binary integer 10 followed by a binary real 101,110.11010110? Is it a binary integer 10, followed by a binary integer 101, followed by a binary real 110.11010110? Or how about a grouped binary integer 10,101 followed by a binary real 110.11010110?

        Trying to use a comma as both a number separator and a group separator seems like a really bad idea. Trying to parse data in which a comma is used in this way seems like a nightmare.
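The ambiguity is easy to demonstrate: two patterns, each reasonable under one interpretation, carve up the same string differently. This core-Perl sketch hand-writes both patterns (binary digits only, three-digit groups assumed); neither pattern is "right", which is why the data format has to settle the question.

```perl
use strict;
use warnings;

my $str = '10,101,110.11010110';

# Reading 1: comma separates numbers (binary digits, optional fraction)
my @separated = $str =~ /([01]+(?:\.[01]+)?)/g;

# Reading 2: comma groups whole-number digits in threes within one number
my @grouped = $str =~ /([01]{1,3}(?:,[01]{3})*(?:\.[01]+)?)/g;

print 'separator reading: ', join(' | ', @separated), "\n";   # 10 | 101 | 110.11010110
print 'grouping reading:  ', join(' | ', @grouped),   "\n";   # 10,101,110.11010110
```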


        Give a man a fish:  <%-{-{-{-<

        The following program will match every base-10 number in your string. The number may be in any ordinary decimal notation (optional sign, decimal point, exponent).
        use strict;
        use warnings;
        use Regexp::Common;
        $_ = '10,101,110.11010110 ';
        local $, = "\n";
        print /($RE{num}{real})/g;
        OUTPUT:
        10
        101
        110.11010110
        Note that the -group and -sep options are not specified; they apply only to separators within a single number. The absence of the -keep option tells the module not to capture anything, so we use our own parentheses to capture exactly what we want, and the /g match modifier to find (and capture) all the matches.
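To finish the original goal of sorting the matches into integers and floats, each match from the program above can be re-tested against the module's $RE{num}{int} pattern, anchored with \A and \z so that it must cover the whole match. This is a sketch assuming, as above, that no -sep grouping is in play; the array names are illustrative, and anything that is not a whole integer (here, the number with a decimal point) is treated as a float.

```perl
use strict;
use warnings;
use Regexp::Common;

my $str = '10,101,110.11010110 ';

my (@integers, @floats);
for my $n ($str =~ /($RE{num}{real})/g) {
    # $RE{num}{int} is an optional sign followed by digits; anchoring it
    # checks whether the entire match is an integer.
    if ($n =~ /\A$RE{num}{int}\z/) { push @integers, $n }
    else                           { push @floats,   $n }
}

print "integers: @integers\n";
print "floats:   @floats\n";
```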
        Bill