darisler has asked for the wisdom of the Perl Monks concerning the following question:

Hello kind Monks,

I am trying to code a regular expression which will capture an "unsigned fixed decimal" value, preserving the original formatting. Here is what I'm trying to do:

use feature 'say';

my $decimal = qr(\d*\.?\d+);

$_ = "0.12";
if ( m{(?<value> $decimal)}x ) {
    say "$_ $+{value}";
}

$_ = ".12";
if ( m{(?<value> $decimal)}x ) {
    say "$_ $+{value}";
}

$_ = "12.";
if ( m{(?<value> $decimal)}x ) {
    say "$_ $+{value}";
}

$_ = "12";
if ( m{(?<value> $decimal)}x ) {
    say "$_ $+{value}";
}

When I run this I get the following:

0.12 0.12
.12 .12
12. 12
12 12

But on the third line the trailing decimal point is not preserved, although the two values should be equal.

Can anyone point out how I could change my "qr" so that it "works" for the third line?

Re: regex to capture an unsigned decimal value, but also preserving the user's formatting.
by davido (Cardinal) on May 04, 2016 at 05:57 UTC

    You can make the dot possessive and the trailing digits greedy but optional:

    use feature 'say';

    while (<DATA>) {
        chomp;
        next unless length;
        say "$_ => [", (m/(\d*\.?+\d*)/x ? "$1]" : ']');
    }

    __DATA__
    0.12
    .12
    12.
    12

    This yields the following output:

    0.12 => [0.12]
    .12 => [.12]
    12. => [12.]
    12 => [12]

    I do not know what additional edge cases you might encounter that would break this. But for your test cases it works.

    Two tricks here. First, it was a mistake to use the + quantifier on the trailing \d, because that makes at least one trailing digit a required part of the match. With 12. there is no digit after the decimal point, so the engine backtracks: it gives up the dot, lets \d* hand back the 2 so that \d+ can take it, and the match ends up as 12 without the dot. By making the quantifier * we stay greedy, but optional.
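
    A minimal sketch of this first trick in isolation, assuming only the four test strings from the question. The helper name first_match is made up for illustration, and the dot keeps its plain ? quantifier here so that only the change from + to * is on display:

    ```perl
    use strict;
    use warnings;
    use feature 'say';

    # first_match is a hypothetical helper: it returns the first substring
    # of $text matched by $re, or undef if nothing matches.
    sub first_match {
        my ($re, $text) = @_;
        return $text =~ /($re)/ ? $1 : undef;
    }

    my $required = qr/\d*\.?\d+/;   # trailing digits required (the original)
    my $optional = qr/\d*\.?\d*/;   # trailing digits optional

    # Print both captures side by side for each test string.
    for my $s (qw(0.12 .12 12. 12)) {
        say "$s => [", first_match($required, $s), "] vs [",
            first_match($optional, $s), "]";
    }
    ```

    Only the 12. line should differ between the two columns: [12] with the required digits, [12.] with the optional ones.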

    The next trick is the ?+ quantifier for the decimal point. The + here makes the ? possessive -- once it has matched, it holds onto what it matched, and doesn't give it up during backtracking.
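
    What "doesn't give it up" means can be sketched with a deliberately artificial pair of patterns (these are not the patterns from this thread, and the helper name matches is made up). Both ask for an optional dot followed by a mandatory dot at the end of the string, and differ only in the quantifier on the first dot. Possessive quantifiers need Perl 5.10 or later:

    ```perl
    use strict;
    use warnings;
    use feature 'say';

    # Illustrative patterns only: optional dot, then a mandatory dot at the end.
    my $greedy     = qr/^\d*\.?\.\z/;    # \.?  may hand the dot back when backtracking
    my $possessive = qr/^\d*\.?+\.\z/;   # \.?+ keeps the dot once it has taken it

    # matches is a hypothetical helper returning 1 or 0.
    sub matches {
        my ($re, $text) = @_;
        return $text =~ $re ? 1 : 0;
    }

    say "greedy:     ", matches($greedy,     '12.');
    say "possessive: ", matches($possessive, '12.');
    ```

    On 12. the greedy version matches, because \.? gives the only dot back to the mandatory \. that follows. The possessive version does not match, because \.?+ has already consumed the dot and refuses to release it.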

    For reference, here are the two regexes (yours and mine) in close proximity. You can disregard the parens here; I'm only using them for capturing, which you were achieving by referring to the match hash.

    m/( \d* \.?+ \d* )/x   # Mine
    m/( \d* \.?  \d+ )/x   # Yours

    I suggest walking through your original regular expression and the one I've provided using Regexp::Debugger's rxrx utility. I think my inadequate description will be clearer once you see the wheels in motion.

    Update: I've fixed the backslashing of the . in the pattern.


    Dave

      Hello davido,

      Your use of the + (possessive) quantifier is ingenious, as it leads to simple, elegant code. I think you need to backslash the dot to avoid matching any character:

      say "$_ => [", (m/(\d*\.?+\d*)/x ? "$1]" : ']');
      #                     ^

      But my main quibble is that this doesn’t work consistently if the decimal is embedded in a longer string. (Whether that’s actually a requirement isn’t clear from the OP.) For that, I came up with the following regex, which is verbose but seems to work OK:

      #! perl
      use strict;
      use warnings;
      use feature qw( say );

      my $decimal = qr{
                        (
                          (?: \d+ \.? \d* )
                          |
                          (?: \d* \.? \d+ )
                        )
                      }x;

      while (<DATA>)
      {
          chomp;
          next unless length;
          say "$_ => [", (/$decimal/x ? "$1]" : ']');
      }

      __DATA__
      0.12
      .12
      12.
      12
      no numbers here
      abc.def42
      .7zx

      Output:

      16:39 >perl 1621_SoPW.pl
      0.12 => [0.12]
      .12 => [.12]
      12. => [12.]
      12 => [12]
      no numbers here => []
      abc.def42 => [42]
      .7zx => [.7]

      16:39 >
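
      The embedded-string problem can be sketched as follows. Every piece of the all-optional pattern may match nothing, so it succeeds with an empty match at the very first position whenever the string does not start with a digit or a dot. The helper name first_match and the test string 'x 12. y' are made up for illustration:

      ```perl
      use strict;
      use warnings;
      use feature 'say';

      # first_match is a hypothetical helper: first substring matched, or undef.
      sub first_match {
          my ($re, $text) = @_;
          return $text =~ /($re)/ ? $1 : undef;
      }

      # Every piece optional: can succeed without consuming anything.
      my $all_optional = qr/\d*\.?+\d*/;

      # Each alternative requires at least one digit.
      my $alternation = qr/(?: \d+ \.? \d* ) | (?: \d* \.? \d+ )/x;

      for my $s ('abc.def42', 'x 12. y', '.7zx') {
          say "$s => [", first_match($all_optional, $s), "] vs [",
              first_match($alternation, $s), "]";
      }
      ```

      For the first two strings the all-optional pattern should capture the empty string, while the alternation finds 42 and 12. respectively; both patterns agree on .7 for the last one.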

      Cheers,

      Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

        I think the other important point to make is that the order of the sub-patterns in the ordered alternation is critical:

        c:\@Work\Perl\monks>perl -wMstrict -le
        "my $ufp = qr{ \d* [.]? \d+ | \d+ [.]? \d* }xms;
         ;;
         for my $s (qw(0.12 .34 56. 78)) {
           printf qq{'$s' -> };
           printf qq{'$1' \n} if $s =~ m{ ($ufp) }xms;
           }
         ;;
         my $s = '0.12 -0.98 .34 -.76 56. -54. 78 -32 bla bla abc.def42 .7zx';
         printf qq{'$1' } while $s =~ m{ ($ufp) }xmsg;
        "
        '0.12' -> '0.12'
        '.34' -> '.34'
        '56.' -> '56'
        '78' -> '78'
        '0.12' '0.98' '.34' '.76' '56' '54' '78' '32' '42' '.7'
        (See  '56.' and  '-54.' instances.)
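
        A side-by-side sketch of the two orderings, with the helper name first_match and the variable names made up for illustration. Perl tries alternatives left to right and keeps the first one that succeeds at a given position, so the alternative that can absorb a trailing dot has to come first:

        ```perl
        use strict;
        use warnings;
        use feature 'say';

        # first_match is a hypothetical helper: first substring matched, or undef.
        sub first_match {
            my ($re, $text) = @_;
            return $text =~ /($re)/ ? $1 : undef;
        }

        # Same two alternatives, opposite order.
        my $digits_dot_first = qr/ \d+ [.]? \d* | \d* [.]? \d+ /x;
        my $dot_digits_first = qr/ \d* [.]? \d+ | \d+ [.]? \d* /x;

        for my $s (qw(56. -54. 0.12 .34)) {
            say "$s => [", first_match($digits_dot_first, $s), "] vs [",
                first_match($dot_digits_first, $s), "]";
        }
        ```

        With the first ordering 56. and -54. should come out as 56. and 54.; with the second ordering the dot is lost, because \d* [.]? \d+ already succeeds on the digits alone after backtracking.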


        Give a man a fish:  <%-{-{-{-<