in reply to Re: Regular expression
in thread Regular expression

Hi Ken,
A big thanks to you for your detailed analysis and your patience in explaining things. You guessed right: I am relatively new to 'serious Perl' learning and have been experimenting with the language. Now, coming back to the problem: I understood the points you made. You are right that the primary motive of my code snippet was to understand the /g modifier and the \G assertion. To better appreciate the effect of /g and \G, I wrote the following sample code:

my $x= '1 2 3kg 4 5 6 7 8 9 10Kg 11 12 13 kg 14 15';
chomp $x;
print "Values in variabe x are : \n$x\n";
$x =~ m/(?<weight>\d+)\s*/g;
my $weight = $1;
print "Matched pattern is : $+{weight}\n";
print "Values in variabe x are : \n$x\n";
#print "Weights are : $1 Kg\n" while $x=~/G(\d+)\s*kg\s*/ig; #print -1
#print "Weights are : $1 Kg\n" while $x=~/(\d+)\s*kg\s*/ig; #print -2
print "Weights are : $1 Kg\n" while $x=~/(\d+)\s*kg\s*/i; #print -3

My expected result is:
Weights are 3kg
Weights are 10kg
Weights are 13kg
I have used three print statements, marked #print -1, -2 and -3. My observations:
print-1: I was expecting this to be the right statement for my output requirement, but enabling it doesn't seem to match anything. Nothing is printed.
print-2: This one does the job. As I understand it, this is a kind of global match: with every iteration of the loop it starts looking in the string from the point where it last matched.
print-3: This goes into an infinite loop, which I understand is because every iteration of the loop starts looking from the start of the string and always finds 3kg there. So it only prints 3kg, forever.
Please add some insight on what is the importance of \G and what are some practical usage scenarios of \G?

Re^3: Regular expression
by kcott (Archbishop) on Oct 31, 2017 at 22:37 UTC

    Your test results:

    print-1
    In "/G(\d+)\s*kg\s*/ig" you're making exactly the same mistake as you did in your OP and which I pointed out in my initial response: "('G' should be '\G')". There is no 'G' to match!
    print-2
    In "/(\d+)\s*kg\s*/ig", you've used the 'g' modifier. It's repeatedly matching each number followed by the units (one after another). As you say: "this one does the job".
    print-3
    Your explanation of "/(\d+)\s*kg\s*/i" is correct.
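    A minimal sketch of why print-2 terminates while print-3 does not (the helper name weights_with_positions is made up for illustration): with /g in scalar context, each successful match records its end offset in pos(), and the next attempt resumes from there.

```perl
use strict;
use warnings;

# Collect every weight by matching with /g in scalar context.
# Each successful match updates pos($str); the next match resumes there.
sub weights_with_positions {
    my ($str) = @_;
    my @found;
    while ($str =~ /(\d+)\s*kg\s*/ig) {
        # Store the captured number and the offset just after the match.
        push @found, [ $1, pos($str) ];
    }
    return @found;
}

my $x = '1 2 3kg 4 5 6 7 8 9 10Kg 11 12 13 kg 14 15';
printf "weight %s, next search starts at offset %d\n", @$_
    for weights_with_positions($x);
```

    Without /g, pos() is never consulted, so every iteration would start again at offset 0 and find 3kg each time.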
    "Please add some insight on what is the importance of \G and what are some practice usage scenario of \G?"

    '\G' is not an assertion I use that much: I can't really give you an "I often find it useful for ..." type answer.

    There's more information in "perlre: Assertions"; as well as links to additional, related documentation.
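    For what it's worth, the usage that perlop itself describes for \G is lex-like scanning together with the /c modifier. Here is a rough sketch of that idea; the tokenize function, the token names and the tiny grammar are all hypothetical and only meant to show the mechanism:

```perl
use strict;
use warnings;

# Hypothetical tokenizer for strings such as "3kg 10 Kg 13 kg".
# Every alternative is anchored with \G, so it must match exactly where the
# previous token ended; /c keeps pos() unchanged when an alternative fails.
sub tokenize {
    my ($input) = @_;
    my @tokens;
    pos($input) = 0;
    while (pos($input) < length $input) {
        if    ($input =~ /\G\s+/gc)    { next }                          # skip whitespace
        elsif ($input =~ /\G(\d+)/gc)  { push @tokens, [ NUMBER => $1 ] }
        elsif ($input =~ /\Gkg\b/gci)  { push @tokens, [ UNIT => 'kg' ] }
        else {
            # Nothing matched at the current offset: report it and stop.
            die sprintf "Unexpected character '%s' at offset %d\n",
                substr($input, pos($input), 1), pos($input);
        }
    }
    return @tokens;
}

print join(' ', map { "$_->[0]($_->[1])" } tokenize('3kg 10 Kg 13 kg')), "\n";
```

    The point of \G here is that the scanner can never silently skip over input it doesn't understand: anything that isn't a known token at the current offset is reported instead of being jumped over.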

    — Ken

      Ohh my bad. I corrected G to \G but still no change:
      print "Weights are : $1 Kg\n" while $x=~/\G(\d+)\s*kg\s*/ig; #print -1

      It prints nothing. I am still not sure what difference it was supposed to make. No issues if you don't have much experience to share with this. I will try reading about it more.
      Thanks for your help.

        "print "Weights are : $1 Kg\n" while $x=~/\G(\d+)\s*kg\s*/ig; #print -1"

        The condition for the first while iteration is FALSE: the while loop does not iterate.

        Here's a blow-by-blow description of what's occurring. Bear in mind that character positions use a zero-based index: the first character in the string is at position 0.

        • We start at position 0 in $x (that's character "1").
        • As no previous regex match with the 'g' modifier had occurred, the last match position is 0. The '\G' assertion is satisfied at the start of the string (i.e. position 0). That's a zero-width assertion, we stay at position 0.
        • (\d+) matches "1". This is temporarily assigned to $1 (i.e. $1 eq '1').
        • We move to position 1 in $x (that's character " " - a space).
        • \s* matches " ".
        • We move to position 2 in $x (that's character "2").
        • The literal sequence "kg" does not match "2".
        • We move back to position 1 in $x and the regex engine backtracks to \s*.
        • \s* means zero or more spaces greedily. Last time one space was matched. We can also satisfy this by matching zero spaces: position stays at 1.
        • The literal sequence "kg" does not match " ".
        • The temporary value in $1 is removed. The regex engine backtracks to \G looking for another way to find a match from the current position 1.
        • The last match position is 0; the current position is 1: the \G assertion is not satisfied.
        • We now move to position 2 in $x: again, the \G assertion is not satisfied.
        • The regex engine moves along $x, one position at a time, attempting to find a match. Because none of these positions are 0, the \G assertion is never satisfied.
        • Eventually, after 139 steps, the regex engine runs out of string (i.e. the end of $x is reached) and the match is FALSE.
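        The key point in the steps above is that \G only ever matches at the current pos(). A small sketch to check that yourself (weight_at is a hypothetical helper, not part of your code): it sets pos() by hand and then tries the same \G-anchored pattern.

```perl
use strict;
use warnings;

# Hypothetical helper: try the \G-anchored pattern starting at a chosen offset.
# Assigning to pos() sets the one place where \G is allowed to match.
sub weight_at {
    my ($str, $offset) = @_;
    pos($str) = $offset;
    return $str =~ /\G(\d+)\s*kg\s*/ig ? $1 : undef;
}

my $x = '1 2 3kg 4 5 6 7 8 9 10Kg 11 12 13 kg 14 15';
for my $offset (0, 4) {
    my $weight = weight_at($x, $offset);
    printf "offset %d: %s\n", $offset,
        defined $weight ? "matched $weight" : 'no match';
}
```

        At offset 0 the digits "1" are not followed by "kg", and \G cannot slide forward, so there is no match; at offset 4 the string continues with "3kg", so the same pattern matches and captures 3.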

        I got all that information by running your code through Regexp::Debugger. I highly recommend this module: not only will you find bugs in your regex, you'll also learn a lot about them (in that respect, it's just as useful for regexes that work as those that don't).

        You can fix your current problem by adding '.*?' after the '\G':

        $ perle 'my $x = q{1 2 3kg 4 5 6 7 8 9 10Kg 11 12 13 kg 14 15}; say $1 while $x =~ /\G.*?(\d+)\s*kg\s*/gi'
        3
        10
        13

        "I will try reading about it more."

        The link I provided before does lead to more links. One in particular, which you should definitely read, is "perlop: Regexp Quote-Like Operators". You'll need to scroll down a fair way: look for the "\G assertion" section.

        — Ken

        Ohh my bad. I corrected G to \G but still no change ... prints nothing. Still not sure what difference it was supposed to make.

        c:\@Work\Perl\monks>perl -wMstrict -le
        "my $x = '1 2 3kg 4 5 6 7 8 9 10Kg 11 12 13 kg 14 15';
         print qq{string \$x: '$x'};
         ;;
         printf qq{captured '$1' } while $x =~ /\G(\d+)\s*kg\s*/ig;
         print '----------';
        "
        string $x: '1 2 3kg 4 5 6 7 8 9 10Kg 11 12 13 kg 14 15'
        ----------

        The  \G anchor matches at the point (the exact character offset in the string) at which matching stopped in the last  /g global match iteration. But on the first  /g iteration, where is that point? On the first  /g iteration,  \G matches the same as  \A (the \Absolute-start-of-string anchor).

        So what  /\G(\d+)\s*kg\s*/ig says is:

        1. \G From the string offset at which the previous match stopped (or from the start of the string if it's the first match);
        2. (\d+) Match and capture one or more decimal digits;
        3. \s* Then match zero or more whitespace characters;
        4. kg Then match the literal characters  'kg' case-insensitively (due to the  /i flag);
        5. \s* Then match zero or more whitespace characters (this can't fail);
        6. And this match iteration is finished.

        But your  '1 2 3kg 4 5 6 7 8 9 10Kg 11 12 13 kg 14 15' string begins with some digits, some whitespace, and then some more digits, not the required  'kg' literals: the match immediately fails. There is a  '3kg' subsequence further on that could satisfy part of the overall match, but matching has already failed due to the  \G assertion.
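        To see what difference \G is "supposed to make", it may help to compare the same pattern with and without it on a string where the weights start out contiguous and are then interrupted. This is only a sketch; the collect helper and the sample string are invented for the comparison.

```perl
use strict;
use warnings;

# Hypothetical helper: gather every capture produced by matching $re repeatedly.
# $re is a qr// object; /g is added at the match site so pos() advances.
sub collect {
    my ($re, $str) = @_;
    my @captures;
    push @captures, $1 while $str =~ /$re/g;
    return @captures;
}

my $gapped = '3kg 10Kg oops 13kg';
print "with \\G:    @{[ collect(qr/\G(\d+)\s*kg\s*/i, $gapped) ]}\n";
print "without \\G: @{[ collect(qr/(\d+)\s*kg\s*/i,   $gapped) ]}\n";
```

        With \G the loop collects 3 and 10 and then stops at "oops", because the next match is required to begin exactly where the previous one ended. Without \G the engine is free to skip ahead, so 13 is collected as well.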


        Give a man a fish:  <%-{-{-{-<