in reply to Re^3: Regular expression
in thread Regular expression

Ohh my bad. I corrected G with \G but still no change
print "Weights are : $1 Kg\n" while $x=~/\G(\d+)\s*kg\s*/ig; #print -1

prints nothing. Still not sure what difference it was supposed to make. No worries if you don't have much experience to share on this. I will try reading about it more.
Thanks for your help.

Re^5: Regular expression
by kcott (Archbishop) on Nov 01, 2017 at 03:08 UTC
    "print "Weights are : $1 Kg\n" while $x=~/\G(\d+)\s*kg\s*/ig; #print -1"

    The condition for the first while iteration is FALSE: the while loop does not iterate.

    Here's a blow-by-blow description of what's occurring. Bear in mind that character positions use a zero-based index: the first character in the string is at position 0.

    • We start at position 0 in $x (that's character "1").
    • As no previous regex match with the 'g' modifier has occurred, the last match position is 0. The '\G' assertion is satisfied at the start of the string (i.e. position 0). Because that is a zero-width assertion, we stay at position 0.
    • (\d+) matches "1". This is temporarily assigned to $1 (i.e. $1 eq '1').
    • We move to position 1 in $x (that's character " " - a space).
    • \s* matches " ".
    • We move to position 2 in $x (that's character "2").
    • The literal sequence "kg" does not match "2".
    • We move back to position 1 in $x and the regex engine backtracks to \s*.
    • \s* means zero or more spaces greedily. Last time one space was matched. We can also satisfy this by matching zero spaces: position stays at 1.
    • The literal sequence "kg" does not match " ".
    • The temporary value in $1 is removed. The regex engine backtracks to \G looking for another way to find a match from the current position 1.
    • The last match position is 0; the current position is 1: the \G assertion is not satisfied.
    • We now move to position 2 in $x: again, the \G assertion is not satisfied.
    • The regex engine moves along $x, one position at a time, attempting to find a match. Because none of these positions are 0, the \G assertion is never satisfied.
    • Eventually, after 139 steps, the regex engine runs out of string (i.e. the end of $x is reached) and the match is FALSE.
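The walkthrough above can be checked with a small script. This is only a sketch: the helper name `captures` is made up for illustration, and it assumes the same sample string as the question. It runs the same scalar-context `while` loop once with `\G` and once without it.

```perl
use strict;
use warnings;

# Hypothetical helper: collect every $1 from repeated scalar-context /g
# matching, mirroring the `print ... while $x =~ /.../g` loop in the question.
sub captures {
    my ($string, $regex) = @_;
    my @found;
    while ($string =~ /$regex/g) {
        push @found, $1;
    }
    return @found;
}

my $x = '1 2 3kg 4 5 6 7 8 9 10Kg 11 12 13 kg 14 15';

# With \G the first attempt must succeed at position 0, and it cannot:
# "1" is followed by " 2", not by "kg".
my @anchored = captures($x, qr/\G(\d+)\s*kg\s*/i);

# Without \G the engine is free to start a match at any later position.
my @unanchored = captures($x, qr/(\d+)\s*kg\s*/i);

print "with \\G:    ", scalar(@anchored), " matches\n";
print "without \\G: @unanchored\n";
```

With `\G` the loop body never runs; without it the three weights are found.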

    I got all that information by running your code through Regexp::Debugger. I highly recommend this module: not only will you find bugs in your regex, you'll also learn a lot about them (in that respect, it's just as useful for regexes that work as those that don't).

    You can fix your current problem by adding '.*?' after the '\G':

    $ perle 'my $x = q{1 2 3kg 4 5 6 7 8 9 10Kg 11 12 13 kg 14 15}; say $1 while $x =~ /\G.*?(\d+)\s*kg\s*/gi'
    3
    10
    13

    "I will try reading about it more."

    The link I provided before does lead to more links. One in particular, which you should definitely read, is "perlop: Regexp Quote-Like Operators". You'll need to scroll down a fair way: look for the "\G assertion" section.

    — Ken

      Ahhh, I understand it. AnomalousMonk also replied along the same lines, and I have a question, which I put in my reply to him, about the scope of \G. How long does \G hold its value? Until the next unsuccessful match?

Re^5: Regular expression
by AnomalousMonk (Archbishop) on Nov 01, 2017 at 01:20 UTC
    Ohh my bad. I corrected G with \G but still no change ... prints nothing. Still not sure what difference it was supposed to make.

    c:\@Work\Perl\monks>perl -wMstrict -le
    "my $x = '1 2 3kg 4 5 6 7 8 9 10Kg 11 12 13 kg 14 15';
     print qq{string \$x: '$x'};
     ;;
     printf qq{captured '$1' } while $x =~ /\G(\d+)\s*kg\s*/ig;
     print '----------';
     "
    string $x: '1 2 3kg 4 5 6 7 8 9 10Kg 11 12 13 kg 14 15'
    ----------

    The  \G anchor matches at the point (the exact character offset in the string) at which matching stopped in the last  /g global match iteration. But on the first  /g iteration, where is that point? On the first  /g iteration,  \G matches the same as  \A (the \Absolute-start-of-string anchor).

    So what  /\G(\d+)\s*kg\s*/ig says is:

    1. \G From the string offset at which the previous match stopped (or from the start of the string if it's the first match);
    2. (\d+) Match and capture one or more decimal digits;
    3. \s* Then match zero or more whitespace characters;
    4. kg Then match the literal characters  'kg' case-insensitively (due to the  /i flag);
    5. \s* Then match zero or more whitespace characters (this can't fail);
    6. And this match iteration is finished.

    But your  '1 2 3kg 4 5 6 7 8 9 10Kg 11 12 13 kg 14 15' string begins with some digits, some whitespace, and then some more digits, not the required  'kg' literals: the match immediately fails. There is a  '3kg' subsequence further on that could satisfy part of the overall match, but matching has already failed due to the  \G assertion.
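A complementary sketch, using a made-up string (not the one from the question) in which the weights do start at offset 0. It is meant to show the six steps above succeeding twice in a row and then stopping at the first place where `\G` cannot be followed by digits.

```perl
use strict;
use warnings;

# Hypothetical sample data: two back-to-back weights, a stray word,
# then one more weight that \G-anchored matching never reaches.
my $y = '3kg 10Kg oops 13 kg';

my @contiguous;
while ($y =~ /\G(\d+)\s*kg\s*/ig) {
    push @contiguous, $1;    # each match starts exactly where the last one ended
}

print "captured: @contiguous\n";
```

The loop picks up 3 and 10, then fails at "oops"; the trailing "13 kg" is not reached because no match ended immediately before it.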


    Give a man a fish:  <%-{-{-{-<

      my $test_string= '12345';
      print "$1\n" if ($test_string=~ /(2)/g); #Actually printed 2
      print "$1\n" while ($test_string=~ /\G(\d)/g);# Printed everything after 2 i.e 3,4,5

      Thanks, AnomalousMonk, for this explanation. If I understand you correctly, \G is essentially a kind of anchor (like ^ and $), but instead of having a fixed location, its position depends on where the last match ended. In my code above, the first print statement printed only 2; for the next print statement, the pattern match started at the position right after 2, so 3, 4 and 5 matched. I hope my understanding is correct so far.

      Can you tell me what the scope of \G would be? Suppose that in the next print statement I use a different string: would \G still make the pattern match start from the position where it last matched in the first string? What if the second string is shorter than the first and \G has a value greater than the second string's length?

        ... what the scope of \G would be? Suppose that in the next print statement I use a different string: would \G still make the pattern match start from the position where it last matched in the first string?

        Each individual string has an independent "position of end of last successful match" attribute that is returned by the pos built-in. The  \G regex operator (enabled by the  /g modifier) accesses this attribute of a string being matched to assert that matching in that string is continuing where previous matching in that string by any  m//g match left off.

        c:\@Work\Perl\monks>perl -wMstrict -le
        "my $s1 = 'foobarfeefiefoefum';
         $s1 =~ /foo/g;
         ;;
         my $s2 = '123456789';
         $s2 =~ /6/g;
         ;;
         print qq{A: pos in \$s1 '$s1' after successful match == }, pos $s1;
         print qq{B: pos in \$s2 '$s2' after successful match == }, pos $s2;
         ;;
         $s1 =~ /foe/g;
         print qq{C: pos in \$s1 '$s1' after successful match == }, pos $s1;
         print qq{D: pos in \$s2 '$s2' still == }, pos $s2;
         "
        A: pos in $s1 'foobarfeefiefoefum' after successful match == 3
        B: pos in $s2 '123456789' after successful match == 6
        C: pos in $s1 'foobarfeefiefoefum' after successful match == 15
        D: pos in $s2 '123456789' still == 6

        What would have happened if the second match against the  $s1 string had been  /\Gfoe/g instead? Or  /\Gbar/g instead? Try it and see! (See also the documentation concerning the effect of the  /c modifier in conjunction with  /g in a  m//gc match.)
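For readers who want to compare notes after trying it, here is one possible sketch of those experiments. The variable names are chosen only for illustration. It also touches on the earlier "how long does \G hold its value" question: a failed `m//g` match clears `pos`, while a failed `m//gc` match leaves it alone.

```perl
use strict;
use warnings;

my $s1 = 'foobarfeefiefoefum';

# A successful /g match leaves pos() just past the matched text.
$s1 =~ /foo/g;
my $after_foo = pos $s1;

# \Gfoe has to match exactly at that offset, where the string holds 'bar'.
my $foe_matched = ($s1 =~ /\Gfoe/g) ? 1 : 0;
# A failed m//g match (no /c) resets pos() to undef.
my $pos_cleared = defined(pos $s1) ? 0 : 1;

# Start over; this time the failing match uses /gc, so pos() survives.
$s1 =~ /foo/g;
my $foe_c_matched = ($s1 =~ /\Gfoe/gc) ? 1 : 0;
my $pos_kept      = pos $s1;

# \Gbar does match at the remembered offset and moves pos() forward.
my $bar_matched = ($s1 =~ /\Gbar/g) ? 1 : 0;
my $after_bar   = pos $s1;

print "after foo: $after_foo, \\Gfoe matched: $foe_matched, pos cleared: $pos_cleared\n";
print "with /gc: matched $foe_c_matched, pos kept at $pos_kept\n";
print "\\Gbar matched: $bar_matched, pos now $after_bar\n";
```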

        (Incidentally, what if  $test_string in your example code was  '12xxx345' and the match was  /(\d)/g (no \G) instead? What if it was  /\G(\d)/g as originally?)
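A short sketch of that last experiment, using list-context matching so that all captures come back at once. The array names are illustrative only, and `pos` is cleared by hand between the two matches so that each starts from the beginning of the string.

```perl
use strict;
use warnings;

my $test_string = '12xxx345';

# Without \G, each iteration may skip ahead to the next digit.
my @any_digit = $test_string =~ /(\d)/g;

# Make sure the next match starts from offset 0.
pos($test_string) = undef;

# With \G, each digit must start exactly where the previous match ended,
# so matching stops at the first 'x'.
my @from_anchor = $test_string =~ /\G(\d)/g;

print "no \\G:   @any_digit\n";
print "with \\G: @from_anchor\n";
```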


        Give a man a fish:  <%-{-{-{-<