in reply to Re^2: Regular expression
in thread Regular expression

Your test results:

print-1
In "/G(\d+)\s*kg\s*/ig" you're making exactly the same mistake as you did in your OP and which I pointed out in my initial response: "('G' should be '\G')". There is no 'G' to match!
print-2
In "/(\d+)\s*kg\s*/ig", you've used the 'g' modifier. It's repeatedly matching each number followed by the units (one after another). As you say: "this one does the job".
print-3
Your explanation of "/(\d+)\s*kg\s*/i" is correct.
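The difference between the three patterns can be checked in list context. The following is a minimal sketch rather than your original code: it reuses the test string from this thread, and the array names are only illustrative.

```perl
use strict;
use warnings;

my $x = '1 2 3kg 4 5 6 7 8 9 10Kg 11 12 13 kg 14 15';

# print-1: a literal 'G' (or 'g', because of /i) must be immediately
# followed by digits; that never happens in $x.
my @literal_g = $x =~ /G(\d+)\s*kg\s*/ig;

# print-2: with /g, list context returns every captured weight.
my @all = $x =~ /(\d+)\s*kg\s*/ig;

# print-3: without /g, only the first match is captured.
my @first = $x =~ /(\d+)\s*kg\s*/i;

print "literal G: @literal_g\n";   # prints "literal G: "
print "with /g:   @all\n";         # prints "with /g:   3 10 13"
print "no /g:     @first\n";       # prints "no /g:     3"
```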
"Please add some insight on what is the importance of \G and what are some practice usage scenario of \G?"

'\G' is not an assertion I use that much: I can't really give you an "I often find it useful for ..." type answer.

There's more information in "perlre: Assertions"; as well as links to additional, related documentation.
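One use that the perlop documentation does describe is a lex-like scanner: \G combined with /gc lets several patterns take turns consuming a string from wherever the previous one stopped. A minimal sketch, with an invented input string and invented token labels:

```perl
use strict;
use warnings;

my $input = '3kg + 10 Kg';   # invented sample input
my @tokens;

pos($input) = 0;             # start scanning at the beginning
while (pos($input) < length($input)) {
    # /c keeps pos() unchanged when a pattern fails, so the next
    # alternative is tried at the same offset.
    if    ($input =~ /\G\s+/gc)   { next }
    elsif ($input =~ /\G(\d+)/gc) { push @tokens, "NUM:$1" }
    elsif ($input =~ /\G(kg)/gci) { push @tokens, "UNIT:$1" }
    elsif ($input =~ /\G(.)/gcs)  { push @tokens, "OTHER:$1" }
}

print join(' ', @tokens), "\n";   # prints "NUM:3 UNIT:kg OTHER:+ NUM:10 UNIT:Kg"
```

Each branch is anchored with \G, so no pattern can skip over text that another branch should have consumed.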

— Ken

Re^4: Regular expression
by pravakta (Novice) on Oct 31, 2017 at 23:15 UTC

    Ohh my bad. I corrected G with \G but still no change
    print "Weights are : $1 Kg\n" while $x=~/\G(\d+)\s*kg\s*/ig; #print -1

    prints nothing. Still not sure what difference it was supposed to make. No issues if you don't have much experience to share with this. I will try reading about it more.
    Thanks for your help.

      "print "Weights are : $1 Kg\n" while $x=~/\G(\d+)\s*kg\s*/ig; #print -1"

      The condition for the first while iteration is FALSE: the while loop does not iterate.

      Here's a blow-by-blow description of what's occurring. Bear in mind that character positions use a zero-based index: the first character in the string is at position 0.

      • We start at position 0 in $x (that's character "1").
      • As no previous regex match with the 'g' modifier had occurred, the last match position is 0. The '\G' assertion is satisfied at the start of the string (i.e. position 0). As that's a zero-width assertion, we stay at position 0.
      • (\d+) matches "1". This is temporarily assigned to $1 (i.e. $1 eq '1').
      • We move to position 1 in $x (that's character " " - a space).
      • \s* matches " ".
      • We move to position 2 in $x (that's character "2").
      • The literal sequence "kg" does not match "2".
      • We move back to position 1 in $x and the regex engine backtracks to \s*.
      • \s* means zero or more spaces greedily. Last time one space was matched. We can also satisfy this by matching zero spaces: position stays at 1.
      • The literal sequence "kg" does not match " ".
      • The temporary value in $1 is removed. The regex engine backtracks to \G looking for another way to find a match from the current position 1.
      • The last match position is 0; the current position is 1: the \G assertion is not satisfied.
      • We now move to position 2 in $x: again, the \G assertion is not satisfied.
      • The regex engine moves along $x, one position at a time, attempting to find a match. Because none of these positions are 0, the \G assertion is never satisfied.
      • Eventually, after 139 steps, the regex engine runs out of string (i.e. the end of $x is reached) and the match is FALSE.
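The outcome of that walkthrough can be confirmed without the debugger. This is a minimal sketch, assuming the same $x; the second string $y is invented here to show a case where every weight starts exactly where the previous match ended.

```perl
use strict;
use warnings;

my $x = '1 2 3kg 4 5 6 7 8 9 10Kg 11 12 13 kg 14 15';
my @from_x;
push @from_x, $1 while $x =~ /\G(\d+)\s*kg\s*/ig;

# Invented string: it starts with a weight and the weights are adjacent.
my $y = '3kg 10Kg 13 kg 14 15';
my @from_y;
push @from_y, $1 while $y =~ /\G(\d+)\s*kg\s*/ig;

printf "from \$x: %d match(es)\n", scalar @from_x;   # prints "from $x: 0 match(es)"
print  "from \$y: @from_y\n";                        # prints "from $y: 3 10 13"
```

In $y the loop stops at "14": \G is satisfied there, but no "kg" follows, and \G prevents the engine from trying any later starting position.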

      I got all that information by running your code through Regexp::Debugger. I highly recommend this module: not only will you find bugs in your regex, you'll also learn a lot about them (in that respect, it's just as useful for regexes that work as those that don't).

      You can fix your current problem by adding '.*?' after the '\G':

      $ perle 'my $x = q{1 2 3kg 4 5 6 7 8 9 10Kg 11 12 13 kg 14 15}; say $1 while $x =~ /\G.*?(\d+)\s*kg\s*/gi'
      3
      10
      13

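As a sketch of why that works: the lazy '.*?' lets the match skip forward from the \G position to the next number that is followed by "kg". The snippet below (variable names are illustrative) records each capture together with pos($x), the offset at which the next iteration's \G will be satisfied.

```perl
use strict;
use warnings;

my $x = q{1 2 3kg 4 5 6 7 8 9 10Kg 11 12 13 kg 14 15};

my (@weights, @offsets);
while ($x =~ /\G.*?(\d+)\s*kg\s*/gi) {
    push @weights, $1;
    push @offsets, pos($x);   # where the next iteration's \G anchors
}

print "weights: @weights\n";   # prints "weights: 3 10 13"
print "offsets: @offsets\n";   # one offset per successful match
```
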
      " I will try reading about it more."

      The link I provided before does lead to more links. One in particular, which you should definitely read, is "perlop: Regexp Quote-Like Operators". You'll need to scroll down a fair way: look for the "\G assertion" section.

      — Ken

        Ahhh, I understand it now. Anomalous also replied along the same lines, and I have a question which I put in my reply to him. The question is about the scope of \G: how long does \G hold its value? Until the next unsuccessful match?

      Ohh my bad. I corrected G with \G but still no change ... prints nothing. Still not sure what difference it was supposed to make.

      c:\@Work\Perl\monks>perl -wMstrict -le
      "my $x = '1 2 3kg 4 5 6 7 8 9 10Kg 11 12 13 kg 14 15';
       print qq{string \$x: '$x'};
       ;;
       printf qq{captured '$1' } while $x =~ /\G(\d+)\s*kg\s*/ig;
       print '----------';
      "
      string $x: '1 2 3kg 4 5 6 7 8 9 10Kg 11 12 13 kg 14 15'
      ----------

      The \G anchor matches at the point (the exact character offset in the string) at which matching stopped in the last /g global match iteration. But on the first /g iteration, where is that point? On the first /g iteration, \G matches the same as \A (the \Absolute-start-of-string anchor).

      So what /\G(\d+)\s*kg\s*/ig says is:

      1. \G: From the string offset at which the previous match stopped (or from the start of the string if it's the first match);
      2. (\d+): Match and capture one or more decimal digits;
      3. \s*: Then match zero or more whitespace characters;
      4. kg: Then match the literal characters 'kg' case-insensitively (due to the /i flag);
      5. \s*: Then match zero or more whitespace characters (this can't fail);
      6. And this match iteration is finished.

      But your '1 2 3kg 4 5 6 7 8 9 10Kg 11 12 13 kg 14 15' string begins with some digits, some whitespace, and then some more digits, not the required 'kg' literals: the match immediately fails. There is a '3kg' subsequence further on that could satisfy part of the overall match, but matching has already failed due to the \G assertion.
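To see that \G really is tied to a character offset, the offset can be set by hand with pos(). This is a sketch, not something from the original code; offset 4 is where '3kg' begins in the string above.

```perl
use strict;
use warnings;

my $x = '1 2 3kg 4 5 6 7 8 9 10Kg 11 12 13 kg 14 15';

pos($x) = 4;   # pretend a previous /g match stopped just before '3kg'

my @captured;
push @captured, $1 while $x =~ /\G(\d+)\s*kg\s*/ig;

print "captured: @captured\n";   # prints "captured: 3"
```

The loop picks up '3' and then stops: the next iteration is anchored at '4 5 ...', where digits are not followed by 'kg', so '10Kg' and '13 kg' are never reached.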


      Give a man a fish:  <%-{-{-{-<

        my $test_string= '12345';
        print "$1\n" if ($test_string=~ /(2)/g);  # Actually printed 2
        print "$1\n" while ($test_string=~ /\G(\d)/g);  # Printed everything after 2 i.e 3,4,5

        Thanks, Anomalous, for this explanation. If I understand you correctly, \G is essentially a kind of anchor (like ^ and $), but instead of having a fixed location, its position depends on where the last match happened. In my code above, the first print statement printed only 2; for the next print statement, the pattern match started in the string at the location just after 2, so 3, 4 and 5 matched. I hope my understanding is correct so far.

        Can you tell me what the scope of \G is? Suppose that in the next print statement I use a different string: would \G still make the pattern match start from the position where it last matched in the first string? And what if the second string is shorter than the first, and \G holds a position greater than the second string's length?
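A short experiment that speaks to this question: the position that \G uses is the one reported by pos(), and Perl stores it separately for each string variable. The sketch below uses invented variable names; it shows that a second string starts fresh, and that a failed /g match without the /c modifier resets the position.

```perl
use strict;
use warnings;

my $first  = '12345';
my $second = '678';

my $ok        = $first =~ /(2)/g;   # scalar-context /g match on $first
my $pos_first = pos($first);        # offset just after the '2'

# $second has no stored position, so \G anchors at its start.
my @from_second;
push @from_second, $1 while $second =~ /\G(\d)/g;

my $pos_still = pos($first);        # untouched by the matches on $second

my $failed     = $first =~ /x/g;    # a failed /g match (no /c) resets pos
my $after_fail = defined pos($first) ? pos($first) : 'undef';

print "pos(\$first) after /(2)/g:     $pos_first\n";    # prints 2
print "captures from \$second:        @from_second\n";  # prints 6 7 8
print "pos(\$first) afterwards:       $pos_still\n";    # prints 2
print "pos(\$first) after a failure:  $after_fail\n";   # prints undef
```

Because the position belongs to the string rather than to the regex, a position recorded for a longer string is never applied to a shorter one.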