Melly has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monkees,

Now, this actually does what I want:

foreach(@lines){ s/\G {8}/\t/g; }

It replaces multiples of 8 spaces with the corresponding number of tabs, but only at the beginning of the line. In other words, with a line like:

                X        Y
it only replaces the spaces between the start of the line and X, but leaves the spaces between the X and Y alone.
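Here is a self-contained version of the loop above. It assumes `@lines` holds only the example line: 16 leading spaces, then `X`, 8 spaces, then `Y`. The spaces are built with the `x` operator so the counts are visible.

```perl
use strict;
use warnings;

# Assumed input: the example line, built so the space counts are explicit.
my @lines = ((' ' x 16) . 'X' . (' ' x 8) . 'Y');

foreach (@lines) {
    # Each attempt is anchored by \G where the previous /g match ended.
    # The third attempt lands on 'X', fails, and the substitution stops.
    s/\G {8}/\t/g;
}

# Make the tabs visible as a literal backslash-t for printing.
(my $shown = $lines[0]) =~ s/\t/\\t/g;
print "$shown\n";    # prints: \t\tX        Y
```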

My question is, how?

I can only assume that the first time it encounters a line, \G is set to 0 - i.e. the beginning of the line. Is this the case?

Tom Melly, tom@tomandlu.co.uk

Re: Regex and \G
by broquaint (Abbot) on Oct 23, 2003 at 09:16 UTC
    Yes, \G is the same as \A in this context (i.e. when \G hasn't been set before), according to perlop. The last sentence of this passage is the most relevant:
    You can intermix "m//g" matches with "m/\G.../g", where "\G" is a zero-width assertion that matches the exact position where the previous "m//g", if any, left off. Without the "/g" modifier, the "\G" assertion still anchors at pos(), but the match is of course only attempted once. Using "\G" without "/g" on a target string that has not previously had a "/g" match applied to it is the same as using the "\A" assertion to match the beginning of the string.
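A small check of that last sentence follows. It assumes a fresh string that no `/g` match has touched yet. The variable names are only illustrative.

```perl
use strict;
use warnings;

my $fresh = "abc abc";

# No m//g has run on $fresh yet, so pos() is undefined.
my $pos_before = defined pos($fresh) ? pos($fresh) : 'undef';

# Without /g, \G anchors at pos(). On a fresh string that is offset 0,
# which is exactly where \A anchors. Non-/g matches do not set pos().
my $g_at_start = ($fresh =~ /\Gabc/)  ? 1 : 0;   # "abc" is at offset 0
my $a_at_start = ($fresh =~ /\Aabc/)  ? 1 : 0;   # same anchor point
my $g_not_mid  = ($fresh =~ /\G abc/) ? 1 : 0;   # " abc" is not at offset 0
my $plain_mid  = ($fresh =~ / abc/)   ? 1 : 0;   # unanchored, so it is found

print "pos before: $pos_before\n";
print "\\G: $g_at_start, \\A: $a_at_start, \\G mid: $g_not_mid, plain: $plain_mid\n";
```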
    HTH

    _________
    broquaint

Re: Regex and \G
by PodMaster (Abbot) on Oct 23, 2003 at 09:15 UTC
    Where did you read about \G? perlre says "\G Match only at pos() (e.g. at the end-of-match position of prior m//g)" and that's what happens. After the first two matches, pos points at the letter X, and X is not a space, so the match fails.
    my @lines = q[                X        Y];
    foreach(@lines){
        s/\G {8}/warn pos();'G'/ge;
    }
    die qq['],@lines,q['];
    __END__
    0 at - line 3.
    8 at - line 3.
    'GGX        Y' at - line 5.
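The same behaviour can be watched with a plain `m//g` loop instead of `s///ge`. This sketch again assumes the 16-space example line. It records `pos()` after each successful match, so it reports where the matches ended (8 and 16). The `s///ge` trace printed 0 and 8.

```perl
use strict;
use warnings;

my $line = (' ' x 16) . 'X' . (' ' x 8) . 'Y';
my @ends;

# After each successful /g match, pos() sits at the end of that match.
while ($line =~ /\G {8}/g) {
    push @ends, pos($line);
}

# The third attempt starts at offset 16, where 'X' is. It fails, and
# because /c was not used, pos() is reset to undef.
print "matches ended at: @ends\n";                     # 8 16
print "char at 16: ", substr($line, 16, 1), "\n";      # X
print "pos after failure: ",
      (defined pos($line) ? pos($line) : 'undef'), "\n";
```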
    I can only assume that the first time it encounters a line, \G is set to 0 - i.e. the beginning of the line. Is this the case?
    You don't have to assume, just use re 'debug'; and watch the output fly.
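One way to keep that output manageable is to wrap only the regex of interest in a block. This assumes Perl 5.10 or later, where `use re 'debug'` is lexically scoped; older perls may behave differently. The compile and match trace goes to STDERR.

```perl
use strict;
use warnings;

my $line = (' ' x 16) . 'X' . (' ' x 8) . 'Y';

{
    # Only regexes compiled inside this block are traced (Perl 5.10+).
    use re 'debug';
    $line =~ s/\G {8}/\t/g;
}

# This substitution is outside the block, so it produces no debug output.
(my $shown = $line) =~ s/\t/\\t/g;
print "$shown\n";
```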

    MJD says "you can't just make shit up and expect the computer to know what you mean, retardo!"
    I run a Win32 PPM repository for perl 5.6.x and 5.8.x -- I take requests (README).
    ** The third rule of perl club is a statement of fact: pod is sexy.

      Sorry, I should have been clearer - I understand what \G (and my regex) does EXCEPT on the first match.

      I guess matching against the beginning of the string was the only possible option, but I just wanted to double-check.

      Just as well I did, or I'd never have known about "use re 'debug'" - many thanks.

      Tom Melly, tom@tomandlu.co.uk
Re: Regex and \G
by Abigail-II (Bishop) on Oct 23, 2003 at 09:14 UTC
    I would use:
    s!^((?: {8})+)!"\t" x length ($1) / 8!e;

    Abigail

      Ah, but then I wouldn't know what I was doing... ;)

      Many thanks, though.

      Tom Melly, tom@tomandlu.co.uk
      s!^((?: {8})+)!"\t" x length ($1) / 8!e;

      Why would you want to replace leading chunks of spaces with "0" while generating a warning?

      Yet another example of why we should have a few more levels in the Perl precedence table.

                      - tye
        Why would you want to replace leading chunks of spaces with "0" while generating a warning?
        Because tabs are evil, and it's good that when there's an attempt to summon evil, warnings are generated, and harmless beings (the 0's) are sent instead.

        Abigail
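Here is the precedence issue tye points at. In perlop, `x` and `/` share a precedence level and are left-associative. The one-liner therefore parses as `("\t" x length($1)) / 8`, and a string of tabs numifies to 0. Parenthesizing the division gives one tab per complete run of 8 leading spaces. The sketch below compares the two on the assumed 16-space example line.

```perl
use strict;
use warnings;

my $input = (' ' x 16) . 'X' . (' ' x 8) . 'Y';

# As posted: ("\t" x length($1)) / 8, i.e. "\t\t" / 8, which is 0.
my $as_posted = $input;
{
    no warnings 'numeric';    # silence the "isn't numeric" warning
    $as_posted =~ s!^((?: {8})+)!"\t" x length ($1) / 8!e;
}

# Parenthesized: the repeat count is length($1) / 8.
my $fixed = $input;
$fixed =~ s!^((?: {8})+)!"\t" x (length($1) / 8)!e;

print "as posted starts with: ", substr($as_posted, 0, 1), "\n";   # 0
print "fixed tab count: ", ($fixed =~ tr/\t//), "\n";              # 2
```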

Re: Regex and \G
by bart (Canon) on Oct 24, 2003 at 01:02 UTC
    Allow me some ramblings... Try this code as an experiment:
    $a = "ab cdef gh ijklmn";
    $b = "xyz";
    while($a =~ /\G(\w\w)/g) {
        print "1: match in \$a: $1\n";
        if($b =~ /\G(\w)/g) {
            print "2: match in \$b: $1\n";
        } else {
            print "2: no match in \$b - replacing \$a\n";
            $a = "AB CD EF";
        }
        if($a =~ /\G(\s*)/g && length $1) { # The match itself never fails
            print "3: skipping whitespace in \$a\n";
        }
    }
    It prints:
    1: match in $a: ab
    2: match in $b: x
    3: skipping whitespace in $a
    1: match in $a: cd
    2: match in $b: y
    1: match in $a: ef
    2: match in $b: z
    3: skipping whitespace in $a
    1: match in $a: gh
    2: no match in $b - replacing $a
    1: match in $a: AB
    2: match in $b: x
    3: skipping whitespace in $a
    1: match in $a: CD
    2: match in $b: y
    3: skipping whitespace in $a
    1: match in $a: EF
    2: match in $b: z
    
    Notice how two regexes (1 and 3) match on $a and share the same pos pointer, while another one (2) matches on $b using its own pointer, independently of the other.

    What can one conclude from this?

    • Matching on a fresh scalar starts at position 0 (pos() is still undefined, and \G treats that as 0), re your question.
    • It continues from there; when a match fails, it resets the pointer to 0 (unless you used the /c modifier). If you try to match again, it'll start again from the top.
    • Each scalar has its own associated position pointer, not one per regex.
    • If you change the scalar midway, it resets its associated pointer.
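The points above can be checked in a few lines. The sketch below assumes two identical strings, `$s` and `$t`, whose names are only illustrative.

```perl
use strict;
use warnings;

my $s = "aaa";
my $t = "aaa";

# A fresh scalar has no position yet.
my $fresh = defined pos($s) ? 'set' : 'undef';

# Each scalar has its own pointer: matching $s leaves $t untouched.
$s =~ /a/g;
my $pos_s       = pos($s);
my $t_untouched = defined pos($t) ? 'set' : 'undef';

# A failed /g match resets the pointer...
$s =~ /b/g;
my $after_fail = defined pos($s) ? pos($s) : 'undef';

# ...unless /c is used, which leaves it where it was.
$s =~ /a/g;        # matches at offset 0 again, so pos($s) is 1
$s =~ /b/gc;       # fails, but pos($s) stays 1
my $after_fail_c = pos($s);

# Assigning to the scalar resets its pointer.
$s = "aaa";
my $after_assign = defined pos($s) ? 'set' : 'undef';

print "$fresh $pos_s $t_untouched $after_fail $after_fail_c $after_assign\n";
```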