Beefy Boxes and Bandwidth Generously Provided by pair Networks
Come for the quick hacks, stay for the epiphanies.
 
PerlMonks  

What good is <CODE>\G</CODE> in a regular expression?

by faq_monk (Initiate)
on Oct 08, 1999 at 00:25 UTC ( [id://674]=perlfaq nodetype: print w/replies, xml ) Need Help??

Current Perl documentation can be found at perldoc.perl.org.

Here is our local, out-dated (pre-5.6) version:

The notation \G is used in a match or substitution in conjunction the /g modifier (and ignored if there's no /g) to anchor the regular expression to the point just past where the last match occurred, i.e. the pos() point.

For example, suppose you had a line of text quoted in standard mail and Usenet notation, (that is, with leading > characters), and you want change each leading > into a corresponding :. You could do so in this way:

     s/^(>+)/':' x length($1)/gem;

Or, using \G, the much simpler (and faster):

    s/\G>/:/g;

A more sophisticated use might involve a tokenizer. The following lex-like example is courtesy of Jeffrey Friedl. It did not work in 5.003 due to bugs in that release, but does work in 5.004 or better. (Note the use of /c, which prevents a failed match with /g from resetting the search position back to the beginning of the string.)

    while (<>) {
      chomp;
      PARSER: {
           m/ \G( \d+\b    )/gcx    && do { print "number: $1\n";  redo; };
           m/ \G( \w+      )/gcx    && do { print "word:   $1\n";  redo; };
           m/ \G( \s+      )/gcx    && do { print "space:  $1\n";  redo; };
           m/ \G( [^\w\d]+ )/gcx    && do { print "other:  $1\n";  redo; };
      }
    }

Of course, that could have been written as

    while (<>) {
      chomp;
      PARSER: {
           if ( /\G( \d+\b    )/gcx  {
                print "number: $1\n";
                redo PARSER;
           }
           if ( /\G( \w+      )/gcx  {
                print "word: $1\n";
                redo PARSER;
           }
           if ( /\G( \s+      )/gcx  {
                print "space: $1\n";
                redo PARSER;
           }
           if ( /\G( [^\w\d]+ )/gcx  {
                print "other: $1\n";
                redo PARSER;
           }
      }
    }

But then you lose the vertical alignment of the regular expressions.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others romping around the Monastery: (7)
As of 2024-03-28 12:47 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found