Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:
A recent thread (need regex help to strip things like embedded C comments) discussed the use of regexes to extract nested ``bracketed'' patterns such as nested C block comments (if such a thing existed in C today; some pre-ANSI-standard implementations supported this feature).
The discussion of the (??{ code }) extended pattern in perlre gives an example of such a regex for extracting nested parenthetic pairs:
$re = qr{
    \(
    (?:
        (?> [^()]+ )    # Non-parens without backtracking
      |
        (??{ $re })     # Group with matching parens
    )*
    \)
}x;
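For readers who want to try the perlre pattern directly, here is a minimal usage sketch. The sample string and the names `$text` and `@groups` are made up for illustration and are not from perlre. `$re` is declared as a package variable so that the code block inside the regex can see it.

```perl
use strict;
use warnings;
use re 'eval';   # kept from the posted code; the pattern here is a literal qr{}

our $re;         # package variable, visible inside (??{ ... })
$re = qr{
    \(
    (?:
        (?> [^()]+ )    # Non-parens without backtracking
      |
        (??{ $re })     # Group with matching parens
    )*
    \)
}x;

# Hypothetical sample input: two top-level groups, the first one nested.
my $text = 'f(a, g(b, c), (d)) and h(e)';

# In list context, /g returns every top-level balanced group.
our @groups = $text =~ /($re)/g;
print "$_\n" for @groups;
# prints:
#   (a, g(b, c), (d))
#   (e)
```

Each match starts at an opening paren and runs to its balancing closing paren; groups nested inside an already matched group are not reported separately.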
This example can be extended to handle arbitrary multi-character starting and ending sequences like /* and */.
The perlre example uses the non-backtracking, ``atomic'' extended pattern (?>pattern), but the example seems to work just as well without it for both single- and multi-character starting and ending sequences, as in the following code...
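As a hedged illustration of what `(?>pattern)` changes, the sketch below first shows atomic grouping in isolation, then compares an atomic and a non-atomic paren matcher on a made-up sample. The names `$atomic_re`, `$plain_re`, and `$sample` are assumptions for this example only. Because a character matched by `[^()]+` can never be matched by the other alternative or by `\)`, giving characters back cannot lead to a different match, so both variants should find the same groups; the atomic form only removes backtracking attempts when a match fails. Whether that is measurably faster depends on the Perl version and its regex optimizations, and is not measured here.

```perl
use strict;
use warnings;
use re 'eval';

# Atomic grouping in isolation: once (?>a+) has matched it never gives
# characters back, so the trailing "ab" cannot match.
our $plain_ok  = ('aaab' =~ /a+ab/)     ? 1 : 0;
our $atomic_ok = ('aaab' =~ /(?>a+)ab/) ? 1 : 0;
print "plain: $plain_ok, atomic: $atomic_ok\n";   # prints "plain: 1, atomic: 0"

# Two variants of the paren matcher, differing only in the atomic group.
our ($atomic_re, $plain_re);
$atomic_re = qr{ \( (?: (?> [^()]+ ) | (??{ $atomic_re }) )* \) }x;
$plain_re  = qr{ \( (?:     [^()]+   | (??{ $plain_re  }) )* \) }x;

# Hypothetical sample: two balanced groups, then an unclosed paren that
# forces a failed attempt before the inner "(e)" is found.
my $sample = 'x (a (b) c) y ((d)) z (unclosed (e)';

our @atomic_hits = $sample =~ /($atomic_re)/g;
our @plain_hits  = $sample =~ /($plain_re)/g;

print "atomic: @atomic_hits\n";
print "plain:  @plain_hits\n";
print "same matches\n" if "@atomic_hits" eq "@plain_hits";
```

On this input both variants report `(a (b) c)`, `((d))`, and `(e)`; the difference between them is confined to how much work the engine does at the unclosed paren before giving up.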
use warnings;
use strict;

my $open_cmt  = qr{\Q/*}xms;  # NO SPACES: \Q escapes spaces
my $close_cmt = qr{\Q*/}xms;

use re 'eval';

our $paired_parens =  # CAUTION: MUST be package variable!!!!
    qr{  # adapted from example (??{ code }) regex from perlre
        \(
        (?:
            # (?> [^()]+ )  # Non-parens without backtracking - works
            [^()]+          # Non-parens with backtracking - works
          |
            (??{ $paired_parens })  # Group with matching parens
        )*
        # \)            # ignore un-paired paren
        (?: \) | \z )   # grab un-paired paren to end of string
    }xms;

our $c_comment =  # CAUTION: MUST be package variable!!!!
    qr{
        $open_cmt
        (?:
            # (?> (?: (?! $open_cmt) (?! $close_cmt) . )+ )  # works
            # (?> (?: (?! $open_cmt | $close_cmt ) . )+ )    # works
            (?: (?! $open_cmt | $close_cmt ) . )+            # works
          |
            (??{ $c_comment })  # nested comment
        )*
        $close_cmt                # ignore improperly closed comment
        # (?: $close_cmt | \z )   # grab un-closed comment to string end
    }xms;

my $result;

my $parens = "degenerate examples () (((()))) ((((()))))
(simple) parens (nested(with)other) stuff
multi-line ( nested (parens () (non-empty) (sequential)
  ( (multi-line) (sequential) ( foo ((())) ) bar )
  )
)
improperly ( paired ( parens )";

($result = $parens) =~
    s{ ($paired_parens) }
     { # print "captured: <$1> \n";  # FOR DEBUG
       "PAIR:$1:RIAP";
     }egxms;
print "$result \n";

my $comments = "/* simple comment on its own line */
various degenerate comments /**/ /*/*/*/*/*/**/*/*/*/*/*/
simplest multi-level comment /*/*/*/*/*/**/*/*/*/*/*/ with other stuff
simplest seven-deep /*/*/*/*/*/*/**/*/*/*/*/*/*/ comment
two /* sequential */ comments /* on a line */ with other stuff
two-deep /* nested /* comments */ on a single */ line
three-deep /* nested /* comments /* (level 3) */ */ on single */ line
five-deep /* multi-line comment
  /* with *********
    /* sequential ***********
      /***************
        /* comments */ /* near */ /* lowest */
        /* level */ /* on */ /* multiple */
        /* lines /* (and a fifth level) */ */
      */ finish four-deep ******
    */ finish three-deep *******
  */ finish two-deep
*/ end complex
nested multi-line comment
improperly /* nested /* comment */";

($result = $comments) =~
    s{ ($c_comment) }
     { # print "captured: <$1> \n";  # FOR DEBUG
       "PAIR:$1:RIAP";
     }egxms;
print "$result \n";
My question: What is the reason, if any, for using the atomic sub-expression in the original perlre example?
Replies are listed 'Best First'.

Re: Useless use of `atomic' regex extended pattern?
by moritz (Cardinal) on Jul 26, 2007 at 09:06 UTC

Re: Useless use of `atomic' regex extended pattern?
by ikegami (Patriarch) on Jul 26, 2007 at 15:17 UTC