in reply to parsing code, finding block boundaries
As far as extracting code goes, both of your routines are broken: neither one deals with closing curlies inside quotes. For instance, both routines will bomb on:

    sub blah { return '}'; }
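To make the failure concrete, here's a minimal sketch of the kind of brace counter that trips over this input. Your actual routines aren't quoted here, so naive_extract_block is a hypothetical stand-in that only tracks nesting depth and knows nothing about strings or comments:

```perl
use strict;
use warnings;

# Hypothetical naive extractor: find the first '{' and return everything
# up to the '}' that brings the nesting depth back to zero. It has no
# idea what a string literal or a comment is.
sub naive_extract_block {
    my ($code) = @_;
    my $start = index($code, '{');
    return if $start < 0;    # no block at all

    my $depth = 0;
    for my $i ($start .. length($code) - 1) {
        my $ch = substr($code, $i, 1);
        $depth++ if $ch eq '{';
        $depth-- if $ch eq '}';
        return substr($code, $start, $i - $start + 1) if $depth == 0;
    }
    return;    # ran off the end: unbalanced input
}

# The '}' inside the string literal ends the "block" early.
print naive_extract_block("sub blah { return '}'; }"), "\n";    # prints: { return '}
```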
You can use Text::Balanced::extract_codeblock($code, '{}') instead of extract_bracketed($code, '{}') to handle this, but even that will still bomb on curlies in comments:

    sub blah {
        # } is such a neat character
        return '}';
    }
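Here's a minimal sketch of the difference on the quoted-curly example, assuming a Perl with the core Text::Balanced module. Both calls are made in list context, which returns the extracted text and the remainder without modifying the input; each call gets its own copy of the string to be safe. The trailing "rest();" is just made-up filler to show the remainder:

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed extract_codeblock);

my $src = "{ return '}'; } rest();";

# Each call gets its own copy, so neither sees the other's pos() state.
my $copy1 = $src;
my $copy2 = $src;

# With only '{}' as delimiters, extract_bracketed treats the quote as an
# ordinary character and closes the block at the '}' inside the string.
my ($naive) = extract_bracketed($copy1, '{}');

# extract_codeblock skips Perl quotes and quote-like operators, so it
# returns the whole block; list context also gives back the remainder.
my ($block, $rest) = extract_codeblock($copy2, '{}');

print "extract_bracketed: $naive\n";
print "extract_codeblock: $block\n";
print "remainder:        '$rest'\n";
```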
If you want to handle this case but don't want the complexity of writing a real parser, you can try this hack instead:
    $code = blockize($code);

    sub blockize {
        my ($code) = @_;

        # describe a block:
        use re 'eval';
        my $comment       = qr/#[^\n]*/;
        # /x so the spaces are ignored; * so '' and '\'' also match
        my $single_quoted = qr/' [^\\']* (?: \\. [^\\']* )* '/x;
        my $double_quoted = qr/" [^\\"]* (?: \\. [^\\"]* )* "/x;
        my $block;
        $block = qr/
            \{
            (?:
                (?> [^#'"{}]+ )
              | $comment
              | $single_quoted
              | $double_quoted
              | (??{$block})
            )*
            \}
        /x;

        # now that we have a block definition set up, it's just a simple substitution
        $code =~ s/\bsub \s+ (\w+) \s* ($block)/format_code($1, $2)/gex;
        return $code;
    }

    sub format_code {
        my ($name, $block) = @_;
        $block =~ s/\n/\n    /g;
        return "{\n    sub $name $block\n}";
    }
It will even format the added block nicely :) Hopefully your Perl-like language doesn't have POD; that's a whole 'nother beast.
We could even clean that up a little bit by using Regexp::Common:
    sub blockize {
        my ($code) = @_;

        # describe a block:
        use re 'eval';
        use Regexp::Common qw(delimited comment);
        my $block;
        $block = qr/
            \{
            (?:
                (?> [^#'"{}]+ )
              | $RE{comment}{Perl}
              | $RE{delimited}{-delim=>"'"}
              | $RE{delimited}{-delim=>'"'}
              | (??{$block})
            )*
            \}
        /x;

        # now that we have a block definition set up, it's just a simple substitution
        $code =~ s/\bsub \s+ (\w+) \s* ($block)/format_code($1, $2)/gex;
        return $code;
    }

    sub format_code {
        my ($name, $block) = @_;
        $block =~ s/\n/\n    /g;
        return "{\n    sub $name $block\n}";
    }
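If you're on Perl 5.10 or later, you can drop both use re 'eval' and the (??{$block}) self-reference and let the regex engine recurse on its own with named recursion, (?&NAME). This is a swapped-in technique rather than what the code above does; the sketch below assumes Perl 5.10+, keeps the same idea of skipping quoted strings and # comments, and the names $block_re, wrap_subs and format_sub are made up for illustration:

```perl
use strict;
use warnings;
use 5.010;    # (?&NAME) recursion needs Perl 5.10 or later

# A balanced {...} block. Curlies inside '...' or "..." strings and
# inside # comments are skipped; (?&block) recurses into nested blocks.
my $block_re = qr/
    (?<block>
        \{
        (?:
            (?> [^#'"{}]+ )                    # ordinary code, no backtracking
          | \# [^\n]*                          # comment up to end of line
          | ' [^\\']* (?: \\. [^\\']* )* '     # single-quoted string
          | " [^\\"]* (?: \\. [^\\"]* )* "     # double-quoted string
          | (?&block)                          # nested block
        )*
        \}
    )
/xs;

# Hypothetical helper mirroring format_code: wrap a sub in an extra
# block and indent its body by four spaces.
sub format_sub {
    my ($name, $body) = @_;
    $body =~ s/\n/\n    /g;
    return "{\n    sub $name $body\n}";
}

# Wrap every "sub NAME {...}" definition in its own outer block.
# $1 is the name, $2 the whole block ($3 is the named group inside it).
sub wrap_subs {
    my ($code) = @_;
    $code =~ s/\bsub \s+ (\w+) \s* ($block_re)/format_sub($1, $2)/gex;
    return $code;
}

print wrap_subs("sub blah {\n    # } is such a neat character\n    return '}';\n}"), "\n";
```

Like the original hack, this still knows nothing about q{}, heredocs, regex literals or POD; it only moves the recursion into the regex engine.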
Re^2: parsing code, finding block boundaries
by revdiablo (Prior) on Aug 05, 2004 at 22:20 UTC