in reply to parsing code, finding block boundaries

As far as extracting code goes, both of your routines are broken; neither deals with end-curlies in quotes. For instance, both routines will bomb on:

sub blah { return '}'; }
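To make the failure concrete, here is a minimal sketch of a brace counter that knows nothing about quotes. It is a hypothetical stand-in (the name naive_block is made up for illustration), not either of the routines from the parent post, but any scanner that only counts curlies will stop early in the same way:

```perl
use strict;
use warnings;

# Hypothetical naive scanner: returns the text from the first '{' up to
# the '}' that brings the depth back to zero. Quotes and comments are
# not treated specially, which is exactly the problem.
sub naive_block {
    my ($code) = @_;
    my $start = index($code, '{');
    return if $start < 0;

    my $depth = 0;
    for my $pos ($start .. length($code) - 1) {
        my $char = substr($code, $pos, 1);
        $depth++ if $char eq '{';
        $depth-- if $char eq '}';
        return substr($code, $start, $pos - $start + 1) if $depth == 0;
    }
    return;    # ran off the end: unbalanced
}

my $naive = naive_block(q[sub blah { return '}'; }]);
print "$naive\n";    # prints: { return '}
```

The quoted curly closes the "block" one character too soon, so the extracted text is cut off in the middle of the string literal.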

You can use Text::Balanced::extract_codeblock($code, '{}') instead of extract_bracketed($code, '{}') to handle this, but even that will still bomb on curlies in comments:

sub blah {
    # } is such a neat character
    return '}';
}

If you want to handle this case, but don't want the complexity of writing a real parser, then you can try this hack instead:

$code = blockize($code);

sub blockize {
    my ($code) = @_;

    # describe a block:
    use re 'eval';
    my $comment       = qr/ \# [^\n]* /x;
    # the quoted-string patterns need /x (they are written with spaces)
    # and must also accept empty strings such as ''
    my $single_quoted = qr/ ' [^\\']* (?: \\. [^\\']* )* ' /xs;
    my $double_quoted = qr/ " [^\\"]* (?: \\. [^\\"]* )* " /xs;

    my $block;
    $block = qr/
        \{
        (?:
            (?> [^#'"{}]+ )
          | $comment
          | $single_quoted
          | $double_quoted
          | (??{ $block })
        )*
        \}
    /x;

    # now that we have a block definition set up, it's just a simple substitution
    # (\s* rather than \s+ before the block, so "sub foo{ ... }" matches too)
    $code =~ s/sub \s+ (\w+) \s* ($block)/ format_code("$1", "$2") /gex;

    return $code;
}

sub format_code {
    my ($name, $block) = @_;
    $block =~ s/\n/\n    /g;    # indent the body inside the new wrapper
    return "{\n    sub $name $block\n}";
}

It will even format the added block nicely :) Hopefully your perl-like language doesn't have pod; that's a whole 'nother beast.
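As a quick sanity check of the approach, here is a stripped-down sketch of the same recursive block pattern, run against the two problem cases from earlier in this post. The helper name first_block is made up for the example, and the pattern assumes a Perl recent enough to handle lexicals inside (??{ ... }) cleanly (5.18 or later is a safe assumption):

```perl
use strict;
use warnings;
use re 'eval';    # kept to mirror the code above

my $comment       = qr/ \# [^\n]* /x;
my $single_quoted = qr/ ' [^\\']* (?: \\. [^\\']* )* ' /xs;
my $double_quoted = qr/ " [^\\"]* (?: \\. [^\\"]* )* " /xs;

# A block is a pair of curlies around any mix of plain text, comments,
# quoted strings, and nested blocks.
my $block;
$block = qr/
    \{
    (?:
        (?> [^#'"{}]+ )
      | $comment
      | $single_quoted
      | $double_quoted
      | (??{ $block })
    )*
    \}
/x;

# Hypothetical helper: return the first balanced block, or undef.
sub first_block {
    my ($code) = @_;
    return $code =~ /($block)/ ? $1 : undef;
}

my $with_comment = <<'END';
sub blah {
    # } is such a neat character
    return '}';
}
END

print first_block(q[sub blah { return '}'; }]), "\n";    # prints: { return '}'; }
print first_block($with_comment), "\n";                  # prints the whole four-line block
```

It is still a hack: anything that uses # or a quote character outside a comment or string (for instance $#array, or a regex literal containing a quote) will confuse it.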

We could even clean that up a little bit by using Regexp::Common:

sub blockize {
    my ($code) = @_;

    # describe a block:
    use re 'eval';
    use Regexp::Common qw(delimited comment);

    # compile the Regexp::Common patterns on their own first, so the /x
    # on the block pattern below cannot reinterpret anything inside them
    my ($comment, $single_quoted, $double_quoted) =
        map { qr/$_/ } $RE{comment}{Perl},
                       $RE{delimited}{-delim => "'"},
                       $RE{delimited}{-delim => '"'};

    my $block;
    $block = qr/
        \{
        (?:
            (?> [^#'"{}]+ )
          | $comment
          | $single_quoted
          | $double_quoted
          | (??{ $block })
        )*
        \}
    /x;

    # now that we have a block definition set up, it's just a simple substitution
    $code =~ s/sub \s+ (\w+) \s* ($block)/ format_code("$1", "$2") /gex;

    return $code;
}

sub format_code {
    my ($name, $block) = @_;
    $block =~ s/\n/\n    /g;    # indent the body inside the new wrapper
    return "{\n    sub $name $block\n}";
}

Re^2: parsing code, finding block boundaries
by revdiablo (Prior) on Aug 05, 2004 at 22:20 UTC

    I appreciate the response. I'll go over it in more detail later, but I should have mentioned earlier that comments and quotes are not a problem. They will be taken care of before this code runs.