knobbled has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to extract subroutines from .java files that always have the same name: public void readExternal. Obviously one can use this as the starting point of a 'range'; however, the only way to end the search is to find the final closing brace. Perhaps by incrementing and decrementing a counter on the opening and closing braces?

Replies are listed 'Best First'.
Re: Extract lines of .java
by IlyaM (Parson) on Jan 02, 2003 at 18:22 UTC
    The easiest solution is probably to use the Text::Balanced module. Untested code:

    use Text::Balanced qw(extract_codeblock);

    # get source code in this variable
    my $source = ....;

    # the regexp finds where the method is defined and sets "pos"
    # (see perldoc -f pos), then extract_codeblock extracts the
    # code block that follows; \s* allows "readExternal(" without a
    # space, and [^{]* skips anything (such as a throws clause)
    # between the parameter list and the opening brace
    if ($source =~ /public \s+ void \s+ readExternal \s* \( .*? \) [^{]* /gx) {
        my $sub_source = extract_codeblock($source);
        print $sub_source;
    }
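A runnable variant of the same idea, as a sketch only. The helper name extract_method and the sample Java text are made up for illustration. Note that extract_codeblock parses the block as if it were Perl code, so Java comments or unusual literals may still confuse it; the sample deliberately contains only a quoted brace and a nested block.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_codeblock);

# Hypothetical helper: returns the body block of readExternal, or undef.
sub extract_method {
    my ($source) = @_;

    # A scalar-context /g match leaves pos($source) just before the opening brace.
    return unless $source =~ /public \s+ void \s+ readExternal \s* \( [^)]* \) [^{]*/gx;

    # extract_codeblock starts at pos($source); '{' is its default delimiter.
    my ($block) = extract_codeblock($source);
    return $block;
}

# Made-up sample input, not taken from the original question.
my $java = <<'JAVA';
public class Point implements Externalizable {
    public void readExternal(ObjectInput in) throws IOException {
        label = "}";
        if (in.readBoolean()) { count = in.readInt(); }
    }
    public void writeExternal(ObjectOutput out) throws IOException {
    }
}
JAVA

my $block = extract_method($java);
print defined $block ? "$block\n" : "readExternal not found\n";
```

If everything works as assumed, the printed block should start at the brace after the throws clause and end at the method's own closing brace, leaving writeExternal out.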

    --
    Ilya Martynov, ilya@iponweb.net
    CTO IPonWEB (UK) Ltd
    Quality Perl Programming and Unix Support UK managed @ offshore prices - http://www.iponweb.net
    Personal website - http://martynov.org

Re: Extract lines of .java
by fruiture (Curate) on Jan 02, 2003 at 18:29 UTC

    Although counting curlies might succeed quite often, it will fail if somewhere you have a string constant or a comment containing an unbalanced curly.

    public void readExternal () { ... '}' /* this } is NOT the end as well */ }
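To make the failure concrete, here is a minimal sketch of a naive brace counter run on input shaped like the example above. The function name naive_block and the sample string are made up for illustration.

```perl
use strict;
use warnings;

# Naive brace counter: returns the text from the first '{' up to the brace
# that brings the depth back to zero. It knows nothing about strings or
# comments, which is exactly the weakness being demonstrated.
sub naive_block {
    my ($src) = @_;
    my $start = index($src, '{');
    return if $start < 0;

    my $depth = 0;
    for my $i ($start .. length($src) - 1) {
        my $ch = substr($src, $i, 1);
        $depth++ if $ch eq '{';
        $depth-- if $ch eq '}';
        return substr($src, $start, $i - $start + 1) if $depth == 0;
    }
    return;    # braces never balanced
}

# Made-up sample: one quoted curly and one curly inside a comment.
my $java  = q[public void readExternal () { x = '}'; /* this } too */ done(); }];
my $block = naive_block($java);
print "$block\n";
```

The counter stops at the curly inside the character literal, so the returned text is cut off long before the real end of the method.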

    So it seems you can't get around a basic tokenizer that can differentiate code curlies from string-constant curlies. At least Java syntax is not as hard to parse as Perl: the only places where a curly does not open or close a block are inside "" and '', after //, and between /* and */. There's no qq, q, qr, qw or s///, m//, tr/// ....

    A quick solution (demonstrating only basic functionality):

    use Regexp::Common;

    local $_ = join '', <>;
    my $code    = '';
    my $curlies = 1;    # one curly open

    while ( m# \G ( [^{}'"/]* ) #xg ) {
        $code .= $1;
        my $p = pos;

        # '' + ""
        if ( m# \G ( $RE{quoted} ) #xg ) {
            $code .= $1;
            next;
        }
        pos = $p;

        # /* */
        if ( m# \G ( $RE{balanced}{-begin=>'/*'}{-end=>'*/'} ) #xg ) {
            $code .= $1;
            next;
        }
        pos = $p;

        # //
        if ( m# \G ( // [^\n]* ) #xg ) {
            $code .= $1;
            next;
        }
        pos = $p;

        # {
        if ( m# \G \{ #xg ) {
            $code .= '{';
            ++$curlies;
            next;
        }
        pos = $p;

        # }
        if ( m# \G \} #xg ) {
            $code .= '}';
            --$curlies;
            last unless $curlies;
            next;
        }
        pos = $p;

        m# \G ( . ) #sxg or last;
        $code .= $1;
    }

    print "CODE\n$code\n";

    --
    http://fruiture.de
Re: Extract lines of .java
by Juerd (Abbot) on Jan 02, 2003 at 18:17 UTC

    the only way to end the search is to use the ultimate closing brace. Perhaps using increment and decrement on 1st open and last close braces?

    You could use a recursive regex. The easiest way to get one is Regexp::Common.

    use Regexp::Common;

    $_ = q{
        public void readExternal() {
            this { is } not { { real } java } code
            but an { { { example } } }
        }
    };

    my ($function) = /( public \s+ void \s+ readExternal [^{]+ $RE{balanced}{-parens=>'{}'} )/x;
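For comparison, a similar recursive pattern can be written by hand with the named-group recursion available in Perl 5.10 and later. This is a sketch of the general technique, not what Regexp::Common generates internally. The group name body and the sample text are made up, and like the pattern above it ignores curlies inside strings and comments.

```perl
use strict;
use warnings;

# Made-up sample, not real Java, in the spirit of the example above.
my $src = q[
    public void readExternal() {
        this { is } not { { real } java } code
    }
    public void other() { }
];

# (?<body> ...) matches one brace-balanced block; (?&body) recurses into it
# for every nested block. The outer group captures signature plus body.
my ($function) = $src =~ /
    ( public \s+ void \s+ readExternal [^{]+
      (?<body> \{ (?: [^{}]++ | (?&body) )* \} )
    )
/x;

print defined $function ? "$function\n" : "no match\n";
```

The match should cover the readExternal signature and its whole balanced body, stopping before the following method.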

    - Yes, I reinvent wheels.
    - Spam: Visit eurotraQ.