in reply to Regex critique
Depending on how much you can constrain your input, you could look at the regexes I use in Filter::signatures to recognize subroutine signatures. They cheat somewhat, since they leave the parsing of quotes and quote-like operators to Filter::Simple / Text::Balanced, so you either need to use those modules too or restrict your code to sane, simple string constructs.
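To illustrate what "leaving quotes to Text::Balanced" buys you, here is a minimal sketch. It is not how Filter::signatures calls the module internally; the input string and variable names are made up for the example. It uses `extract_bracketed` with `"` included in the delimiter specification, so a closing parenthesis inside a double-quoted string does not end the signature early.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

# Hypothetical input: everything after the sub name.
# Single quotes, so $sep is not interpolated here.
my $after_name = '( $sep = ")" ) { return $sep }';

# '(")' means: treat () as brackets and "..." as quoted text
# in which brackets are ignored.
my ($signature, $remainder) = extract_bracketed( $after_name, '(")' );

print "signature: $signature\n";
print "remainder: $remainder\n";
```

In list context the input string is left unmodified, and the first two return values are the extracted bracketed text and whatever follows it.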
The simplest version is this. It is compatible with 5.8, but doesn't handle parentheses in default assignments.
    # This is the version that is most downwards compatible but doesn't handle
    # parentheses in default assignments
    sub transform_arguments {
        # This should also support
        # sub foo($x,$y,@) { ... }, throwing away additional arguments
        # Named or anonymous subs
        no warnings 'uninitialized';
        s{\bsub(\s*)(\w*)(\s*)\((\s*)((?:[^)]*?\@?))(\s*)\)(\s*)\{}{
            parse_argument_list("$2","$5","$1$3$4$6$7")
        }mge;
        $_
    }
This fails, for example, for:
sub fail58( $time = localtime() ) {
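The reason is that `[^)]*?` can never cross a closing parenthesis, so the match stops at the `)` of `localtime()` and then cannot find the opening `{`. A small sketch that runs the same pattern (copied into a `qr` object named `$simple` for the demonstration, without the substitution part) against a working and a failing declaration:

```perl
use strict;
use warnings;

# Same pattern as in the 5.8-compatible version, match only.
my $simple = qr{\bsub(\s*)(\w*)(\s*)\((\s*)((?:[^)]*?\@?))(\s*)\)(\s*)\{};

for my $src (
    'sub ok58( $x, $y ) {',
    'sub fail58( $time = localtime() ) {',
) {
    if ( $src =~ $simple ) {
        # $5 holds the raw argument list
        print "matched:  $src  (arguments: '$5')\n";
    }
    else {
        print "no match: $src\n";
    }
}
```

The first declaration matches with `$x, $y` as its argument list; the second is reported as not matching.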
The recursive regex used from Perl 5.10 onwards is more complex because it handles balanced parentheses and curly braces.
    sub transform_arguments {
        # We also want to handle arbitrarily deeply nested balanced parentheses here
        no warnings 'uninitialized';
        s{\bsub(\s*) #1
            (\w*)    #2
            (\s*)    #3
            \(
            (\s*)    #4
            (        #5
                (    #6
                    (?:
                        \\.                 # regex escapes and references
                      | \( (?6)?            # recurse for parentheses
                        \)
                      | \{ (?6)?            # recurse for curly brackets
                        \}
                      | (?>[^\\\(\)\{\}]+)  # other stuff
                    )+
                )*
                \@?                         # optional slurpy discard argument at the end
            )
            (\s*)\)  #7
            (\s*)\{  #8
        }{
            parse_argument_list("$2","$5","$1$3$4$7$8")
        }mgex;
        $_
    }
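The part doing the work is `(?6)`, which re-enters capture group 6 from inside itself, so each opening bracket has to be paired with its own closing bracket at any depth. A stripped-down sketch of the same idea, reduced to parentheses only and with the recursive group as group 1; `is_balanced` is a hypothetical helper for illustration and not part of Filter::signatures:

```perl
use strict;
use warnings;
use 5.010;    # recursive patterns need Perl 5.10 or later

# Returns 1 if every parenthesis in $text is properly paired, 0 otherwise.
sub is_balanced {
    my ($text) = @_;
    return $text =~ m{
        \A
        (                      # group 1: a run of balanced text
            (?:
                \( (?1)? \)    # a parenthesised chunk, recursing into group 1
              | (?>[^()]+)     # anything that is not a parenthesis
            )+
        )?
        \z
    }x ? 1 : 0;
}

for my $text ( 'localtime()', 'f(g(1), h())', 'f(g(1)', ')(' ) {
    printf "%-14s %s\n", $text, is_balanced($text) ? 'balanced' : 'unbalanced';
}
```

The first two strings are reported as balanced, the last two as unbalanced. The atomic group `(?>...)` is kept from the original pattern; it stops the engine from retrying every possible split of the plain text when a match fails.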
If this is for parsing your own code, you usually can rewrite/restrict your code to a limited subset of what Perl allows. If you want a generic "subroutine declaration finder", you have a hard task in front of you.