http://qs1969.pair.com?node_id=393419

muba has asked for the wisdom of the Perl Monks concerning the following question:

We all know (as I hope) one of the Perl slogans. The most famous, of course, is the Timtoady one. Next most famous is, I think, "Perl makes Easy Things Easy and Hard Things Possible". Well, with thise two slogans given, I'd like to add another one: "Only perl can parse Perl".

Great! So now I want to write a Perl script (interpreted by perl so hopefully able to parse Perl, because hard things should be possible, in more than one way) to extract subroutines from another Perl script.

Globally, this would be:
  • search for sub NAME [(PROTOTYPE)] [: ATTRIBUTES]
  • search for the opening curly, until the matching closing curly.

    Well, this shouldn't be too hard. But mind you! What if there are closing curlies within strings? Of course it is not too hard just to ignore everything between quotes. But what if something like qq() or qw() is used? What if the fancy => operator is used? What if here docs are used? And so on...

    So... the main question is: how can I extract a subroutine from a Perl file, beginning with the sub keyword, then the name, prototype and attribute specifications, then the opening curly and from there, everything until the closing curly?
    I would be glad if this can be done using regexes, but I don't think they're up to the job (unless they become really, really complex). Another possibility is just to scan byte-by-byte, keeping track of opened and closed curly brackets and opened and closed string (this isn't easy, for there are many types of strings, as mentioned above).

    To make a long story short, is there an easy way (module or whatever) to easily extract subroutines from a Perl script using a Perl script?




    "2b"||!"2b";$$_="the question"
    Besides that, my code is untested unless stated otherwise.
    One more: please review the article about regular expressions (do's and don'ts) I'm working on.