comment on

We all know (as I hope) one of the Perl slogans. The most famous, of course, is the Timtoady one. Next most famous is, I think, "Perl makes Easy Things Easy and Hard Things Possible". Well, with thise two slogans given, I'd like to add another one: "Only perl can parse Perl".

Great! So now I want to write a Perl script (interpreted by perl so hopefully able to parse Perl, because hard things should be possible, in more than one way) to extract subroutines from another Perl script.

Globally, this would be:

search for sub NAME [(PROTOTYPE)] [: ATTRIBUTES]

search for the opening curly, until the matching closing curly.

Well, this shouldn't be too hard. But mind you! What if there are closing curlies within strings? Of course it is not too hard just to ignore everything between quotes. But what if something like qq() or qw() is used? What if the fancy => operator is used? What if here docs are used? And so on...

So... the main question is: how can I extract a subroutine from a Perl file, beginning with the sub keyword, then the name, prototype and attribute specifications, then the opening curly and from there, everything until the closing curly?
I would be glad if this can be done using regexes, but I don't think they're up to the job (unless they become really, really complex). Another possibility is just to scan byte-by-byte, keeping track of opened and closed curly brackets and opened and closed string (this isn't easy, for there are many types of strings, as mentioned above).

To make a long story short, is there an easy way (module or whatever) to easily extract subroutines from a Perl script using a Perl script?

"2b"||!"2b";$$_="the question"
Besides that, my code is untested unless stated otherwise.
One more: please review the article about regular expressions (do's and don'ts) I'm working on.

In reply to Extract subroutines from a Perl script. OR: "Only perl can parse Perl." But can it help me to do so? by muba

Are you posting in the right place? Check out Where do I post X? to know for sure.
Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
<code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
Want more info? How to link or How to display code and escape characters are good places to start.


P is for Practical
	PerlMonks