
Regular expressions are not parsers.

A regular expression can refer to itself, but that is a kludgy hack. To do that, use the (??{ }) construct inside a regex.
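For the record, here is a minimal sketch of what the self-referencing form looks like. The variable name $balanced and the helper first_balanced_group are made up for illustration, and it only handles plain parentheses (no quoting, no comments), so treat it as a demonstration of the construct rather than something to parse Perl with.

```perl
use strict;
use warnings;

# $balanced refers to itself through the postponed (??{ }) construct.
# The code block is only evaluated when the match reaches that point,
# by which time $balanced already holds the compiled pattern.
my $balanced;
$balanced = qr/
    \(                        # an opening paren
    (?:
        (?> [^()]+ )          # a run of non-paren characters, no backtracking
      |
        (??{ $balanced })     # or a nested, balanced group
    )*
    \)                        # the matching closing paren
/x;

# Hypothetical helper: return the first balanced group in $text, or undef.
sub first_balanced_group {
    my ($text) = @_;
    return $text =~ /($balanced)/ ? $1 : undef;
}

print first_balanced_group('foo(a, (b, c), d) bar'), "\n";   # (a, (b, c), d)
```

It finds balanced text, but it gives you no structure to work with afterwards, which is why I call it a hack.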

There are better solutions though.

If you want to clean up your Perl code, perltidy helps. If you want to do custom transformations, PPI is well suited for the job.

Larry Wall is working on the Perl 5 to Perl 6 converter, and he's using the parsing code from perl 5 itself to emit a canonical format. When he makes a release, that might be useful.

If you want to parse it on your own, try looking at Parse::RecDescent.

Back to your problem: since Perl's grammar is recursive, (... ( sub expression ) ...), you have to keep track of parenthesis balancing. You need some kind of stack structure (an explicit one or the call stack) to nibble paren tokens and construct nested subexpressions. Once you can build that much, you need to serialize your structures back into text. Since your code only goes two levels deep, it will break as soon as the input nests any deeper.
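The stack approach can be sketched roughly like this. The sub names (tokenize, parse_nested, serialize) are my own, and the tokenizer assumes that parens are the only thing that matters, so it ignores strings, comments and every other bit of real Perl syntax. It is meant to show the shape of the solution, not to replace PPI.

```perl
use strict;
use warnings;

# Split the input into paren tokens and runs of everything else.
sub tokenize {
    my ($text) = @_;
    return $text =~ /( [()] | [^()]+ )/gx;
}

# Build a nested structure: each parenthesised group becomes an array ref.
sub parse_nested {
    my ($text) = @_;
    my $top   = [];
    my @stack = ($top);               # the innermost open group is $stack[-1]
    for my $tok ( tokenize($text) ) {
        if ( $tok eq '(' ) {
            my $group = [];
            push @{ $stack[-1] }, $group;   # attach the new group to its parent
            push @stack, $group;            # and descend into it
        }
        elsif ( $tok eq ')' ) {
            die "unbalanced ')'\n" if @stack == 1;
            pop @stack;                     # climb back out
        }
        else {
            push @{ $stack[-1] }, $tok;     # plain text stays where it is
        }
    }
    die "unclosed '('\n" if @stack > 1;
    return $top;
}

# Turn the structure back into text, recursing into nested groups.
sub serialize {
    my ($node) = @_;
    my $out = '';
    for my $item (@$node) {
        $out .= ref($item) ? '(' . serialize($item) . ')' : $item;
    }
    return $out;
}

my $tree = parse_nested('f(a, g(b, (c)), d)');
print serialize($tree), "\n";   # f(a, g(b, (c)), d)
```

Any transformation you want would go between parse_nested and serialize, working on the array refs. The depth is limited only by the input, not by how many levels you remembered to write regexes for.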

You can use regexes to find tokens, but you have to maintain state between them, and your code does not do that.

Your subroutine regex is the problem there, btw. It doesn't match a closing paren, so the collected string ends in "WebGUIProfile" and stops there. A hard-coded paren for the method call is printed, and that's where it ends.

-nuffin
zz zZ Z Z #!perl

In reply to Re: Nesting regexen by nothingmuch
in thread Nesting regexen by colink


