I needed to reformat a bunch of Perl code, so I began writing this as an exercise to try and learn regular expressions better, but I've gotten stuck and I'm not sure why.

It's a very lightweight Perl parser that handles just a subset of Perl.

Here's the code:

#!/usr/bin/perl -w
use strict;

my $bareword   = qr/(\w+)/;
my $quotelike  = qr/((['"]).+?\2)/;
my $subscript  = qr/([\[{]\w+[\]\}])/;
my $variable   = qr/(\$\w+($subscript)*)/;
my $sub_arg    = qr/($quotelike|$variable|$bareword)/;
my $sub_args   = qr/($sub_arg,)*($sub_arg)/;
my $subroutine = qr/((\w+::)*\w+\($sub_args)/;

while (<>) {
    my @args = ();
    my @labels = ();
    my ($spacer, $obj, $method) = m/^(\s+)(\$\w+->)(\w+)\(/gc;
    LOOP: {
        push(@args, $1), redo LOOP if m/\G$quotelike,?\s*/gc;
        push(@args, $1), redo LOOP if m/\G$variable,?\s*/gc;
        push(@args, $1), redo LOOP if m/\G$subroutine,?\s*/gc;
        push(@args, $1), redo LOOP if m/\G$bareword,?\s*/gc;
    }
    @labels = ($method eq "hidden") ? qw(name value) : ##integer, text
        qw(name label value maxlength extras subtext size uiLevel defaultValue hoverHelp)
        ;
    print join '', $spacer, $obj, $method, "(\n";
    for(my $index=0; $index < @args; ++$index) {
        print join '', "\t\t-", $labels[$index], ' => ', $args[$index], ",\n";
    }
    print join '', $spacer, ");\n";
}

This is the line I'm trying to parse:

$f->readOnly($session{form}{cid},WebGUI::International::get(469,"WebGUIProfile"));

And this is the formatted output, which is incorrect:

$f->readOnly(
        -name => $session{form}{cid},
        -label => WebGUI::International::get(469,
        -value => "WebGUIProfile",
);

The $sub_args regex is only matching the second set of parentheses, and I have no idea why. Can anyone clue me in?
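One possible cause, offered as a sketch rather than a confirmed diagnosis: when a qr// object is interpolated into a larger pattern, its capture groups are renumbered as part of the whole pattern, so an absolute backreference such as \2 in $quotelike may end up pointing at a different group. The example below reproduces this in isolation. The names $quote_abs, $quote_rel and $text are made up for the illustration, and the relative backreference \g{-1} assumes Perl 5.10 or later.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same quote pattern as in the post: \2 is an absolute backreference.
my $quote_abs = qr/((['"]).+?\2)/;

# Hypothetical variant: \g{-1} is relative, so it keeps pointing at the
# (['"]) group no matter how many groups come before it.
my $quote_rel = qr/((['"]).+?\g{-1})/;

my $text = 'Foo::get("bar")';

# Used alone, both patterns capture the quoted string in $1.
my ($alone_abs) = $text =~ /$quote_abs/;
my ($alone_rel) = $text =~ /$quote_rel/;

# Nested the way $subroutine nests $sub_args: group 1 is the outer group and
# group 2 is (\w+::), so \2 now means "Foo::" instead of the opening quote.
my ($nested_abs) = $text =~ /((\w+::)*\w+\($quote_abs)/;
my ($nested_rel) = $text =~ /((\w+::)*\w+\($quote_rel)/;

print "alone, absolute:  ", $alone_abs  // 'no match', "\n";  # "bar"
print "alone, relative:  ", $alone_rel  // 'no match', "\n";  # "bar"
print "nested, absolute: ", $nested_abs // 'no match', "\n";  # no match
print "nested, relative: ", $nested_rel // 'no match', "\n";  # Foo::get("bar"
```

If this is what is happening in the script above, the quoted argument cannot match inside $subroutine, so the match backtracks to stop after 469 and the top-level loop then picks up "WebGUIProfile" as a separate argument. Named captures with \k<name> would be another way to keep the backreference stable.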


In reply to Nesting regexen by colink
