in reply to Perl regex limitations

If I were trying to do this, I'd start by matching constructs that could contain braces that are not significant (quotes, comments). And I'd match piecemeal rather than trying to build one big, complex regex.

# load source code into $_
# fail() is assumed to be defined elsewhere; it should report the error and die
my $nest = 0;
while( m{['"{}]|//|/\*|\bclass\s+}g ) {
    my $c = substr( $_, pos($_)-1, 1 );    # last character of the match
    if( "'" eq $c ) {
        m{\G(?:[^\\']+|\\.)*'}gc
            or fail( \$_, "Unclosed char" );
    } elsif( '"' eq $c ) {
        m{\G(?:[^\\"]+|\\.)*"}gc
            or fail( \$_, "Unclosed string" );
    } elsif( '/' eq $c ) {                 # '//' comment: skip to end of line
        m{\G[^\n]*}gc;
    } elsif( '*' eq $c ) {                 # '/*' comment: skip past '*/'
        m{\*/}gc
            or fail( \$_, "Unclosed comment" );
    } elsif( '{' eq $c ) {
        $nest++;
    } elsif( '}' eq $c ) {
        $nest--;
    } else {                               # matched 'class' plus whitespace
        my $p = pos();
        m{\G\w+}gc
            or fail( \$_, "'class' not followed by ID" );
        my $name = substr( $_, $p, pos()-$p );
        if( m[\G\s*:\s*(?:\w+\s+)?(\w+)\s*\{]gc ) {
            my $ancestor = $1;
            $nest++;
        } elsif( m[\G\s*\{]gc ) {
            $nest++;
        } else {
            fail( \$_, "No { after 'class $name'" );
        }
    }
}
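
The piecemeal approach relies on \G and the /gc modifiers working together. As a minimal, self-contained sketch (the sample string 'abc123' and the variable names are made up for illustration), this shows that a failed \G match with /c leaves pos() where it was, so the next attempt can resume from the same spot:

```perl
use strict;
use warnings;

my $text = 'abc123';

# /g in scalar context remembers where the match ended.
$text =~ m/[a-z]+/g;
print pos($text), "\n";                 # position just after "abc"

# \G anchors at pos(); /c keeps pos() when the match fails.
my $ok = $text =~ m/\G[a-z]+/gc;        # fails: digits follow
print defined pos($text) ? "pos kept\n" : "pos reset\n";

# The next anchored attempt resumes from the same spot.
if( $text =~ m/\G(\d+)/gc ) {
    print "digits: $1\n";
}
```

Without /c, the failed match would reset pos() and the scanner would start over from the beginning of the string.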

- tye        

Re^2: Perl regex limitations (m/\G.../gc)
by AppleFritter (Vicar) on Aug 06, 2014 at 17:03 UTC

    It may also be worth abandoning pure regular expressions in favor of a grammar-based parser. Parse::RecDescent comes to mind immediately, but there's also Regexp::Grammars if you'd like to stick to (spiced-up) regular expressions after all.