Speeding up RecDesent parser for Perl Code

Outaspace has asked for the wisdom of the Perl Monks concerning the following question:

Hello fellow Monks of the Perl wisdom,

I have written a grammar for parsing a Perl source file for packages, subs and POD sections, but it is still slow when I try to parse large (>3000 lines) files. Does anyone have an idea how I can speed it up a little?


# Tokens
word    : /[a-zA-Z_:][a-zA-Z0-9_:]*/

quotelike    : <perl_quotelike> | ( 'qw' '(' word(s?) ')' )

# Start of Perl-File
startrule :    { $thisparser->StartParsing(); }
        block_construct(s)
        { $thisparser->EndParsing($text); }

block_construct :    sub_definition |
        pod |
        package_declaration |
        end |
        /[^;\n]*[;\n]/

end : /(__END__)|(__DATA__)/ { $thisparser->EndReached(); }

pod : /=((head[1-9])|(over)|(item[1-9])|(back)|(pod))(.|\s)*?(=cut|$)/ { $thisparser->AddPOD($thisline, @item); }

package_declaration : 'package' /[a-zA-Z_:][a-zA-Z0-9_:]*/ ';' { $thisparser->StartNewPackage($thisline, @item); }

sub_definition :    'sub' /[a-zA-Z_:][a-zA-Z0-9_:]*(\s*\([$@%\\*]*\))?/ <perl_codeblock> { $thisparser->AddSub($thisline, @item); }

I would also appreciate any comments on the grammar, especially if I have missed any Perl special cases that could lead to a wrong parsing result.

Thanks in advance,

Andre


Re: Speeding up RecDesent parser for Perl Code by merlyn (Sage) on Sep 04, 2006 at 16:57 UTC
PRD works by "nibbling", and this is OK for little strings, but horrible for big ones. There's no fix. Damian has had an eternal "todo" to create Parse::FastDescent, but it is unlikely that he will ever get enough tuits for that. So you must either figure out how to parse less at a time, or switch to something other than PRD.
-- Randal L. Schwartz, Perl hacker. Be sure to read my standard disclaimer if this is a reply.
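As an illustration of the "switch to something other than PRD" option, here is a minimal sketch of a line-by-line scanner that uses only plain regexes. It is not code from the thread: the function name `scan_perl_source` and the layout of the returned hash are made up for this example. It records only where packages, named subs and POD blocks start. It does not find the end of a sub body, and here-docs or multi-line strings that happen to contain lines starting with `sub` or `package` would fool it.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): scan Perl source line by line
# and record packages, named subs and POD blocks with their line numbers.
sub scan_perl_source {
    my ($source) = @_;
    my (@packages, @subs, @pod);
    my $pod_start;        # defined while we are inside a POD block
    my $in_code = 1;      # becomes false after __END__ or __DATA__
    my $line_no = 0;

    for my $line (split /\n/, $source) {
        $line_no++;

        if (defined $pod_start) {
            # Inside POD: only a =cut line ends the block.
            if ($line =~ /^=cut\b/) {
                push @pod, { from => $pod_start, to => $line_no };
                undef $pod_start;
            }
            next;
        }

        if ($line =~ /^__(?:END|DATA)__\s*$/) {
            $in_code = 0;     # keep scanning, but only for POD
            next;
        }

        if ($line =~ /^=[a-zA-Z]\w*/) {
            $pod_start = $line_no;            # a POD command opens a block
        }
        elsif ($in_code && $line =~ /^\s*package\s+([A-Za-z_][\w:]*)\s*;/) {
            push @packages, { name => $1, line => $line_no };
        }
        elsif ($in_code && $line =~ /^\s*sub\s+([A-Za-z_][\w:]*)/) {
            push @subs, { name => $1, line => $line_no };
        }
    }

    # POD that runs to the end of the file without a closing =cut
    push @pod, { from => $pod_start, to => $line_no } if defined $pod_start;

    return { packages => \@packages, subs => \@subs, pod => \@pod };
}

# Small made-up input to exercise the scanner.
my $sample = join "\n",
    'package Foo::Bar;',
    '',
    '=head1 NAME',
    '',
    'Foo::Bar - demo',
    '',
    '=cut',
    '',
    'sub new { return bless {}, shift }',
    'sub _private($$) { 1 }',
    '__END__',
    'sub ignored { }',
    '';

my $found = scan_perl_source($sample);
printf "package %s at line %d\n", $_->{name}, $_->{line} for @{ $found->{packages} };
printf "sub %s at line %d\n",     $_->{name}, $_->{line} for @{ $found->{subs} };
printf "POD from line %d to %d\n", $_->{from}, $_->{to}  for @{ $found->{pod} };
```

Because each line is matched once against a few anchored patterns, the work grows roughly linearly with file size. The price is accuracy, which is the trade-off raised elsewhere in this thread.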
Re^2: Speeding up RecDesent parser for Perl Code by adamk (Chaplain) on Sep 04, 2006 at 22:01 UTC
I believe Parse::FastDescent has been renamed to Perl 6 Rules :)
Re: Speeding up RecDesent parser for Perl Code by adamk (Chaplain) on Sep 04, 2006 at 22:00 UTC
You know what you are trying to do has been done already, right? Normally I'd be suggesting PPI here. Except that PPI isn't that fast either, so I'm unsure whether it would be a significant advantage in that respect (if speed is of primary importance). HOWEVER, using PPI would certainly be more accurate.
Re^2: Speeding up RecDesent parser for Perl Code by Outaspace (Scribe) on Sep 05, 2006 at 10:57 UTC
I have looked at the PPI module already, but I think I would have to parse the result of PPI to get the things that I want. So I would need double parsing, and it would be slower than one RD run. Also, PPI seems a bit too accurate for me. Andre
Re^3: Speeding up RecDesent parser for Perl Code by adamk (Chaplain) on Sep 05, 2006 at 11:06 UTC
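PPI spits out an object tree, so it's not a case of parsing a second time; rather, you query the tree (note that find returns an array reference, or a false value when nothing matches): `my @packages = @{ PPI::Document->new('filename.pm')->find('PPI::Statement::Package') || [] };`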
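To make the "query the tree" idea concrete for all three things the original question asks about (packages, subs and POD), here is a small sketch. It assumes PPI is installed from CPAN. The sample source is made up, and no claim is made here about how fast this is on a 7000-line file.

```perl
use strict;
use warnings;
use PPI;

# Made-up sample source, built with join so the POD stays inside a string.
my $source = join "\n",
    'package Foo::Bar;',
    '',
    '=head1 NAME',
    '',
    'Foo::Bar - demo',
    '',
    '=cut',
    '',
    'sub new { return bless {}, shift }',
    '',
    'sub _private { 1 }',
    '';

# PPI::Document->new accepts a file name or a reference to a source string.
my $doc = PPI::Document->new(\$source)
    or die "PPI could not parse the source";

# find() returns an array reference, or a false value when nothing matches,
# hence the "|| []" guard before dereferencing.
my @packages = @{ $doc->find('PPI::Statement::Package') || [] };
my @subs     = @{ $doc->find('PPI::Statement::Sub')     || [] };
my @pod      = @{ $doc->find('PPI::Token::Pod')         || [] };

printf "package %s (line %d)\n", $_->namespace, $_->line_number for @packages;
printf "sub %s (line %d)\n",     $_->name,      $_->line_number for @subs;
printf "POD block (line %d)\n",  $_->line_number                for @pod;
```

Each element is already an object with accessors such as `namespace`, `name` and `line_number`, so no second parsing pass over PPI's output is needed.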
Re: Speeding up RecDesent parser for Perl Code by aufflick (Deacon) on Sep 07, 2006 at 01:35 UTC
3000 lines? That's a long Perl file!
Re^2: Speeding up RecDesent parser for Perl Code by Outaspace (Scribe) on Sep 07, 2006 at 15:31 UTC
My main module has 7555 lines (with comment blocks above every sub, but they have to be parsed too). It just keeps growing with every new feature.
Re^3: Speeding up RecDesent parser for Perl Code by merlyn (Sage) on Sep 07, 2006 at 16:08 UTC
Perhaps you need to refactor that a bit. Ouch.
Re^4: Speeding up RecDesent parser for Perl Code by Outaspace (Scribe) on Sep 07, 2006 at 16:55 UTC
Re^5: Speeding up RecDesent parser for Perl Code by merlyn (Sage) on Sep 07, 2006 at 21:15 UTC
Re^5: Speeding up RecDesent parser for Perl Code by aufflick (Deacon) on Sep 11, 2006 at 01:41 UTC