Outaspace has asked for the wisdom of the Perl Monks concerning the following question:

Hello fellow Monks of the Perl wisdom,

I have written a grammar for parsing a Perl code file for packages, subs and POD sections. But it is still slow when I try to parse large (>3000 lines) files. Does anyone have an idea how I can speed it up a little?

    # Tokens
    word      : /[a-zA-Z_:][a-zA-Z0-9_:]*/
    quotelike : <perl_quotelike> | ( 'qw' '(' word(s?) ')' )

    # Start of Perl-File
    startrule : { $thisparser->StartParsing(); }
                block_construct(s)
                { $thisparser->EndParsing($text); }

    block_construct : sub_definition | pod | package_declaration | end | /[^;\n]*[;\n]/

    end : /(__END__)|(__DATA__)/
          { $thisparser->EndReached(); }

    pod : /=((head[1-9])|(over)|(item[1-9])|(back)|(pod))(.|\s)*?(=cut|$)/
          { $thisparser->AddPOD($thisline, @item); }

    package_declaration : 'package' /[a-zA-Z_:][a-zA-Z0-9_:]*/ ';'
          { $thisparser->StartNewPackage($thisline, @item); }

    sub_definition : 'sub' /[a-zA-Z_:][a-zA-Z0-9_:]*(\s*\([$@%\\*]*\))?/ <perl_codeblock>
          { $thisparser->AddSub($thisline, @item); }

I would also appreciate any comments on the grammar, in case I have missed some Perl special cases that may lead to a wrong parsing result.

Thanks in advance,

Andre

Replies are listed 'Best First'.
Re: Speeding up RecDesent parser for Perl Code
by merlyn (Sage) on Sep 04, 2006 at 16:57 UTC
    PRD works by "nibbling", and this is OK for little strings, but horrible for big ones. There's no fix. Damian has had an eternal "todo" to create Parse::FastDescent, but it is unlikely that he will ever get enough tuits for that.

    So you must either figure out how to parse less at a time, or switch to something other than PRD.

    -- Randal L. Schwartz, Perl hacker
    Be sure to read my standard disclaimer if this is a reply.
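One way to read "switch to something other than PRD" is a plain line-by-line scan with regexes, which avoids the nibbling cost entirely. The following is only a minimal sketch under stated assumptions, not the original poster's parser: the helper name `scan_perl_source` and the returned hash layout are hypothetical, and the scan is deliberately naive (it can be fooled by heredocs or multi-line strings that contain `sub`, `package` or POD-like lines, and it does not try to find where a sub body ends).

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): scan Perl source line by line
# and record packages, named subs and POD blocks with their starting lines.
sub scan_perl_source {
    my ($source) = @_;
    my (@packages, @subs, @pods);
    my $pod;            # POD block currently being collected, if any
    my $line_no = 0;

    for my $line (split /^/m, $source) {
        $line_no++;

        if ($pod) {
            # Inside POD: every line belongs to the block until =cut.
            $pod->{text} .= $line;
            if ($line =~ /^=cut\b/) {
                push @pods, $pod;
                undef $pod;
            }
            next;
        }

        # Stop at the end-of-code markers, like the grammar's "end" rule.
        last if $line =~ /^__(?:END|DATA)__\s*$/;

        if ($line =~ /^=[a-zA-Z]/) {
            # A POD command at the start of a line opens a POD block.
            $pod = { line => $line_no, text => $line };
        }
        elsif ($line =~ /^\s*package\s+([A-Za-z_][\w:]*)\s*;/) {
            push @packages, { line => $line_no, name => $1 };
        }
        elsif ($line =~ /^\s*sub\s+([A-Za-z_][\w:]*)/) {
            push @subs, { line => $line_no, name => $1 };
        }
    }
    push @pods, $pod if $pod;    # POD that runs to end of input without =cut

    return { packages => \@packages, subs => \@subs, pods => \@pods };
}

# Usage on a small in-memory sample.
my $sample = <<'END_SAMPLE';
package My::Module;

=head1 NAME

My::Module - demo

=cut

sub new { return bless {}, shift }

sub run ($) {
    my ($self) = @_;
}
END_SAMPLE

my $found = scan_perl_source($sample);
printf "package %s at line %d\n", $_->{name}, $_->{line} for @{ $found->{packages} };
printf "sub %s at line %d\n",     $_->{name}, $_->{line} for @{ $found->{subs} };
printf "POD block at line %d\n",  $_->{line}             for @{ $found->{pods} };
```

The trade-off is accuracy for speed: each line is matched against a handful of anchored regexes once, so the cost grows linearly with file size, but anything that needs real knowledge of Perl syntax is out of reach for this approach.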

      I believe Parse::FastDescent has been renamed to Perl 6 Rules :)
Re: Speeding up RecDesent parser for Perl Code
by adamk (Chaplain) on Sep 04, 2006 at 22:00 UTC
    You know what you are trying to do has been done already, right?

    Normally I'd be suggesting PPI here.

    Except that PPI isn't that fast either, so I'm unsure if it would be a significant advantage in that respect (if speed is of primary importance).

    HOWEVER, using PPI would certainly be more accurate.
      I have looked at the PPI module already, but I think I would have to parse the result of PPI to get the things that I want. So I would need double parsing, and it would be slower than one RD run. Also, PPI seems a bit too accurate for me.

      Andre
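        PPI spits out an object tree, so it's not a case of parsing a second time; rather, you query the tree:
        # find() returns an array reference (or a false value if nothing matches)
        my $packages = PPI::Document->new('filename.pm')->find('PPI::Statement::Package');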
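As a slightly fuller sketch of the same idea, the tree can be queried for all three things the original question asks about: packages, subs and POD. This assumes PPI is installed from CPAN; the helper name `outline` and the shape of its return value are hypothetical and only for illustration.

```perl
use strict;
use warnings;
use PPI;

# Hypothetical helper: collect package names, named subs and POD blocks.
# Accepts a file name or a reference to a string of Perl source.
sub outline {
    my ($source) = @_;
    my $doc = PPI::Document->new($source)
        or die "PPI could not parse the input\n";

    # find() returns an array reference, or a false value when nothing
    # matches, hence the "|| []" fallbacks.
    my $packages = $doc->find('PPI::Statement::Package') || [];
    my $subs     = $doc->find('PPI::Statement::Sub')     || [];
    my $pods     = $doc->find('PPI::Token::Pod')         || [];

    return {
        packages => [ map { $_->namespace } @$packages ],
        subs     => [ map { $_->name }      @$subs ],
        pods     => [ map { $_->content }   @$pods ],
    };
}

# Usage on an in-memory string; pass a file name instead to read a file.
my $code = "package Foo::Bar;\n\n=head1 NAME\n\nFoo\n\n=cut\n\nsub hello { 1 }\n";
my $info = outline(\$code);
print "packages: @{ $info->{packages} }\n";
print "subs:     @{ $info->{subs} }\n";
print "POD blocks: ", scalar @{ $info->{pods} }, "\n";
```

The document is parsed once and each `find` call only walks the resulting tree, so there is no second parsing pass over the source text, although, as noted earlier in the thread, PPI itself is not especially fast.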
Re: Speeding up RecDesent parser for Perl Code
by aufflick (Deacon) on Sep 07, 2006 at 01:35 UTC
    3000 lines? That's a long Perl file!
      My main module has 7555 lines (with comment blocks above every sub, but they have to be parsed too). It just keeps growing with every new feature.