in reply to Re^3: Pulling data out of { }
in thread Pulling data out of { }

The sub is only eval'ed once per new.

I bet it (->parse) doesn't run slowly. The time is probably spent parsing the grammar, compiling it into Perl code, and compiling the Perl code (->new). Parse::RecDescent shouldn't be used directly in production code. Modules created by Parse::RecDescent's Precompile method should be used instead.

There is no benefit to using require here. I don't like to use use at spots other than at the top of the package, and I wanted to group the testing code together.

Re^5: Pulling data out of { }
by BrowserUk (Patriarch) on Jan 17, 2006 at 05:47 UTC
    The time is probably spent parsing the grammar,

    Hmm. Seems not. I just put separate timings around the ->new and the ->parse and the former took just 1 second.
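
    For anyone wanting to repeat the split timing, here is a minimal sketch of wrapping each phase separately. The helper name timed and the two stand-in subs are hypothetical; with Parse::RecDescent the subs would wrap the ->new and ->parse calls instead, as noted in the comments.

```perl
use strict;
use warnings;
use Time::HiRes ();

# Run a code ref and return (elapsed wall-clock seconds, its result).
sub timed {
    my ($code) = @_;
    my $start  = Time::HiRes::time();
    my $result = $code->();
    return (Time::HiRes::time() - $start, $result);
}

# Stand-ins for the two phases (assumptions, not the thread's code).
# With Parse::RecDescent these would be something like
#   sub { Parse::RecDescent->new($grammar) }   and
#   sub { $parser->parse($text) }
my ($t_new,   $parser) = timed(sub { return { built => 1 } });
my ($t_parse, $data)   = timed(sub { return [ map { $_ * 2 } 1 .. 1000 ] });

printf "new:   %.3fs\nparse: %.3fs\n", $t_new, $t_parse;
```

    Timing the phases separately is what distinguishes "slow to build the parser" from "slow to run it".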

    The other day I wanted to compare the time taken and resilience of our solutions. It doesn't matter a jot for the OP's application as there are a maximum of 26 drives, but I was interested in the comparative performance of my solution.

    Rather than doing a full benchmark, I thought I'd just create a file with enough data that the timings came out in seconds. I tried mine against 676 records ('aa'..'zz'), but it was well under a second, so I generated 17,576 records ('aaa'..'zzz') and it took around 5 seconds.
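
    The record counts follow from Perl's range operator, which applies string increment to alphabetic endpoints. A small sketch that only generates the keys (the record layout itself is not shown here, so none is assumed):

```perl
use strict;
use warnings;

# 'aa'..'zz' and 'aaa'..'zzz' expand via magical string increment.
my @two   = ('aa'  .. 'zz');    # 26**2 keys
my @three = ('aaa' .. 'zzz');   # 26**3 keys

printf "%d two-letter keys, %d three-letter keys\n",
    scalar @two, scalar @three;
```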

    I then started your Parse::RecDescent solution against the same file and went off and did something else. When I returned, it had already consumed well over an hour of CPU, and it eventually finished having consumed 98 minutes of CPU on a quiescent system! So in 1 1/2 hours or so I should know how long the parsing takes.

    I know P::RD isn't the quickest thing on the block, but this difference is way greater than the last time I measured. There has to be something about this particular grammar that is causing it to go so slowly, but I'm not sufficiently familiar with P::RD to recognise what.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      There's next to no decision making (failed key backtracks to '}', failed record backtracks to /\Z/), so it shouldn't be intrinsically slow. Rules like a : b c | b d could easily drag down the parser, but there's no such rule.
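
      To show why a rule like a : b c | b d is costly, here is a hand-written sketch in plain Perl (not Parse::RecDescent code; all sub names are made up for the illustration). It counts how often b is attempted for the rule as written and for the left-factored form a : b (c | d).

```perl
use strict;
use warnings;

my $b_calls = 0;

# Each matcher takes the text and an offset and returns the new
# offset on success, or undef on failure.
sub match_b { my ($t, $p) = @_; $b_calls++; return substr($t, $p, 1) eq 'b' ? $p + 1 : undef }
sub match_c { my ($t, $p) = @_; return substr($t, $p, 1) eq 'c' ? $p + 1 : undef }
sub match_d { my ($t, $p) = @_; return substr($t, $p, 1) eq 'd' ? $p + 1 : undef }

# a : b c | b d   -- b is parsed again when the first alternative fails.
sub a_unfactored {
    my ($t, $p) = @_;
    for my $tail (\&match_c, \&match_d) {
        my $q = match_b($t, $p);
        next unless defined $q;
        my $r = $tail->($t, $q);
        return $r if defined $r;
    }
    return;
}

# a : b (c | d)   -- b is parsed once, then the tails are tried.
sub a_factored {
    my ($t, $p) = @_;
    my $q = match_b($t, $p);
    return unless defined $q;
    for my $tail (\&match_c, \&match_d) {
        my $r = $tail->($t, $q);
        return $r if defined $r;
    }
    return;
}

$b_calls = 0; my $end1 = a_unfactored('bd', 0); my $calls_unfactored = $b_calls;
$b_calls = 0; my $end2 = a_factored('bd', 0);   my $calls_factored   = $b_calls;
print "b attempts: unfactored=$calls_unfactored factored=$calls_factored\n";
```

      When b is itself an expensive subrule, the repeated work in the unfactored form multiplies with every level of nesting.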

      The parser might be copying the entire unparsed input text every time it calls a rule, instead of just keeping a pointer to the current location within the text. This could lead to high memory usage and swapping.
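
      The difference between the two strategies can be sketched in plain Perl. This is only an illustration of the general idea, not a claim about what Parse::RecDescent's generated code does; both sub names are hypothetical.

```perl
use strict;
use warnings;

# Strategy 1: consume input by re-assigning the remainder, which
# copies everything still unparsed after every token.
sub count_tokens_by_copying {
    my ($text) = @_;
    my $n = 0;
    while ($text =~ /^\s*(\w+)/) {
        $n++;
        $text = substr($text, $+[0]);   # copy of the rest of the input
    }
    return $n;
}

# Strategy 2: leave the string untouched and let \G with /gc advance
# pos(), so only an offset changes between tokens.
sub count_tokens_by_position {
    my ($text) = @_;
    my $n = 0;
    pos($text) = 0;
    $n++ while $text =~ /\G\s*(\w+)/gc;
    return $n;
}

my $input = join ' ', ('aaa' .. 'aaz');
print count_tokens_by_copying($input), ' ',
      count_tokens_by_position($input), "\n";
```

      With the first strategy the total number of bytes copied grows roughly with the square of the input length, which is the kind of cost that stays hidden on small inputs.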

      The parser provides a number of variables to each rule. Some provide info to error-reporting functions, even if they are not used. Some are there to allow actions to be as flexible as possible. There could be lots of overhead. Reducing the number of rules by inlining those without '|'s should speed things up, but would have a severe impact on readability and maintainability.

      You could try turning off warnings. I have no idea how much overhead they add.

      Instead of returning a data structure, the actions could add to a data structure as soon as a rule is parsed. This could save a lot of memory and copying.

      You could Precompile the grammar and look at it. Or even use it as if it was Precompiled:

      Parse::RecDescent->Precompile($grammar, 'g') or die("Bad grammar\n");
      require g;
      my $p = g->new();
      my $text = ...;
      my $data = $p->parse($text) or die("Bad data\n");

        The memory consumption is steady at around 80 MB, so there is no swapping involved. I just don't see what would be taking all the time.

        The only vaguely unusual thing (in that I've never done it in a P::RD grammar), is your calling a function (dequote) as a part of the grammar--hence my question.

        I've seen and expected ratios of 30:1 for P::RD to regex performance, acknowledging that you get some of that back through functionality, but a ratio of 1200:1 just seems way over the top and--I thought--probably indicates some kind of error in the grammar.