in reply to advice with Parse::RecDescent

The current P::RD engine does the equivalent of:
$text =~ s/^$MATCH//;
to inch along the string. If $text is small, this is a reasonable way to do it. If $text is large, however, you get a performance hit as you constantly shift the text down in memory.

I'm told that the next version will walk through the string by maintaining pos (using \G and m//g instead), and that this speeds the process up substantially for large strings. Until then, it helps if you can "predigest" the input stream into logical chunks, and hand each one to a separate invocation of a specific grammar step.
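
To make the difference concrete, here is a minimal sketch in plain Perl (not P::RD's actual internals). The token pattern `$TOKEN`, the sample input, and both sub names are made up for illustration. The first sub chops each match off the front of the string; the second leaves the string alone and lets `pos` advance via `\G` and `m//gc`.

```perl
use strict;
use warnings;

# Hypothetical token pattern and input, for illustration only.
my $TOKEN = qr/\s*(\w+)/;
my $input = "alpha beta gamma";

# Approach 1: remove each match from the front of the string.
# Every successful s/// shortens $text.
sub tokens_by_chopping {
    my ($text) = @_;
    my @tokens;
    while ($text =~ s/^$TOKEN//) {
        push @tokens, $1;
    }
    return @tokens;
}

# Approach 2: leave the string intact and advance pos() instead.
# \G anchors each match where the previous one ended; /c keeps
# pos() in place when a match finally fails.
sub tokens_by_pos {
    my ($text) = @_;
    my @tokens;
    while ($text =~ /\G$TOKEN/gc) {
        push @tokens, $1;
    }
    return @tokens;
}

my @chopped = tokens_by_chopping($input);
my @walked  = tokens_by_pos($input);
print "chopping: @chopped\n";
print "pos walk: @walked\n";
```

Both subs return the same tokens; the only difference is whether the string itself gets rewritten on every match.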

In your particular case, if you could invoke a separate P::RD rule for the header, then one for each record, then one for the trailer, so that no single invocation ever sees more than one chunk, you'd be much speedier.
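
As a rough sketch of that per-chunk approach, assuming line-oriented input: the grammar below is a simplified stand-in, not your real one, and the rule names `header`, `record`, `trailer`, and `number` as well as the sample lines are my own assumptions. It relies only on the documented fact that any rule in a P::RD grammar can be called as a method on the parser object.

```perl
use strict;
use warnings;
use Parse::RecDescent;

# Hypothetical, simplified grammar: one rule per kind of line.
# /\Z/ insists that the whole line was consumed.
my $grammar = <<'END_GRAMMAR';
header  : /HDR\w+/ /\w+/ /\Z/
          { $return = { company => substr($item[1], 3), code => $item[2] } }
record  : /ADD(?:RANGE)?|DELETE(?:RANGE)?/ ',' number ',' number(?) ',' /\w+/ /\Z/
          { $return = [ $item[1], $item[3], $item[5][0] ] }
trailer : /TLR\d+/ /\Z/
          { $return = substr($item[1], 3) }
number  : /\d+/
END_GRAMMAR

my $parser = Parse::RecDescent->new($grammar) or die "Bad grammar!\n";

# Made-up sample input, already split into one chunk per line.
my @lines = (
    "HDRCOMPNAME BIG000\n",
    "ADD,1234567890,,COMPNAME\n",
    "ADDRANGE,1468,1680,COMPNAME\n",
    "TLR000002\n",
);

# Each call hands the parser a single short line.
my $header  = $parser->header(shift @lines);
my $trailer = $parser->trailer(pop @lines);
my @records = map { $parser->record($_) } @lines;

for my $result ($header, $trailer, @records) {
    die "Bad text!\n" unless defined $result;
}
```

The parser is built once and reused; since each call only ever sees one line, the cost of shortening `$text` is bounded by the line length rather than the file size.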

You could keep it within one invocation of P::RD by pre-digesting your file into a @queue, then putting just the header into $text, calling the top-level rule, and having the rule for the header suck the next item with $text .= shift @queue, ditto with the rules for record. Of course, backing up would be expensive, but your grammar can commit forward nicely.

-- Randal L. Schwartz, Perl hacker

Re: Re: advice with Parse::RecDescent
by demerphq (Chancellor) on Dec 10, 2001 at 20:24 UTC
    Well, as the data is line by line, splitting it into chunks is easy. I'm just not sure how to approach it in terms of P::RD code.
    UPDATE
    I failed to read the last para carefully enough, so most of this reply is probably useless... Sorry. Trying your idea now.
    END UPDATE

    Would I just call startrule on each line separately? I suppose this means I have to change the grammar? Perhaps so that startrule looks like

    startrule : header | record | trailer
    Is this what you had in mind? The other possibility that occurs to me is to use three separate parsers, one for each of the different types, but somehow I suspect that is a dead end...

    And thank you very much!

    Yves / DeMerphq
    --
    This space for rent.

Re: Re: advice with Parse::RecDescent
by demerphq (Chancellor) on Dec 10, 2001 at 21:36 UTC
    OK, I gave your $text .= shift @queue idea a try, but it seems unwilling to match multiple records. I'm not really sure what I'm doing wrong, so I have no idea how to proceed. Here is a copy of my new grammar, with a couple of other minor changes included. Any ideas about what I've done wrong would be appreciated.
    use strict;
    use warnings;
    use Parse::RecDescent;
    use Data::Dumper;
    our $RD_TRACE=1;

    my $Grammar=<<'END_GRAMMAR';
    {my $company=""}
    startrule : file
    file      : header record(s?) trailer_t
                { $return={header=>$item[1],records=>$item[2],count=>$item[3]};
                  1;
                }
    header    : header_t data_t
                { $return={company=>$item[1],code=>$item[2]};
                  $text.=shift @::text;
                  print "Company Set to $company\n Text=$text";
                  1;
                }
    record    : valid_record | <error>
    valid_record: type_t ',' number_t ',' number_t(?) ',' "$company"
                { $return=[ $item[1],$item[3],@{$item[5]} ? $item[5] : undef ];
                  $text.=shift @::text;
                  print "Text=$text";
                  1;
                }
    header_t  : /HDR\w+/ {$return=substr($item[1],3); $company=$return;1;}
    trailer_t : /TLR\d+/ {$return=substr($item[1],3)}
    data_t    : /\w+/
    type_t    : /ADD(?:RANGE)?|DELETE(?:RANGE)?/
    number_t  : /\d+/
    END_GRAMMAR

    my $parser = Parse::RecDescent->new($Grammar) or die "Bad grammar!\n";
    our @text=<DATA>;
    if (defined( my $t=$parser->startrule(shift @text))) {
        print Dumper($t);
    } else {
        print "Bad text!\n";
    }
    __DATA__
    HDRCOMPNAME BIG000OLD111IDENTIFIER1020301WITH1010LOTS1010OF1010CRAP
    ADD,1234567890,,COMPNAME
    ADD,1234567891,,COMPNAME
    ADD,1234567892,,COMPNAME
    ADDRANGE,1468,1680,COMPNAME
    ADDRANGE,2468,2680,COMPNAME
    ADDRANGE,3468,3680,COMPNAME
    DELETE,987654321,,COMPNAME
    DELETE,987654322,,COMPNAME
    DELETE,987654323,,COMPNAME
    DELETERANGE,13579,13599,COMPNAME
    DELETERANGE,23579,23599,COMPNAME
    DELETERANGE,33579,33599,COMPNAME
    TLR000012
    Now, weirdly, it seems to perform as expected if the same data is organized like so:
    HDRCOMPNAME BIG000OLD111IDENTIFIER1020301WITH1010LOTS1010OF1010CRAP
    ADD,1234567890,,COMPNAME
    ADDRANGE,1468,1680,COMPNAME
    DELETE,987654321,,COMPNAME
    DELETERANGE,13579,13599,COMPNAME
    ADD,1234567891,,COMPNAME
    ADDRANGE,2468,2680,COMPNAME
    DELETE,987654322,,COMPNAME
    DELETERANGE,33579,33599,COMPNAME
    ADD,1234567892,,COMPNAME
    ADDRANGE,3468,3680,COMPNAME
    DELETE,987654323,,COMPNAME
    DELETERANGE,23579,23599,COMPNAME
    TLR000012
    Which really confuses me! What have I done wrong? Any clues?

    Yves / DeMerphq
    --
    This space for rent.