in reply to Re: Why is Perl suddenly slow in THIS case?
in thread Why is Perl suddenly slow in THIS case?
The original case is a parser along these lines:
    sub parseAny {
        #my $p = shift; # pkg or doc
        my $c = shift;
        #my $objnum = shift;
        #my $gennum = shift;
        return ${$c} =~ m/ \G \d+\s+\d+\s+R\b /xms   ? 'parseRef( $c, $objnum, $gennum)'
             : ${$c} =~ m{ \G / }xms                ? 'parseLabel( $c, $objnum, $gennum)'
             : ${$c} =~ m/ \G << /xms               ? 'parseDict( $c, $objnum, $gennum)'
             : ${$c} =~ m/ \G \[ /xms               ? 'parseArray( $c, $objnum, $gennum)'
             : ${$c} =~ m/ \G [(] /xms              ? 'parseString( $c, $objnum, $gennum)'
             : ${$c} =~ m/ \G < /xms                ? 'parseHexString($c, $objnum, $gennum)'
             : ${$c} =~ m/ \G [\d.+-]+ /xms         ? 'parseNum( $c, $objnum, $gennum)'
             : ${$c} =~ m/ \G (true|false) /ixms    ? 'parseBoolean( $c, $objnum, $gennum)'
             : ${$c} =~ m/ \G null /ixms            ? 'parseNull( $c, $objnum, $gennum)'
             : die "Unrecognized type in parseAny\n";
    }
I think the best approach would be a tokenizer that takes the first character and decides from that what to do. This would mean rewriting the regex into something really unreadable like:
    sub parseAny_token {
        #my $p = shift; # pkg or doc
        my $c = shift;
        #my $objnum = shift;
        #my $gennum = shift;
        # /gc so that pos() advances and the follow-up \G matches continue after the token
        ${$c} =~ m{ \G (?: ([0-9]+)
                         | (/)
                         | (<<)
                         | (\[)
                         | ([.+-])
                         | (true|false)
                         | (null)
                       )
                  }xmsigc
            or die "Unrecognized type in parseAny\n";

        # now dispatch based on $1 etc:
        if( defined $1 ) {
            my $num = $1;   # save it, the next match clobbers $1
            if( ${$c} =~ m/\G\s+\d+\s+R\b/gc ) {
                # handle "$num $num R"
                parseRef( $c, $objnum, $gennum)
            } elsif( ${$c} =~ m/\G([-+.\d]+)/gc ) {
                # handle "$num$1"
                parseNum( $c, $objnum, $gennum)
            } else {
                # handle "$num"
            };
        } elsif( defined $2 ) { # /
            parseLabel( $c, $objnum, $gennum)
        }
        ...
    }
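To make the idea concrete, here is a small self-contained variant that only classifies the next token and, like the benchmark version of parseAny, returns the handler name as a string. It is a sketch under my own assumptions, not CAM::PDF code: the names classify_token and @HANDLER are made up for this example, and I added alternatives for "(" and "<" so that all nine types of the original cascade are covered.

```perl
use strict;
use warnings;

# Handler name for each capture group of the alternation below (hypothetical table).
my @HANDLER = qw(
    parseNum parseLabel parseDict parseArray parseString
    parseHexString parseNum parseBoolean parseNull
);

sub classify_token {
    my ($c) = @_;                     # reference to the content string
    my $start = pos(${$c}) // 0;      # remember where the token begins

    ${$c} =~ m{ \G (?: (\d+)          # group 1: digits (number, or start of a reference)
                     | (/)            # group 2: label
                     | (<<)           # group 3: dictionary, tried before a single <
                     | (\[)           # group 4: array
                     | ([(])          # group 5: string
                     | (<)            # group 6: hex string
                     | ([.+-])        # group 7: number starting with sign or dot
                     | (true|false)   # group 8: boolean
                     | (null)         # group 9: null
                   )
              }xmsigc
        or die "Unrecognized type in parseAny\n";

    # Copy the groups right away, the next match would clobber them.
    my @group   = ($1, $2, $3, $4, $5, $6, $7, $8, $9);
    my ($which) = grep { defined $group[$_] } 0 .. $#group;
    my $handler = $HANDLER[$which];

    # Digits followed by "<digits> R" are an indirect reference, not a number.
    if ($which == 0 && ${$c} =~ m/ \G \s+ \d+ \s+ R\b /xmsgc) {
        $handler = 'parseRef';
    }

    pos(${$c}) = $start;              # leave the input position untouched, as parseAny does
    return $handler;
}

for my $src ('12 0 R', '/Name', '<< /A 1 >>', '[1 2]', '(text)', '<4869>', '-3.5', 'true', 'null') {
    my $copy = $src;
    print "$src => ", classify_token(\$copy), "\n";
}
```

Resetting pos() at the end keeps the sub a drop-in for the classification step only. A real tokenizer would keep the advanced position and hand the already matched text to the handler instead of letting it re-scan.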
That would need a lot of good unit tests to make sure the grammar rewrite still works and especially still picks up parsing at the right places when something like a +R comes in the input stream.
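A table-driven test is probably the cheapest way to get that safety net: keep the old cascade around as the reference and check that the rewrite agrees with it on every input, including inputs where matching starts in the middle of the string. The following is only a sketch with Test::More. The subs type_cascade and type_first_char and the test cases are invented for illustration; a real suite would feed in fragments from actual PDFs.

```perl
use strict;
use warnings;
use Test::More;

# Reference: the cascade from parseAny, reduced to returning a type name.
sub type_cascade {
    my ($c) = @_;
    return ${$c} =~ m/ \G \d+\s+\d+\s+R\b /xms     ? 'ref'
         : ${$c} =~ m{ \G / }xms                  ? 'label'
         : ${$c} =~ m/ \G << /xms                 ? 'dict'
         : ${$c} =~ m/ \G \[ /xms                 ? 'array'
         : ${$c} =~ m/ \G [(] /xms                ? 'string'
         : ${$c} =~ m/ \G < /xms                  ? 'hexstring'
         : ${$c} =~ m/ \G [\d.+-]+ /xms           ? 'num'
         : ${$c} =~ m/ \G (?:true|false) /ixms    ? 'boolean'
         : ${$c} =~ m/ \G null /ixms              ? 'null'
         :                                          'unknown';
}

# Rewrite under test (hypothetical): look at the first character, then decide.
sub type_first_char {
    my ($c) = @_;
    my $pos = pos(${$c}) // 0;
    my $ch  = lc substr(${$c}, $pos, 1);

    return 'label'  if $ch eq '/';
    return 'array'  if $ch eq '[';
    return 'string' if $ch eq '(';
    if ($ch eq '<') {
        return substr(${$c}, $pos, 2) eq '<<' ? 'dict' : 'hexstring';
    }
    if ($ch =~ /\d/) {
        return ${$c} =~ m/ \G \d+\s+\d+\s+R\b /xms ? 'ref' : 'num';
    }
    return 'num' if $ch eq '.' || $ch eq '+' || $ch eq '-';
    if ($ch eq 't' || $ch eq 'f') {
        return ${$c} =~ m/ \G (?:true|false) /ixms ? 'boolean' : 'unknown';
    }
    if ($ch eq 'n') {
        return ${$c} =~ m/ \G null /ixms ? 'null' : 'unknown';
    }
    return 'unknown';
}

# [ input, offset at which matching starts, expected type ]
my @cases = (
    [ '1 0 R',         0, 'ref'       ],
    [ '1 0 Rx',        0, 'num'       ],   # no word boundary after R, so not a reference
    [ 'xx << /A 1 >>', 3, 'dict'      ],   # start in the middle of the string
    [ '<48>',          0, 'hexstring' ],
    [ 'TRUE',          0, 'boolean'   ],
    [ 'trap',          0, 'unknown'   ],
    [ 'Null',          0, 'null'      ],
    [ '',              0, 'unknown'   ],
);

for my $case (@cases) {
    my ($text, $offset, $want) = @{$case};
    pos($text) = $offset;
    is( type_cascade(\$text),    $want, "cascade: '$text' at $offset" );
    pos($text) = $offset;
    is( type_first_char(\$text), $want, "first char: '$text' at $offset" );
}

done_testing();
```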
In my toy implementation of the tokenizer, I get 90% of the performance of the original R case for both cases. It might be worth sharing your problematic PDF with the author of CAM::PDF (or me) if you can, just to see whether it can be turned into a good test case...
Re^3: Why is Perl suddenly slow in THIS case?
by vr (Curate) on Mar 06, 2017 at 12:42 UTC |