As for the internals of the next method, I haven't run benchmarks, but the basic approaches I would consider are the substr approach, splitting the string into an internal array that you shift off, and using a /(.)/gs pattern match. (A warning on the last: use of $' etc. anywhere in the program will slow that down. This may be a good reason to avoid it no matter what benchmarking says.) I assume you have tried all of them? (Probably, but it doesn't hurt to check.)
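For concreteness, here is a minimal sketch of the three approaches side by side. The sub names (chars_substr, chars_split, chars_regex) are made up for illustration and are not part of XML::SAX::PurePerl; each one just walks a string a character at a time so they can be compared on equal terms. The benchmark only runs if you set the RUN_BENCH environment variable (also my own convention), and I make no claim about which one wins.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Approach 1: peel one character off the front with 4-argument substr.
sub chars_substr {
    my ($buffer) = @_;
    my @out;
    while (length $buffer) {
        push @out, substr($buffer, 0, 1, '');
    }
    return \@out;
}

# Approach 2: split once into an internal array, then shift characters off.
sub chars_split {
    my ($buffer) = @_;
    my @queue = split //, $buffer;
    my @out;
    while (@queue) {
        push @out, shift @queue;
    }
    return \@out;
}

# Approach 3: /(.)/gs in scalar context; /s lets . match newlines too.
sub chars_regex {
    my ($buffer) = @_;
    my @out;
    while ($buffer =~ /(.)/gs) {
        push @out, $1;
    }
    return \@out;
}

# Optional timing run over a hypothetical sample document.
if ($ENV{RUN_BENCH}) {
    my $sample = "<doc>some text\n</doc>" x 500;
    cmpthese(-1, {
        substr => sub { chars_substr($sample) },
        split  => sub { chars_split($sample) },
        regex  => sub { chars_regex($sample) },
    });
}
```

Run it as RUN_BENCH=1 perl script.pl to get a comparison table on your own perl; the relative numbers may well differ between versions.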
An incidental note. Your goto can be removed from next without a perceivable performance change. Make the if branch empty, put an else around throwing the exception, and then move the BUFFERED_READ section after the decision logic. This should also be marginally faster because Perl doesn't have to spend time figuring out where the goto goes. (That shouldn't be the common path though, so the change should be marginal.)
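To show the shape I mean, here is a rough sketch of the goto-free control flow. Everything in it is a stand-in: ToyReader, next_char, and the hash fields are hypothetical and not your module's actual code. It also assumes the input arrives as a list of non-empty chunks. The only point is the structure: an empty common-path branch, an else around the exception, and the buffered read moved after the decision logic.

```perl
use strict;
use warnings;

{
    package ToyReader;    # hypothetical stand-in, not XML::SAX::PurePerl's reader

    sub new {
        my ($class, @chunks) = @_;
        return bless { buffer => '', chunks => [@chunks], current => undef }, $class;
    }

    sub next_char {
        my $self = shift;

        if (length $self->{buffer}) {
            # Common path: nothing to decide, fall through to the buffered read.
        }
        elsif (@{ $self->{chunks} }) {
            # Refill the buffer from the next chunk (assumed non-empty).
            $self->{buffer} = shift @{ $self->{chunks} };
        }
        else {
            # The exception now lives in its own else branch.
            die "no more input\n";
        }

        # What used to be the BUFFERED_READ label: runs after the decision logic.
        return $self->{current} = substr($self->{buffer}, 0, 1, '');
    }
}

my $reader = ToyReader->new("ab", "c");
print $reader->next_char, "\n" for 1 .. 3;
```

With the buffered read at the bottom, every branch that does not throw reaches it by falling through, so no label is needed.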
I am betting that 5.005 performance is not a priority of yours. But if you are allowing the logic to skip going to next anyways, you can replace a lot of the 5.005 logic with something like the following (untested):

```perl
# Number of trailing bytes implied by the lead byte $n.
my $len = $n >= 0xFC ? 5 :
          $n >= 0xF8 ? 4 :
          $n >= 0xF0 ? 3 :
          $n >= 0xE0 ? 2 :
          $n >= 0xC0 ? 1 :
          throw XML::SAX::Exception::Parse(
              Message      => sprintf("Invalid character 0x%x", $n),
              ColumnNumber => $self->column,
              LineNumber   => $self->line,
              PublicId     => $self->public_id,
              SystemId     => $self->system_id,
          );

if ($len <= length($self->[BUFFER])) {
    # Enough bytes are buffered: take them all with one substr call.
    $current .= substr($self->[BUFFER], 0, $len, '');
    $self->[CURRENT] = substr($self->[BUFFER], -1);
}
else {
    # Drain the buffer, then fetch the remaining bytes one at a time.
    $len -= length($self->[BUFFER]);
    $current .= $self->[BUFFER];
    $self->[BUFFER] = '';
    while (-1 < --$len) {
        $self->next;    # a bare next($self) would parse as loop control
        $current .= $self->[CURRENT];
    }
}
```

Again, it is more complex, but the fact that you avoid a whole series of function calls should be a significant speedup.
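If you want to try the lead-byte table on its own before wiring it into the reader, a self-contained sketch might look like the one below. The names trailing_bytes and take_sequence are hypothetical, plain die stands in for your exception class, and returning 0 for bytes below 0x80 is my assumption (the snippet above presumably only runs for non-ASCII lead bytes). The thresholds are copied from the snippet as-is, so anything at or above 0xFC is treated as a 6-byte sequence.

```perl
use strict;
use warnings;

# How many trailing bytes follow a lead byte with ordinal $n.
# Thresholds mirror the snippet above; 0 for ASCII is an assumption.
sub trailing_bytes {
    my ($n) = @_;
    return $n >= 0xFC ? 5
         : $n >= 0xF8 ? 4
         : $n >= 0xF0 ? 3
         : $n >= 0xE0 ? 2
         : $n >= 0xC0 ? 1
         : $n <  0x80 ? 0
         : die sprintf("Invalid character 0x%x\n", $n);
}

# Remove one whole byte sequence from the front of the buffer in at most
# two substr calls, instead of one call per byte.
sub take_sequence {
    my ($buf_ref) = @_;
    my $lead = substr($$buf_ref, 0, 1, '');
    my $len  = trailing_bytes(ord $lead);
    die "truncated sequence\n" if $len > length $$buf_ref;
    return $lead . substr($$buf_ref, 0, $len, '');
}

my $buffer = "\xC3\xA9x";              # bytes of e-acute, then "x"
my $seq    = take_sequence(\$buffer);
printf "%d bytes taken, %d left\n", length($seq), length($buffer);
```

This version simply dies when the buffer runs short; the real reader would go back to next for the missing bytes, as in the else branch above.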
In reply to Re (tilly) 1: XML::SAX::PurePerl Performance by tilly, in thread XML::SAX::PurePerl Performance by Matts