Actually that's more of a myth than a truism. I did try it, but the speedup wasn't significant, though it varies from perl to perl.
However I think there may be some value in the buffer check being moved to nextchar. I was originally thinking it needed to be in next() because sometimes next() is called on its own (for the encoding detection routines which need a byte-by-byte view), but that read()s in character by character anyway, so it might be a reasonable optimisation.
You're right, I have tried all of the various "give me a character" methods, and substr() comes out on top. Which is a pain in the ass really - it's one point where Perl loses out to python where you can do string[0] to get the first character, just like you can in C.
I'll leave the goto as is for now. It's not as bad as people make out - it's only bad when it's used for all flow control, and I think it's intention is quite clear here. Plus given a 1024 buffer, it's only part of the path 70 times in the parse of this 70K XML file.
I do like that last optimisation though - that coupled with moving the buffer test into nextchar() might make a big difference (though maybe not as I don't think the particular test file in question has any UTF-8 characters in it). I'll try it and come back and let you know.
In reply to Re: Re (tilly) 1: XML::SAX::PurePerl Performance
by Matts
in thread XML::SAX::PurePerl Performance
by Matts
| For: | Use: | ||
| & | & | ||
| < | < | ||
| > | > | ||
| [ | [ | ||
| ] | ] |