in reply to Re (tilly) 1: XML::SAX::PurePerl Performance
in thread XML::SAX::PurePerl Performance

If you are aiming for performance, my first question would be whether you need next and nextchar to be overridable. Write them as function calls and you will get a significant speedup.

Actually that's more of a myth than a truism. I did try it, but the speedup wasn't significant, though it varies from perl to perl.
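
For anyone who wants to check this on their own perl, a minimal sketch along these lines should do. My::Reader and peek are made-up stand-ins for illustration, not the actual XML::SAX::PurePerl code, and the numbers will differ between perl versions and machines.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical stand-in for the parser's reader object (an assumption,
# not the real XML::SAX::PurePerl internals).
package My::Reader;

sub new {
    my ($class, $string) = @_;
    return bless { buf => $string, pos => 0 }, $class;
}

# Return the character at the current position without advancing.
sub peek {
    my $self = shift;
    return substr($self->{buf}, $self->{pos}, 1);
}

package main;

my $reader = My::Reader->new("<doc/>");

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, {
    method   => sub { $reader->peek },             # dynamic method dispatch
    function => sub { My::Reader::peek($reader) }, # plain function call
});
```

Because peek does a little real work, the difference shown here should be closer to what a parser would see than a benchmark of empty subs.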

However I think there may be some value in the buffer check being moved to nextchar. I was originally thinking it needed to be in next() because sometimes next() is called on its own (for the encoding detection routines which need a byte-by-byte view), but that read()s in character by character anyway, so it might be a reasonable optimisation.

You're right, I have tried all of the various "give me a character" methods, and substr() comes out on top. Which is a pain in the ass really - it's one point where Perl loses out to Python, where you can do string[0] to get the first character, just like you can in C.
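
To make the comparison concrete, here is a rough sketch of a few ways to get the first character of a string in Perl. The variable names are made up for illustration, and the input is plain ASCII.

```perl
use strict;
use warnings;

my $buffer = "<doc>text</doc>";

# Peek at the first character without modifying the buffer.
my $first = substr($buffer, 0, 1);

# Two alternatives that give the same answer for this input.
my ($via_regex) = $buffer =~ /\A(.)/s;      # capture the first character
my $via_unpack  = unpack("a1", $buffer);    # one-character string field

# Four-argument substr removes the first character and returns it,
# which is handy when the buffer should be consumed as it is read.
my $consumed = substr($buffer, 0, 1, '');

print "first=$first regex=$via_regex unpack=$via_unpack consumed=$consumed\n";
print "remaining=$buffer\n";
```

Wrapping each variant in a Benchmark::cmpthese call is the easy way to see which one wins on a given perl.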

I'll leave the goto as is for now. It's not as bad as people make out - it's only bad when it's used for all flow control, and I think its intention is quite clear here. Plus, given a 1024 buffer, it's only part of the path 70 times in the parse of this 70K XML file.

I do like that last optimisation though - that coupled with moving the buffer test into nextchar() might make a big difference (though maybe not as I don't think the particular test file in question has any UTF-8 characters in it). I'll try it and come back and let you know.
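
Roughly, moving the buffer test into nextchar() could look like the sketch below. My::BufferedReader, its fields, and the tiny buffer size are all assumptions made for illustration. This is not the module's real code, and it leaves out the encoding detection path.

```perl
use strict;
use warnings;

# Hypothetical reader class (an assumption, not XML::SAX::PurePerl itself).
package My::BufferedReader;

sub new {
    my ($class, $fh, $size) = @_;
    return bless {
        fh   => $fh,
        size => $size || 1024,   # how much to ask read() for at a time
        buf  => '',
        pos  => 0,
    }, $class;
}

# Return the next character, or undef at end of input.
sub nextchar {
    my $self = shift;
    if ($self->{pos} >= length $self->{buf}) {
        # Buffer exhausted: refill it here rather than in a separate next().
        my $got = read($self->{fh}, $self->{buf}, $self->{size});
        return unless $got;      # 0 at EOF, undef on error
        $self->{pos} = 0;
    }
    return substr($self->{buf}, $self->{pos}++, 1);
}

package main;

# An in-memory filehandle keeps the example self-contained.
open my $fh, '<', \"<doc>hi</doc>" or die "open: $!";
my $reader = My::BufferedReader->new($fh, 4);   # tiny buffer to force refills

my $out = '';
while (defined(my $c = $reader->nextchar)) {
    $out .= $c;
}
print "$out\n";
```

With a 1024 buffer the refill branch is only taken once per 1024 characters, so the common path is one length comparison and one substr.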

Re (tilly) 3: XML::SAX::PurePerl Performance
by tilly (Archbishop) on Feb 05, 2002 at 16:51 UTC
    YMMV, but when I tested it on my machine the popular myth was correct: a method call was massively slower than a function call. (Of course if you do anything interesting in your function...)

    Speaking of functions, I am kind of wondering what the purpose of next is. It seems from the code I see in it that it was intended to keep track of things like line numbers. But I don't see the rest of the code that would be needed to do that. (Sign of a change in design?) If that is the case, then what next is really providing is buffering.

    But isn't buffering exactly what read is supposed to do for you? OK, its speed is highly platform (and compilation option) dependent. But it seems to me that either you are better off using read, or else next should remove the additional buffering by using sysread.

      Points answered in order:

      The methods vs functions thing looks bad in a benchmark, but generally only when you have something like 100_000 calls, and usually only with empty function bodies, like you suggest. So the call itself may appear to be twice as slow, but it doesn't really show up that badly in real-life applications. Plus in 5.7.2, Doug MacEachern of mod_perl fame has made it so that sometimes method calls can be faster than function calls (don't ask me how - his voodoo is way beyond mine).

      The purpose of next, I admit, has been lost in a series of refactorings. I think it's either time to stop refactoring for speed and try to clean things up, or stop refactoring for speed and fix the remaining compliance issues instead ;-)

      I thought read was supposed to buffer too. But I was surprised to see a speedup when I did some buffering of my own. Maybe it only buffers if you ask for a significant number of characters? I have no idea what the internals are of it all, I only know to not believe everything you read, even from gurus ;-)

        I don't know that much about the full internals myself, but read is supposed to maintain an internal I/O buffer. However depending on how you configure and compile Perl, and how Perl interacts with the buffering layer(s), the code path that it takes can be very slow in practice.

        In particular I have seen people get [id://29807|impressive speedups] before by bypassing stdio and just buffering themselves in Perl. That shouldn't be plausible, but it happened. The same code was much slower on Windows. (I don't think it would happen if you compiled Perl to use Perl's I/O layer, but that was not historically the default, and then Perl and C libraries might not cooperate properly...)