If you are aiming for performance, my first question would be whether you need next and nextchar to be overridable. Write them as function calls and you will get a significant speedup.

Actually that's more of a myth than a truism. I did try it, but the speedup wasn't significant, though it varies from perl to perl.
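For anyone who wants to check this on their own perl, here is a minimal sketch using the core Benchmark module. The Reader package and nextchar_fn are hypothetical stand-ins, not the actual XML::SAX::PurePerl code; they just do the same trivial work once through method dispatch and once as a plain function call.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical stand-in for a reader object; not the real parser code.
package Reader;

sub new {
    my ($class, $buf) = @_;
    return bless { buf => $buf, pos => 0 }, $class;
}

# Method version: resolved through method dispatch on every call.
sub nextchar {
    my $self = shift;
    return substr($self->{buf}, $self->{pos}++, 1);
}

package main;

# Function version: same body, called directly.
sub nextchar_fn {
    my $self = shift;
    return substr($self->{buf}, $self->{pos}++, 1);
}

my $data = 'x' x 1024;

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, {
    method   => sub { my $r = Reader->new($data); $r->nextchar   for 1 .. 1024 },
    function => sub { my $r = Reader->new($data); nextchar_fn($r) for 1 .. 1024 },
});
```

The numbers will differ between perl versions and machines, so treat the output as a rough indication only.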

However I think there may be some value in the buffer check being moved to nextchar. I was originally thinking it needed to be in next() because sometimes next() is called on its own (for the encoding detection routines which need a byte-by-byte view), but that read()s in character by character anyway, so it might be a reasonable optimisation.
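Roughly what I have in mind, as a sketch only. BufferedReader, its fields and the current() accessor are made-up names for illustration, and this ignores encodings entirely; the point is just that the "is the buffer exhausted?" test lives in nextchar() and nowhere else.

```perl
use strict;
use warnings;

# Hypothetical reader; names and layout are assumptions, not the module's code.
package BufferedReader;

sub new {
    my ($class, $fh, $size) = @_;
    return bless {
        fh      => $fh,
        size    => $size || 1024,
        buffer  => '',
        pos     => 0,
        current => undef,
    }, $class;
}

# The refill check happens here, once per character request.
sub nextchar {
    my $self = shift;
    if ($self->{pos} >= length $self->{buffer}) {
        # read() replaces the buffer; it returns 0 at EOF and undef on error.
        my $n = read($self->{fh}, $self->{buffer}, $self->{size});
        $self->{pos} = 0;
        unless ($n) {
            $self->{current} = undef;
            return;
        }
    }
    $self->{current} = substr($self->{buffer}, $self->{pos}++, 1);
    return;
}

sub current { return $_[0]{current} }

package main;

# Usage: a tiny buffer (4) forces several refills on a short input.
my $xml = '<doc>hi</doc>';
open my $fh, '<', \$xml or die "cannot open in-memory handle: $!";
my $reader = BufferedReader->new($fh, 4);

my $seen = '';
$reader->nextchar;
while (defined $reader->current) {
    $seen .= $reader->current;
    $reader->nextchar;
}
print "$seen\n";
```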

You're right, I have tried all of the various "give me a character" methods, and substr() comes out on top. Which is a pain in the ass really - it's one point where Perl loses out to Python, where you can do string[0] to get the first character, just like you can in C.
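If you want to repeat the comparison, something along these lines should do. It is a sketch: the three helper subs are names I made up here, and they cover only a few of the possible "first character" approaches.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $buffer = '<doc>some text</doc>' x 50;

# Three ways of fetching the first character of a string.
sub first_substr { return substr($_[0], 0, 1) }
sub first_unpack { return unpack('a1', $_[0]) }
sub first_regex  { return $_[0] =~ /\A(.)/s ? $1 : undef }

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, {
    substr => sub { first_substr($buffer) },
    unpack => sub { first_unpack($buffer) },
    regex  => sub { first_regex($buffer) },
});
```

Results depend on the perl build, so the ranking on your machine may not match mine.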

I'll leave the goto as is for now. It's not as bad as people make out - it's only bad when it's used for all flow control, and I think its intention is quite clear here. Plus, given a 1024 buffer, it's only part of the path about 70 times in the parse of this 70K XML file.

I do like that last optimisation though - that coupled with moving the buffer test into nextchar() might make a big difference (though maybe not as I don't think the particular test file in question has any UTF-8 characters in it). I'll try it and come back and let you know.

In reply to Re: Re (tilly) 1: XML::SAX::PurePerl Performance by Matts
in thread XML::SAX::PurePerl Performance by Matts
