in reply to RE: RE: Re: A grammar for HTML matching
in thread A grammar for HTML matching
Thus HTML::TokeParser cannot possibly be faster than HTML::Parser. (Please correct me if I'm wrong...looking at TokeParser source here...looks like it will do parsing in 512 byte chunks if passed a file rather than a scalar...but the speed hit comes from all the HTML::Parser callbacks for tags you don't care about...so parsing in chunks shouldn't help).
If I gave the impression that a regex on an entire document is faster than HTML::Parser (and friends) I apologize, this is obviously incorrect. I have written an implementation for a finder which looks for something specific by traversing the document using index/rindex, then pos() and m/\G/g. This is faster (when looking for something specific) than an equivalent HTML::Parser or HTML::TokeParser implementation.
But as I've mentioned elsewhere, it's not the implementation I'm interested in at this point, it's the grammar (and the utility of such a grammar). I'd be perfectly happy with a HTML::TokeParser implementation. (and in fact I will write one -- no use speculating about what is faster when you can write it and measure it)
|
|---|