
Now, applying this knowledge, my feeling is that using TreeBuilder would again hurt performance. Right?

Probably. I'm curious, how many HTML pages will you be parsing per second in your finished product?

Juerd # { site => 'juerd.nl', do_not_use => 'spamtrap', perl6_server => 'feather' }

Re^4: HTML::Parser to extract link text?
by isync (Hermit) on Jun 19, 2007 at 21:32 UTC
    This question does not exactly apply. ;-) But for the link-extraction part, Parser took around 0.08 secs, whereas Extor took 0.12 secs and Extractor 0.28 secs (for average HTML).

    Any help with my Parser question?

      This question does not exactly apply. ;-) But for the link-extraction part, Parser took around 0.08 secs, whereas Extor took 0.12 secs and Extractor 0.28 secs (for average HTML).

      Okay, let me rephrase then... Why do you need such high performance for your project?

      Any help with my Parser question?

      No, sorry, I kind of promised myself to no longer waste time with HTML::Parser. It is a great module, but low-level, and I can solve any HTML parsing problem much faster with higher-level modules like HTML::TreeBuilder.

      Juerd # { site => 'juerd.nl', do_not_use => 'spamtrap', perl6_server => 'feather' }

        Why do you need such high performance for your project?

        Because time is an issue on this task and yes, the script has to munge quite a bunch of HTML. That is all I can leak, sorry.