
Are you trying to actually parse the HTML, or just strip out the text for indexing? How are you doing the indexing? You may want to run some benchmarks to see where your code spends the most time. Look at Devel::Profile and Benchmark to find where the actual slowdowns are. In my experience, stripping to text is very fast; the indexing and the db updates are the slow part.
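
A rough sketch of the kind of comparison I mean, using the core Benchmark module. Everything specific here is an assumption for illustration: the sample document is made up, strip_text() is a crude regex stand-in for whatever parser you use, and index_text() is an in-memory word count standing in for your real indexing and db code. Swap in your own routines before trusting any numbers.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical sample document; substitute a real page from your corpus.
my $html = '<html><body><h1>Title</h1><p>Some <b>bold</b> text.</p></body></html>' x 100;

# Stage 1 (stand-in): crude regex tag stripping. This is NOT a real HTML
# parser; it only marks the spot where your parsing code would go.
sub strip_text {
    my ($doc) = @_;
    (my $text = $doc) =~ s/<[^>]*>/ /g;
    return $text;
}

# Stage 2 (stand-in): count words in memory. Your real code would update
# the index or database here, which is where I would expect the time to go.
sub index_text {
    my ($text) = @_;
    my %count;
    $count{lc $_}++ for $text =~ /\w+/g;
    return \%count;
}

my $text = strip_text($html);

# Run each stage for at least one CPU second and print a comparison table.
cmpthese(-1, {
    strip => sub { strip_text($html) },
    index => sub { index_text($text) },
});
```

For a whole-program view instead of a two-way comparison, Devel::Profile can be loaded through the debugger switch (perl -d:Profile yourscript.pl, where yourscript.pl is a placeholder for your own script).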

-Waswas

Re: Re: Re: Re: What is the fastest way to parse HTML?
by sri (Vicar) on Jul 23, 2003 at 00:17 UTC
    You made me think of another possible improvement. As I said, I only use the text and some layout tags, so I could use the report_tags() method of HTML::Parser to suppress all the unneeded junk.
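
    A minimal sketch of that idea, assuming the version 3 API of HTML::Parser. The tag list (h1, p, br) and the sample markup are made up for illustration; the handlers just collect events into an array so you can see what gets through.

```perl
use strict;
use warnings;
use HTML::Parser ();

my @events;    # text and tag events, in document order

my $p = HTML::Parser->new(
    api_version => 3,
    start_h => [ sub { push @events, "<$_[0]>" },  'tagname' ],
    end_h   => [ sub { push @events, "</$_[0]>" }, 'tagname' ],
    # Keep decoded text, skipping whitespace-only chunks.
    text_h  => [ sub { push @events, $_[0] if $_[0] =~ /\S/ }, 'dtext' ],
);

# Only these tags trigger start/end events; all other tags are dropped.
# The list itself is an assumption, pick the layout tags you care about.
$p->report_tags(qw(h1 p br));

$p->parse('<div><h1>Title</h1><p>Some <b>bold</b> <font color="red">text</font>.</p></div>');
$p->eof;

print join('|', @events), "\n";
```

    Note that report_tags() only filters the tag events. The text inside the dropped tags is still reported, which is what I want here. To throw away the content as well (script and style blocks, say), ignore_elements() looks like the method to reach for.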