in reply to Re: Extracting stylesheet links or url from HTML Page
in thread Extracting stylesheet links or url from HTML Page

This node was taken out by the NodeReaper on Jun 27, 2010 at 05:52 UTC

Re^3: Extracting stylesheet links or url from HTML Page
by Your Mother (Archbishop) on Jun 24, 2010 at 16:50 UTC

    XML::LibXML is going to be the fastest; if your program spends much of its time on I/O, you'll have to benchmark to see how much faster it really is. XML::Twig is the most Perlish and probably the easiest and most flexible to hack on for most Perl hackers. HTML::TokeParser::Simple or HTML::TokeParser will be the most reliable, since some HTML files will simply be too invalid for the others to handle.

    For a serious application that really needed the speed, I'd try:

    • XML::LibXML
      • Worked? Move on to the next document.
      • Failed to parse? Fall back to HTML::TokeParser, then move on to the next document.
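
    The try-then-fall-back idea above could be sketched roughly like this. It is only an illustration, not tested against real-world pages: stylesheet_links is a made-up helper name, and it assumes that XML::LibXML's load_html (called without the recover option) throws on markup it cannot handle, so that eval can catch the failure and hand the page to HTML::TokeParser.

```perl
use strict;
use warnings;
use XML::LibXML;
use HTML::TokeParser;

# Hypothetical helper (not from the thread): returns the href of every
# <link rel="stylesheet"> found in the HTML string passed in.
sub stylesheet_links {
    my ($html) = @_;

    # Fast path: let libxml2 parse the page. Assumption: without the
    # "recover" option a parse error is thrown, which eval traps.
    my $doc = eval { XML::LibXML->load_html( string => $html ) };
    if ($doc) {
        return map  { $_->getAttribute('href') }
               grep { lc( $_->getAttribute('rel') // '' ) eq 'stylesheet' }
               $doc->findnodes('//link[@href]');
    }

    # Slow but forgiving path: walk the start tags one token at a time.
    my $parser = HTML::TokeParser->new( \$html )
        or die "Cannot create HTML::TokeParser: $!";
    my @links;
    while ( my $tag = $parser->get_tag('link') ) {
        my $attr = $tag->[1];    # hash ref of attributes, names lower-cased
        next unless defined $attr->{href};
        push @links, $attr->{href}
            if lc( $attr->{rel} // '' ) eq 'stylesheet';
    }
    return @links;
}

# Usage: print the stylesheet URLs found in a small page.
my $page = '<html><head><link rel="stylesheet" href="main.css"></head>'
         . '<body><p>Hello</p></body></html>';
print "$_\n" for stylesheet_links($page);
```

    The defined-or operator (//) needs Perl 5.10 or later. Whether a given broken page reaches the fallback depends on how strict your libxml2 build is, so check that against your own data.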

    Read the POD for each module. They all take different kinds of input: file names, filehandles, strings.
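
    To give a feel for how the input styles differ, here is a small sketch that feeds the same page to both modules as a string, a file name and a filehandle. The page content and temp file are invented for the example, and the calls are written from my reading of the docs, so do check them against the POD of the versions you have installed.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use XML::LibXML;
use HTML::TokeParser;

# Made-up sample page, written to a temp file so every input style can be shown.
my $html = '<html><head><title>Demo</title></head><body><p>hi</p></body></html>';
my ( $out, $path ) = tempfile( SUFFIX => '.html', UNLINK => 1 );
print {$out} $html;
close $out or die "close failed: $!";

# XML::LibXML: a named argument says what kind of input is being passed.
my $from_string = XML::LibXML->load_html( string   => $html );
my $from_file   = XML::LibXML->load_html( location => $path );
open my $fh, '<', $path or die "open failed: $!";
my $from_fh = XML::LibXML->load_html( IO => $fh );
close $fh;

# HTML::TokeParser: a plain scalar is a file name, a scalar reference is
# the document itself, and a filehandle is read directly.
my $tp_file   = HTML::TokeParser->new($path)   or die "open failed: $!";
my $tp_string = HTML::TokeParser->new( \$html ) or die "parser failed";
open my $fh2, '<', $path or die "open failed: $!";
my $tp_fh = HTML::TokeParser->new($fh2) or die "parser failed";

# Collect the <title> text seen through each input style.
my @titles = map { $_->findvalue('//title') }
             ( $from_string, $from_file, $from_fh );
for my $tp ( $tp_file, $tp_string, $tp_fh ) {
    $tp->get_tag('title');
    push @titles, $tp->get_trimmed_text;
}
close $fh2;
print "$_\n" for @titles;
```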

    (update: fixed typo.)

      Thanks for your answer. I did benchmark, and LibXML was the fastest one.