in reply to Parsing semi-complex HTML

With XML::LibXML, it would be

for my $node ($doc->findnodes('//*[@class="className"]')) {
    print($node->toString());
}

If you want to use HTML::Parser (e.g. if the HTML isn't valid), don't use it directly. Use HTML::TreeBuilder instead. It creates a tree of HTML::Element objects, whose look_down and as_HTML methods you could use.
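
For the HTML::TreeBuilder route, a minimal sketch might look like the following. The sample markup and the helper name elements_with_class are made up for illustration, and look_down is matching the class attribute by exact string equality here, so an element with class="className other" would not match.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper (not from the original post): return the HTML of
# every element whose class attribute is exactly $class.
sub elements_with_class {
    my ($html, $class) = @_;

    # HTML::TreeBuilder tolerates unclosed or missing tags.
    my $tree = HTML::TreeBuilder->new_from_content($html);

    # In list context, look_down returns all matching HTML::Element objects.
    my @found = map { $_->as_HTML } $tree->look_down(class => $class);

    # Free the tree explicitly once the strings have been extracted.
    $tree->delete;
    return @found;
}

# Made-up sample input with an unclosed <p> tag.
my $html = '<div class="className"><p>first</div>'
         . '<p class="other">skip</p>'
         . '<span class="className">second</span>';

print "$_\n" for elements_with_class($html, 'className');
```

To match one class among several in the attribute, look_down also accepts a regex as the value, e.g. look_down(class => qr/\bclassName\b/).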

Re^2: Parsing semi-complex HTML
by duelafn (Parson) on Jul 07, 2010 at 15:57 UTC

    Actually, I've never had problems using XML::LibXML on broken HTML:

    use XML::LibXML;
    my $parser = XML::LibXML->new();
    $parser->recover(1);
    $parser->recover_silently(1);
    my $doc = $parser->parse_html_string($stuff);

    Good Day,
        Dean

      Thanks, good to know! I never tried.

        If we were in the same office, I'd throw something at you right now. I've been using the recover/recover_silently in my XML::LibXML HTML parsing examples--I think in some threads you were in on--for a couple of years. :(