Re: Parsing incorrect html

It appears that Mojo::DOM can handle that particular example HTML. The following prints "this is a test" and "this is a second test":

use warnings;
use strict;
use Mojo::DOM;

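# Slurp all of the HTML from the DATA filehandle and parse it
my $dom = Mojo::DOM->new( do { local $/; <DATA> } );
# Select <p> elements inside a <body> that is a child of a nested <html>
for my $e ($dom->find('html html > body p')->each) {
    print $e->to_string, "\n";
}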

Update: Switched the above from finding the <h1> tag to finding the <p> tags, to show that it does not get confused as in your example.

Update 2: Added a newline to the output for clarity.
