in reply to Parsing incorrect html
It appears that Mojo::DOM can handle that particular example HTML. The following prints "<p>this is a test</p>" and "<p>this is a second test</p>":
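use warnings;
use strict;
use Mojo::DOM;

# slurp the example HTML from the __DATA__ section
my $dom = Mojo::DOM->new( do { local $/; <DATA> } );

# print each matching <p> element on its own line
for my $e ($dom->find('html html > body p')->each) {
    print $e->to_string, "\n";
}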
Update: Switched the above from finding the <h1> tag to the <p> tags, to show that it does not get confused like in your example.
Update 2: Added newline for clarity.