It appears that Mojo::DOM can handle that particular example HTML. The following prints "<p>this is a test</p>" and "<p>this is a second test</p>":
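use warnings;
use strict;
use Mojo::DOM;

# Slurp the example HTML from the __DATA__ section (not reproduced here)
my $dom = Mojo::DOM->new( do { local $/; <DATA> } );

# Print every <p> matched by the selector, one per line
for my $e ($dom->find('html html > body p')->each) {
    print $e->to_string, "\n";
}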
Update: Switched the above from finding the <h1> tag to the <p> tags, to show that it does not get confused like in your example. Update 2: Added newline for clarity.
In reply to Re: Parsing incorrect html by haukex, in thread Parsing incorrect html by seki