Processing HTML is not too difficult, as long as tidy is around:
perl -MXML::Twig -e'open( my $fh, "tidy -asxml -quiet pm.html 2>/dev/null| ") or die $!; XML::Twig->parse( $fh)'

Keeping the output similar to the input is of course much harder, as in this case XML::Twig does not see the original file.
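For anything longer than a one-liner, the same idea can be spelled out as a small script. This is only a sketch: the sub names extract_links and links_from_html_file are made up for illustration, and pulling out href attributes is just one example of something to do with the parsed page. It assumes tidy is on the PATH and that its -asxml output is well-formed enough for XML::Twig to parse.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Collect the href attribute of every <a> element.
# $source may be an XML string or an open filehandle; parse() accepts both.
sub extract_links {
    my ($source) = @_;
    my @links;
    my $twig = XML::Twig->new(
        twig_handlers => {
            a => sub {
                my ( $t, $elt ) = @_;
                my $href = $elt->att('href');
                push @links, $href if defined $href;
            },
        },
    );
    $twig->parse($source);
    return @links;
}

# Run tidy on an HTML file and feed its XHTML output to XML::Twig.
sub links_from_html_file {
    my ($file) = @_;

    # List-form pipe open: no shell is involved, so $file needs no quoting.
    # Unlike the one-liner, tidy's warnings still reach STDERR here.
    open( my $fh, '-|', 'tidy', '-asxml', '-quiet', $file )
        or die "cannot run tidy: $!";
    my @links = extract_links($fh);
    close $fh;    # tidy exits non-zero on warnings, so the status is not checked
    return @links;
}

# Process every file named on the command line (does nothing without arguments).
print "$_\n" for map { links_from_html_file($_) } @ARGV;
```

Saved under a name of your choosing and run with pm.html as its argument, it should print one link per line. The parsing sub is kept separate from the tidy pipe so it can also be given a string of XML directly.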
Here I pay for the fact that XML::Twig does not accept a SAX stream as input, or I could use XML::LibXML::SAX and get HTML parsing for free (SAX was quite new when I started writing XML::Twig, and now it is coupled very strongly with XML::Parser).
In reply to Re^4: The mostly used xml parser
by mirod
in thread The mostly used xml parser
by pajout