in reply to Re^3: The mostly used xml parser
in thread The mostly used xml parser
Processing HTML is not too difficult, as long as tidy is around:
    perl -MXML::Twig -e'open( my $fh, "tidy -asxml -quiet pm.html 2>/dev/null |") or die $!; XML::Twig->parse( $fh)'
Keeping the output similar to the input is of course much harder, as in this case XML::Twig does not see the original file.
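Spelled out as a script, the one-liner above might look like the following. This is only a minimal sketch, assuming tidy is on the PATH; the sub names tidy_command and twig_from_html are made up for illustration, and printing the href of each link is just there to show that the resulting twig is usable. Unlike the one-liner, it uses the list form of open, so the file name never goes through a shell, and tidy's warnings stay visible on STDERR.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: build the tidy command as a list.
# -asxml asks tidy for well-formed XHTML, -quiet suppresses its banner.
sub tidy_command {
    my ($file) = @_;
    return ( 'tidy', '-asxml', '-quiet', $file );
}

# Hypothetical helper: run tidy on an HTML file and parse its output.
sub twig_from_html {
    my ($file) = @_;

    # List-form pipe open: no shell is involved, so odd file names are safe.
    open( my $fh, '-|', tidy_command($file) )
        or die "cannot run tidy: $!";

    my $twig = XML::Twig->new;
    $twig->parse($fh);

    # tidy exits non-zero when it only had warnings, so the status is ignored.
    close $fh;

    return $twig;
}

# Only do any work when a file name is given on the command line.
if (@ARGV) {
    my $twig = twig_from_html( $ARGV[0] );

    # Print the target of every link found in the cleaned-up document.
    for my $link ( $twig->root->descendants('a') ) {
        my $href = $link->att('href');
        print "$href\n" if defined $href;
    }
}
```

Saved as, say, html2twig.pl, it would be run as: perl html2twig.pl pm.html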
Here I pay for the fact that XML::Twig does not accept a SAX stream as input, or I could use XML::LibXML::SAX and get HTML parsing for free (SAX was quite new when I started writing XML::Twig, and now it is coupled very strongly with XML::Parser).
Re^5: The mostly used xml parser
by GrandFather (Saint) on Oct 05, 2005 at 21:13 UTC
by mirod (Canon) on Oct 06, 2005 at 08:02 UTC