http://qs1969.pair.com?node_id=1018953


in reply to XMLin question

#!/usr/bin/perl --
use strict;
use warnings;
use HTML::Encoding 'encoding_from_http_message';
use WWW::Mechanize;
use Encode;
use HTML::Tree;

my $file = shift or die "
Usage:
    xmlfixup.pl file:in.xml > out.xml
    xmlfixup.pl http://example.com/foo.xml > out.utf8.xml
";

my $resp = WWW::Mechanize->new( autocheck => 1 )->get( $file );
my $enco = encoding_from_http_message( $resp );
my $utf8;
if( $enco ) {
    $utf8 = decode( $enco => $resp->content );
} else {
    $utf8 = $resp->content;
}

my $t = HTML::TreeBuilder->new( qw(
    ignore_unknown 0
    no_space_compacting 1
    ignore_ignorable_whitespace 0
    implicit_tags 0
    no_expand_entities 1
    store_comments 1
    store_pis 1
) );
#~ $t->xml_mode( 1 );
$t->parse_content( $utf8 );

binmode STDOUT, ':utf8';
print $_->as_XML for $t->content_list;
__END__

Re^2: XMLin question (xmlfixup.pl)
by tmharish (Friar) on Feb 21, 2013 at 12:43 UTC
    This fails when the data contains a <![CDATA[ ... ]]> section.
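One possible workaround, sketched here as an assumption and not as the fix tmharish mentions, is to rewrite each CDATA section as ordinary entity-escaped text before the document is handed to the HTML parser. The helper names cdata_to_text and escape_text are hypothetical and are not part of xmlfixup.pl. The sketch uses only core Perl, and whether it resolves every CDATA failure with HTML::TreeBuilder has not been verified.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: escape the three characters that are
# significant in XML character data.
sub escape_text {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # must come first, so new entities are not re-escaped
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Hypothetical helper: replace every <![CDATA[ ... ]]> section with the
# equivalent escaped text. /s lets "." match newlines inside a section,
# and the non-greedy match stops at the first "]]>".
sub cdata_to_text {
    my ($xml) = @_;
    $xml =~ s{<!\[CDATA\[(.*?)\]\]>}{ escape_text($1) }gse;
    return $xml;
}

my $in = '<a><![CDATA[1 < 2 & 3]]></a>';
print cdata_to_text($in), "\n";    # prints <a>1 &lt; 2 &amp; 3</a>
```

In the script above, this would presumably be applied to $utf8 just before the parse_content call. Note that the CDATA markers themselves are lost in the output; only the character data survives.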
Re^2: XMLin question (xmlfixup.pl)
by tmharish (Friar) on Feb 21, 2013 at 12:45 UTC

    I would like to use this, with a fix I have written for CDATA and a couple of other things, in XML::Smart.

    Please /msg me or reply to this so I can assign credit.

      by Anonymous Monk http://perlmonks.org/?node_id=1018953

        Sadly, this breaks in too many cases, so I am rewriting XML::Smart::HTMLParser (also available on GitHub).