PerlMonks  

Re: XMLin question (xmlfixup.pl)

by Anonymous Monk
on Feb 15, 2013 at 19:44 UTC [id://1018953]


in reply to XMLin question

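    #!/usr/bin/perl --
    use strict;
    use warnings;
    use HTML::Encoding 'encoding_from_http_message';
    use WWW::Mechanize;
    use Encode;
    use HTML::Tree;

    my $file = shift or die "
    Usage:
        xmlfixup.pl file:in.xml > out.xml
        xmlfixup.pl http://example.com/foo.xml > out.utf8.xml
    ";

    # Fetch the document (LWP handles both file: and http: URLs);
    # autocheck makes any failed request die
    my $resp = WWW::Mechanize->new( autocheck => 1 )->get( $file );

    # Detect the character encoding and decode to Perl characters;
    # with no detectable encoding, fall back to UTF-8, the XML default
    my $enco = encoding_from_http_message( $resp );
    my $utf8 = decode( $enco || 'UTF-8', $resp->content );

    # Parse leniently as HTML, but keep unknown tags, whitespace, entities,
    # comments and processing instructions, and add no implied tags
    my $t = HTML::TreeBuilder->new(
        qw(
            ignore_unknown               0
            no_space_compacting          1
            ignore_ignorable_whitespace  0
            implicit_tags                0
            no_expand_entities           1
            store_comments               1
            store_pis                    1
        )
    );
    #~ $t->xml_mode( 1 );
    $t->parse_content( $utf8 );

    # Write each top-level node back out as XML; top-level text nodes are
    # plain strings rather than HTML::Element objects, so print them as-is
    binmode STDOUT, ':utf8';
    for my $node ( $t->content_list ) {
        print ref( $node ) ? $node->as_XML : $node;
    }

    __END__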

Re^2: XMLin question (xmlfixup.pl)
by tmharish (Friar) on Feb 21, 2013 at 12:43 UTC
    Fails when the data contains <![CDATA[ ... ]]> sections.
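A minimal sketch of one possible workaround for the CDATA failure reported above, assuming the input is otherwise well-formed: before the text is handed to the parser, replace each <![CDATA[ ... ]]> section with the same characters escaped as ordinary character data, which XML treats as equivalent content. The helper cdata_to_escaped is hypothetical; it is not part of xmlfixup.pl, and it is not the CDATA fix tmharish mentions for XML::Smart.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of xmlfixup.pl): replace every
# <![CDATA[ ... ]]> section with equivalent escaped character data,
# so a parser that does not understand CDATA sees ordinary text.
sub cdata_to_escaped {
    my ($xml) = @_;
    # /s lets the section span lines; the non-greedy match stops at the first ]]>
    $xml =~ s{<!\[CDATA\[(.*?)\]\]>}{
        my $text = $1;
        $text =~ s/&/&amp;/g;   # escape & first so the entities added below are not re-escaped
        $text =~ s/</&lt;/g;
        $text =~ s/>/&gt;/g;
        $text;
    }gse;
    return $xml;
}

print cdata_to_escaped('<a><![CDATA[x < y && z]]></a>'), "\n";
# prints <a>x &lt; y &amp;&amp; z</a>
```

In xmlfixup.pl it would be called on $utf8 just before $t->parse_content; how it interacts with the script's other parser options (no_expand_entities in particular) has not been checked.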
Re^2: XMLin question (xmlfixup.pl)
by tmharish (Friar) on Feb 21, 2013 at 12:45 UTC

    I would like to use this in XML::Smart, together with a fix I have written for CDATA and a couple of other things.

    Please /msg me or reply to this so I can assign credit.

      by Anonymous Monk http://perlmonks.org/?node_id=1018953

        Sadly, this breaks in too many cases. I am rewriting XML::Smart::HTMLParser (also on GitHub).
