Re: XMLin question (

by Anonymous Monk
on Feb 15, 2013 at 19:44 UTC ( #1018953 )

in reply to XMLin question

#!/usr/bin/perl --
use strict;
use warnings;
use HTML::Encoding 'encoding_from_http_message';
use WWW::Mechanize;
use Encode;
use HTML::Tree;

# The input is given as a URL, e.g. file:in.xml
my $file = shift or die " Usage: file:in.xml > out.xml > out.utf8.xml ";

# Fetch the document; autocheck makes a failed request fatal
my $resp = WWW::Mechanize->new( autocheck => 1 )->get( $file );

# Decode to Perl characters if an encoding could be detected
my $enco = encoding_from_http_message( $resp );
my $utf8;
if( $enco ) {
    $utf8 = decode( $enco => $resp->content );
} else {
    $utf8 = $resp->content;
}

# Configure the tree builder to leave the markup as untouched as possible
my $t = HTML::TreeBuilder->new( qw(
    ignore_unknown              0
    no_space_compacting         1
    ignore_ignorable_whitespace 0
    implicit_tags               0
    no_expand_entities          1
    store_comments              1
    store_pis                   1
) );
#~ $t->xml_mode( 1 );
$t->parse_content( $utf8 );

# Emit the parsed tree as UTF-8 encoded XML
binmode STDOUT, ':utf8';
print $_->as_XML for $t->content_list;
__END__

Replies are listed 'Best First'.
Re^2: XMLin question (
by tmharish (Friar) on Feb 21, 2013 at 12:43 UTC
    Fails when data contains <![CDATA[ ... ]]>
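
    One possible workaround, sketched here under the assumption that the failure comes from the HTML parser not understanding CDATA sections, is to rewrite each CDATA section as ordinary escaped text before the content is parsed. The helper names cdata_to_text and escape_text are hypothetical; this is not the fix mentioned in the replies below, and it does not preserve the CDATA markers in the output.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): escape the three
# characters that are significant in XML character data.
sub escape_text {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # must run first so later entities are not re-escaped
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Hypothetical helper: replace every <![CDATA[ ... ]]> section with its
# escaped content. /s lets '.' match newlines; the non-greedy match stops
# at the first ']]>'.
sub cdata_to_text {
    my ($xml) = @_;
    $xml =~ s{<!\[CDATA\[(.*?)\]\]>}{ escape_text($1) }gse;
    return $xml;
}

print cdata_to_text('<a><![CDATA[1 < 2 && 3 > 2]]></a>'), "\n";
# prints: <a>1 &lt; 2 &amp;&amp; 3 &gt; 2</a>
```

    In the script above, this would be applied to $utf8 just before the parse_content call.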
Re^2: XMLin question (
by tmharish (Friar) on Feb 21, 2013 at 12:45 UTC

    I would like to use this, together with a fix I have written for CDATA and a couple of other things, in XML::Smart.

    Please /msg me or reply to this so I can assign credit.

      by Anonymous Monk

        Sadly, this breaks in too many cases, so I am rewriting XML::Smart::HTMLParser (also available on GitHub).
