in reply to Re^6: XML:: DOM and Accented Characters
in thread XML:: DOM and Accented Characters

Picwick, here's the code prior to trying any of the suggestions made.

We don't need the code from before the suggestions, because we already know why that code can't work as expected. We need the code where you override the automatic encoding of the Perl I/O layer with >:utf8, because that code really should work.

Give us your latest code; there surely is an error somewhere.
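
To illustrate what the >:utf8 layer changes, here is a minimal sketch. It assumes a made-up string with one accented character and writes to in-memory filehandles instead of real files; the variable names are purely illustrative and nothing here comes from the original script.

```perl
use strict;
use warnings;

# A character string containing U+00E9 (e with acute accent).
my $text = "caf\x{E9}";

# No layer: characters below U+0100 are written as single Latin-1 bytes.
my $raw = '';
open my $fh_raw, '>', \$raw or die $!;
print {$fh_raw} $text;
close $fh_raw;

# With :utf8 the same character is written as the two bytes C3 A9.
my $utf8 = '';
open my $fh_utf8, '>:utf8', \$utf8 or die $!;
print {$fh_utf8} $text;
close $fh_utf8;

# Show the bytes that ended up in each buffer.
printf "no layer: %s\n", join ' ', map { sprintf '%02X', ord } split //, $raw;
printf ":utf8   : %s\n", join ' ', map { sprintf '%02X', ord } split //, $utf8;
```

The first buffer ends in the single byte E9, the second in C3 A9. A file written the first way but declared as UTF-8 in its XML header is what a strict parser would be expected to choke on.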

Re^8: XML:: DOM and Accented Characters
by freeflyer (Novice) on Aug 09, 2010 at 08:59 UTC

    Hi, I've got 5 versions of the code incorporating the various suggestions made to me, none of which I can (yet) get to work on Windows. I tested the last version on a Unix machine and it worked OK. Opening that Unix-created XML on Windows also works OK.

    #!/bin/perl -w
    use XML::DOM;
    my $parser = new XML::DOM::Parser;
    my $doc = $parser->parsefile ("c:\\accentTest.xml", ProtocolEncoding => 'UTF-8');
    # Print doc file
    $doc->printToFile ("c:\\accentTestOutPut.xml");
    #re-open file in UTF-8 encoded filehandle
    open my $fh, ">:utf8", "accentTestOutPut.xml" or die $!;
    $doc->print($fh);
    # cleanup
    $doc->dispose;

    #!/bin/perl -w
    use XML::DOM;
    my $parser = new XML::DOM::Parser;
    my $doc = $parser->parsefile ("c:\\accentTest.xml", ProtocolEncoding => 'UTF-8');
    # Print doc file
    $doc->printToFile ("c:\\accentTestOutPut.xml");
    #re-open file in UTF-8 encoded filehandle
    open my $fh, ">:utf8", "accentTestOutPut.xml" or die $!;
    print $fh "\x{FEFF}"; # BOM
    $doc->print($fh);
    # cleanup
    $doc->dispose;

    #!/bin/perl -w
    use XML::DOM;
    use UTF8BOM;
    my $parser = new XML::DOM::Parser;
    my $doc = $parser->parsefile ("c:\\accentTest.xml", ProtocolEncoding => 'UTF-8');
    # Print doc file
    $doc->printToFile ("c:\\accentTestOutPut.xml");
    UTF8BOM->insert_into_file('c:\\accentTestOutPut.xml');
    # cleanup
    $doc->dispose;

    #!/bin/perl -w
    use XML::DOM;
    use Encode qw(encode_utf8);
    my $parser = new XML::DOM::Parser;
    my $doc = $parser->parsefile ("c:\\accentTest.xml", ProtocolEncoding => 'UTF-8');
    # Print doc file
    $doc->printToFile ("c:\\accentTestOutPut.xml");
    open my $fh, ">:utf8", "accentTestOutPut.xml" or die $!;
    encode_utf8($fh);
    $doc->print($fh);
    # cleanup
    $doc->dispose;

    #!/bin/perl -w
    use XML::DOM;
    use PerlIO::encoding;
    my $parser = new XML::DOM::Parser;
    my $doc = $parser->parsefile ("c:\\accentTest.xml", ProtocolEncoding => 'UTF-8');
    # Print doc file
    $doc->printToFile ("c:\\accentTestOutPut.xml");
    open my $fh, ">:encoding(UTF-8)", "accentTestOutPut.xml" or die $!;
    $doc->print($fh);
    # cleanup
    $doc->dispose;

    What I have also discovered is that changing the first line of the XML to <?xml version="1.0" encoding="windows-1252"?> (as suggested by ikegami) lets me open the file OK on Windows in all cases.
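
A rough sketch of why the windows-1252 declaration would help. It rests on an assumption not verified against XML::DOM itself: that the file on disk holds each accented character as a single byte (0xE9 for e-acute). The sample string and variable names are hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Assumption: the output file stores e-acute as the single byte 0xE9.
my $bytes = "caf\xE9s";

# Read as windows-1252 (called cp1252 in Encode), the byte maps to U+00E9.
my $copy1 = $bytes;    # decode may modify its input, so work on a copy
my $as_cp1252 = decode('cp1252', $copy1);
printf "cp1252: fourth character is U+%04X\n", ord substr($as_cp1252, 3, 1);

# Read strictly as UTF-8, 0xE9 followed by an ASCII byte is malformed,
# so decode dies and the eval returns undef.
my $copy2 = $bytes;
my $as_utf8 = eval { decode('UTF-8', $copy2, Encode::FB_CROAK) };
print "UTF-8 : ", (defined $as_utf8 ? "decoded" : "malformed input"), "\n";
```

Under that assumption the bytes are valid windows-1252 but not valid UTF-8, which would be consistent with the file opening once the declared encoding is changed.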

      The idea was to not call ->printToFile (which you're doing in all five cases), but to use the suggested code instead:

      #!/usr/bin/perl -w
      use XML::DOM;
      my $parser = new XML::DOM::Parser;
      my $doc = $parser->parsefile ("c:\\accentTest.xml");
      open my $fh, ">:utf8", "c:\\accentTestOutPut.xml" or die $!;
      print $fh "\x{FEFF}"; # BOM
      $doc->print($fh);
      $doc->dispose;

        Looking back at your post I can't believe I misunderstood you. Been banging my head against the wall all weekend and the answer was there all along :)

        Thanks almut (and everyone else who has helped). It's been a learning experience in Perl and in reading posts properly ;)