in reply to Re^7: XML:: DOM and Accented Characters
in thread XML:: DOM and Accented Characters

Hi, I've got five versions of the code, each incorporating suggestions made to me, and I can't (yet) get any of them to work on Windows. I have tested the last version on a Unix machine and it worked OK. The XML file created on Unix also opens OK on Windows.

#!/bin/perl -w
use XML::DOM;
my $parser = new XML::DOM::Parser;
my $doc = $parser->parsefile ("c:\\accentTest.xml", ProtocolEncoding => 'UTF-8');
# Print doc file
$doc->printToFile ("c:\\accentTestOutPut.xml");
#re-open file in UTF-8 encoded filehandle
open my $fh, ">:utf8", "accentTestOutPut.xml" or die $!;
$doc->print($fh);
# cleanup
$doc->dispose;

#!/bin/perl -w
use XML::DOM;
my $parser = new XML::DOM::Parser;
my $doc = $parser->parsefile ("c:\\accentTest.xml", ProtocolEncoding => 'UTF-8');
# Print doc file
$doc->printToFile ("c:\\accentTestOutPut.xml");
#re-open file in UTF-8 encoded filehandle
open my $fh, ">:utf8", "accentTestOutPut.xml" or die $!;
print $fh "\x{FEFF}"; # BOM
$doc->print($fh);
# cleanup
$doc->dispose;

#!/bin/perl -w
use XML::DOM;
use UTF8BOM;
my $parser = new XML::DOM::Parser;
my $doc = $parser->parsefile ("c:\\accentTest.xml", ProtocolEncoding => 'UTF-8');
# Print doc file
$doc->printToFile ("c:\\accentTestOutPut.xml");
UTF8BOM->insert_into_file('c:\\accentTestOutPut.xml');
# cleanup
$doc->dispose;

#!/bin/perl -w
use XML::DOM;
use Encode qw(encode_utf8);
my $parser = new XML::DOM::Parser;
my $doc = $parser->parsefile ("c:\\accentTest.xml", ProtocolEncoding => 'UTF-8');
# Print doc file
$doc->printToFile ("c:\\accentTestOutPut.xml");
open my $fh, ">:utf8", "accentTestOutPut.xml" or die $!;
encode_utf8($fh);
$doc->print($fh);
# cleanup
$doc->dispose;

#!/bin/perl -w
use XML::DOM;
use PerlIO::encoding;
my $parser = new XML::DOM::Parser;
my $doc = $parser->parsefile ("c:\\accentTest.xml", ProtocolEncoding => 'UTF-8');
# Print doc file
$doc->printToFile ("c:\\accentTestOutPut.xml");
open my $fh, ">:encoding(UTF-8)", "accentTestOutPut.xml" or die $!;
$doc->print($fh);
# cleanup
$doc->dispose;

I have also discovered that changing the first line of the XML to <?xml version="1.0" encoding="windows-1252"?> (as suggested by ikegami) lets me open the file OK on Windows in all five cases.
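A possible reason the windows-1252 declaration helps, sketched with core Perl only. This assumes (I have not inspected the file) that the accented characters ended up in the output as single Latin-1 style bytes, for example 0xE9 for "é". The byte value and the variable names below are illustrative assumptions, not taken from the real file.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Assumption: the output file holds "e acute" as the single byte 0xE9.
my $latin_byte = "\xE9";

# Read as windows-1252, that byte is a perfectly good character (U+00E9).
my $as_cp1252 = decode('cp1252', $latin_byte);
printf "cp1252 decodes to U+%04X\n", ord $as_cp1252;

# Read as strict UTF-8, the same byte is malformed, so decoding croaks.
# FB_CROAK may modify its input, hence the copy.
my $copy = $latin_byte;
my $ok   = eval { decode('UTF-8', $copy, Encode::FB_CROAK); 1 };
print "valid UTF-8? ", ($ok ? "yes" : "no"), "\n";
```

If that assumption holds, a parser told the file is UTF-8 would reject it, while one told it is windows-1252 would accept it, which matches what I am seeing.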

Re^9: XML:: DOM and Accented Characters
by almut (Canon) on Aug 09, 2010 at 11:44 UTC

    The idea was to not call ->printToFile (which you're doing in all five cases), but to use the suggested code instead:

    #!/usr/bin/perl -w
    use XML::DOM;
    my $parser = new XML::DOM::Parser;
    my $doc = $parser->parsefile ("c:\\accentTest.xml");
    open my $fh, ">:utf8", "c:\\accentTestOutPut.xml" or die $!;
    print $fh "\x{FEFF}"; # BOM
    $doc->print($fh);
    $doc->dispose;
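    To see why the filehandle matters, here is a minimal sketch using only core Perl and in-memory filehandles, so it runs without XML::DOM or any file on disk. It assumes that ->printToFile writes through a plain handle with no encoding layer, which I believe is the case but which you should check against your installed version. The helpers bytes_written and hex_dump are made up for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of XML::DOM): print $text to an in-memory
# filehandle opened with $mode and return the raw bytes that were written.
sub bytes_written {
    my ($mode, $text) = @_;
    my $buffer = '';
    open my $fh, $mode, \$buffer or die "Cannot open in-memory handle: $!";
    print {$fh} $text;
    close $fh or die "Cannot close in-memory handle: $!";
    return $buffer;
}

# Render a byte string as space-separated hex pairs.
sub hex_dump {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02X', ord } split //, $bytes;
}

my $e_acute = "\x{E9}";    # the single character U+00E9 (e with acute accent)

my $raw  = bytes_written('>',      $e_acute);    # no encoding layer
my $utf8 = bytes_written('>:utf8', $e_acute);    # UTF-8 encoding layer
my $bom  = bytes_written('>:utf8', "\x{FEFF}");  # the BOM printed above

print "no layer : ", hex_dump($raw),  "\n";      # E9
print ":utf8    : ", hex_dump($utf8), "\n";      # C3 A9
print "BOM      : ", hex_dump($bom),  "\n";      # EF BB BF
```

    Without the layer the character goes out as the single byte E9, which is not valid UTF-8 on its own. With ">:utf8" it goes out as C3 A9, and the BOM becomes EF BB BF, which is what Windows tools tend to look for.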

      Looking back at your post I can't believe I misunderstood you. Been banging my head against the wall all weekend and the answer was there all along :)

      Thanks almut (and everyone else who has helped). It's been a learning experience in perl and reading posts properly ;)