ait has asked for the wisdom of the Perl Monks concerning the following question:

Hello,

I am working on an application that may receive HTML and XML data in several encodings, notably ISO-8859-1 and UTF-8. In my test scripts I would like to encode the test data in different encodings and, by setting the appropriate HTTP headers, make sure that my application won't choke on them.

I understand that Perl uses utf8 (a lax form of UTF-8) internally, and all my source code is encoded in UTF-8. I read the Encode perldoc and tried this:
#!/usr/bin/perl
use strict;
use warnings;
use Encode;

my $perl_str = "áéíóúñü";
print "IN UTF:$perl_str\n";

my $iso_str = encode("iso-8859-1", $perl_str);
print "IN ISO:$iso_str\n";

my $utf_str = decode("iso-8859-1", $iso_str);
print "BACK IN UTF:$utf_str\n";


To my surprise, the complete output was readable in my shell, which should have printed funny characters for the LATIN1 prints (because my shell is configured for UTF-8). I am guessing that Perl translated back to utf8 automagically, but I would like to confirm this with more knowledgeable monks. Also, if anyone knows of a module that performs these types of tests and could kindly point me to it, I would really appreciate it.
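One way to check that guess without trusting the terminal is to print code points in hex instead of the text itself. The sketch below is an illustration, not code from the thread: the sample letters are written with `\x` escapes (an assumption, so the result does not depend on the script's own source encoding), and `hexdump` is a hypothetical helper. It shows that a literal read without `use utf8;` holds UTF-8 bytes, and that encoding those bytes to ISO-8859-1 leaves them unchanged. That would explain the readable output, assuming the script above has no `use utf8;` and STDOUT has no encoding layer.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: show each character's code point in hex, so the
# terminal's own encoding cannot hide what is inside the string.
sub hexdump { return join ' ', map { sprintf '%02X', ord } split //, shift }

# What a literal "áñ" in a UTF-8 source file holds WITHOUT "use utf8":
# four bytes, not two characters (written with escapes here).
my $bytes = "\xC3\xA1\xC3\xB1";

# Encoding those bytes to ISO-8859-1 changes nothing: every "character"
# is already below 0x100, so each one maps to the same byte.
my $same = encode('iso-8859-1', $bytes);

# Decoding the bytes as UTF-8 yields the two characters á and ñ ...
my $chars = decode('UTF-8', $bytes);

# ... and only those characters encode to genuinely different Latin-1 bytes.
my $latin1 = encode('iso-8859-1', $chars);

printf "%-7s len=%d  %s\n", 'bytes',  length $bytes,  hexdump($bytes);
printf "%-7s len=%d  %s\n", 'same',   length $same,   hexdump($same);
printf "%-7s len=%d  %s\n", 'chars',  length $chars,  hexdump($chars);
printf "%-7s len=%d  %s\n", 'latin1', length $latin1, hexdump($latin1);
```

Adding `use utf8;` to the original script would make `$perl_str` hold characters like `$chars` here, and the ISO-8859-1 line would then print bytes a UTF-8 terminal cannot display.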

Thanks in advance,
Alejandro Imass

Re: How to test different encoding
by jethro (Monsignor) on Sep 16, 2008 at 00:29 UTC

    Exactly, Perl changes the LATIN1 back to Unicode because the output stream STDOUT is utf8 in your case.

    To see funny characters, you would have to change the output stream's encoding with an :encoding(...) layer. On my machine:

    > perl -e 'binmode STDOUT, ":encoding(iso-8859-1)"; print STDOUT "Hüsker Dü\n";'
    Hüsker Dü
    > perl -e 'binmode STDOUT, ":encoding(utf8)"; print STDOUT "Hüsker Dü\n";'
    H**sker D**

    I changed the nearly unprintable characters to **. Actually, I was surprised to find out that my Linux is still using Latin-1 in the shell (or my reasoning is wrong).
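The one-liners above depend on how the terminal renders whatever bytes arrive. As a sketch (not from the thread), the bytes each `:encoding(...)` layer actually produces can be captured in an in-memory filehandle and inspected directly. `bytes_through_layer` is a hypothetical helper, and the text is written with `\x{FC}` escapes so the script does not depend on its own source encoding.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# "Hüsker Dü" as Perl characters; ü is code point U+00FC.
my $text = "H\x{FC}sker D\x{FC}";

# Hypothetical helper: print $str through the given output layer into an
# in-memory buffer and return the encoded bytes that the layer produced.
sub bytes_through_layer {
    my ($layer, $str) = @_;
    my $buf = '';
    open my $fh, ">$layer", \$buf or die "open: $!";
    print {$fh} $str;
    close $fh;
    return $buf;
}

for my $layer (':encoding(iso-8859-1)', ':encoding(UTF-8)') {
    my $out = bytes_through_layer($layer, $text);
    # Show the byte count and each byte in hex, never the raw text.
    printf "%-22s %2d bytes: %s\n", $layer, length $out,
        join ' ', map { sprintf '%02X', ord } split //, $out;
}
```

With the Latin-1 layer each ü becomes the single byte FC, which a UTF-8 terminal cannot display; with the UTF-8 layer it becomes the two bytes C3 BC.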

    You can find out more with 'perldoc perlunicode' and 'perldoc Encode'.

      Thanks!

      I found some good examples of encoding tests in XML::LibXML (the CPAN module that interfaces to GNOME's libxml2). Its test file 19encoding.t does a whole bunch of tests similar to what I was trying to accomplish, and gives great ideas on how to use the encoding routines for testing.

      Best,
      Alejandro Imass
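The kind of test described in the question can be sketched with the core Test::More and Encode modules. This is an illustration, not the contents of 19encoding.t: the list of encodings is an assumption, and in a real test the encoded `$bytes` would be sent to the application with a matching charset in the Content-Type header instead of being decoded directly.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode decode);
use Test::More;

# Sample text as Perl characters: áéíóúñü (escapes avoid depending on
# the source file's own encoding).
my $text = "\x{E1}\x{E9}\x{ED}\x{F3}\x{FA}\x{F1}\x{FC}";

# Encodings the application is expected to accept (an assumption; adjust).
my @encodings = ('iso-8859-1', 'UTF-8');

for my $enc (@encodings) {
    # Bytes a client would send when declaring "charset=$enc" ...
    my $bytes = encode($enc, $text);
    # ... must decode with that same charset back to the original text.
    is(decode($enc, $bytes), $text, "round trip through $enc");
}

# For non-ASCII text the two encodings must yield different bytes;
# otherwise the test data is not really exercising different encodings.
isnt(encode('iso-8859-1', $text), encode('UTF-8', $text),
     'Latin-1 and UTF-8 bytes differ');

done_testing();
```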
        Hi:
        I am interested in the encoding problem. Can you tell me where the 19encoding.t file is? I can't find it on my system.
        My XML::LibXML is located in /usr/local/lib/perl/5.8.8/XML/, but I didn't find any file named 19encoding.t there.
        thx in advance:)