PerlPksky has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to preserve Croatian characters for output in my perl scripts. The characters survive when I copy and paste them into the script I am editing, but they don't survive output from the script. And they don't survive the preview window when I copy and paste them here. We will see how they do when I post this.

#!/usr/bin/perl -w
use warnings;
print "Župljanin and Stanišić\n";
This simple script does not preserve the special Croatian characters. When I run the code, I get: Ĺ˝upljanin and StaniĹĄiÄ

How do I fix this?

Re: Cannot preserve Latin 2 character sets in Perl
by kennethk (Abbot) on Sep 29, 2011 at 17:11 UTC
Re: Cannot preserve Latin 2 character sets in Perl
by Anonymous Monk on Sep 29, 2011 at 16:52 UTC
    PerlMonks is buggy and the local gods refuse to fix it, so I can't help with that. As for your actual question, you need to learn about the topic of encoding. Start by reading http://p3rl.org/UNI. The following program does what you expect.

    use utf8;
    use Encode qw(encode);
    print encode 'UTF-8', 'Župljanin and Stanišić';

      That didn't work exactly, but I poked around a little and tried,

      use utf8;
      use Encode qw(encode);
      print encode 'iso-8859-2', 'Župljanin and Stanišić'."\n";

      And that worked. I should add that "use utf8;" has to be there; it doesn't work without it.

      Thanks for the tip.
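
A rough sketch of why the second version may have worked where the first did not, assuming the terminal was interpreting output as ISO-8859-2 (Latin 2): the same text becomes different byte sequences under the two encodings. The escapes `\x{17D}`, `\x{161}` and `\x{107}` stand for Ž, š and ć so that the example does not depend on how the file is saved; the variable names are illustrative only.

```perl
use strict;
use warnings;
use Encode qw(encode);

# "Župljanin and Stanišić" written with escapes
# (Ž = U+017D, š = U+0161, ć = U+0107)
my $text = "\x{17D}upljanin and Stani\x{161}i\x{107}";

my $latin2 = encode('iso-8859-2', $text);   # one byte per character
my $utf8   = encode('UTF-8',      $text);   # two bytes each for Ž, š and ć

printf "characters:    %d\n", length($text);
printf "Latin-2 bytes: %d\n", length($latin2);
printf "UTF-8 bytes:   %d\n", length($utf8);

# How the first character, Ž, is represented in each encoding
printf "Latin-2 Z-caron: %02X\n", ord(substr($latin2, 0, 1));
printf "UTF-8 Z-caron:   %02X %02X\n",
    map { ord($_) } split(//, substr($utf8, 0, 2));
```

A terminal expecting Latin 2 shows each byte as one character, so only the Latin-2 byte string displays as intended there.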

Re: Cannot preserve Latin 2 character sets in Perl
by moritz (Cardinal) on Sep 29, 2011 at 18:26 UTC

    You need to supply more context. By default, Perl treats literal strings as bytes, so if your script is stored in the encoding that your console accepts, it should work.

    If it doesn't work, there is probably a mismatch between these two encodings, but that's hard to diagnose without even knowing which operating system you use.

    If this is some Unix dialect, what's your locale? What terminal or terminal emulator are you using? Which editor do you use, and what character encoding does it store files in?
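
The kind of mismatch described in this reply can be reproduced in isolation. The sketch below assumes the script's bytes are UTF-8 and the display side reads them as ISO-8859-2; `misread_utf8_as_latin2` is a hypothetical helper written for illustration, not part of any module.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: produce UTF-8 bytes, then misread them as Latin-2,
# which is what a Latin-2 terminal would do with UTF-8 output.
sub misread_utf8_as_latin2 {
    my ($text) = @_;
    my $bytes = encode('UTF-8', $text);
    return decode('iso-8859-2', $bytes);
}

my $garbled = misread_utf8_as_latin2("\x{17D}");   # "\x{17D}" is Ž

# Print code points rather than the characters themselves, so the
# result is readable whatever encoding the terminal uses.
my @codepoints = map { sprintf('U+%04X', ord($_)) } split(//, $garbled);
print "@codepoints\n";
```

The single character Ž comes back as two characters, U+0139 (Ĺ) and U+02DD (˝), which matches the start of the garbled output quoted in the original question.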

      OK.

      I am editing on Windows XP Pro using Notepad++ and Dreamweaver. The file is being edited from a remote installation on a LAN that is an SME Server, a Linux distribution. I run the remote script using PuTTY. I usually have to change the translation settings on PuTTY to Latin 2 to get uncorrupted characters on output.

      What I am trying to do is take output from MySQL and print an HTML page that displays these Croatian characters. MySQL requires some tinkering to handle these characters correctly, and only a version later than 5.0 will do it. But at this point the difficulty is taking the characters that have survived output from MySQL and printing them to a page generated by Perl. So the code is something like:

      $HTML = encode( "iso-8859-2", $HTML);

      while (<OLD>) {
      s!<body>.*</body>!<body>$HTML</body>!gs;
      print NEW $_ or die "can't write $new: $!";
      }

      But this isn't working. The output has changed, but the characters are not being preserved. I am getting erroneous characters instead of little squares on browser output.

      What am I missing?
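
One detail in the snippet above can be checked independently of the encoding question: `while (<OLD>)` reads one line at a time by default, so `<body>.*</body>` only matches when both tags sit on the same line, regardless of the `/s` flag. A minimal sketch, assuming the page fits in memory, applies the substitution to the whole page as a single string and encodes once at output. The helper `replace_body` and the sample page are hypothetical, and the thread does not establish whether this is the cause of the problem described here.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: replace everything between <body> and </body>
# in a page held as one string. /s lets "." match newlines.
sub replace_body {
    my ($page, $html) = @_;
    $page =~ s{<body>.*</body>}{<body>$html</body>}s;
    return $page;
}

# Sample page whose body spans several lines (made up for illustration)
my $page = "<html>\n<body>\nold text\n</body>\n</html>\n";

# Character string standing in for the MySQL result:
# "Župljanin and Stanišić" written with escapes
my $html = "<p>\x{17D}upljanin and Stani\x{161}i\x{107}</p>";

my $out = replace_body($page, $html);

# Encode once, at the point of output
print encode('iso-8859-2', $out);
```

Reading the real file in one piece could be done by setting `local $/;` before the read, which makes `<OLD>` return the whole file.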