ultranerds has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

Got an issue here which has had me perplexed for a good 2 hours :/

    if ($test_title !~ /^$country_name/) {
        #$title = qq|FOO: $title|;
        $title = qq|$country_name $title|;
    } else {
        ... do some other stuff
    }


We are running our server in UTF8 btw.

For some reason, the above code is breaking the string - and it comes out like so:

REUNION Circuit Réunion en Liberté

..if I comment out this line:

    $title = qq|$country_name $title|;

..and uncomment this one:

        $title = qq|FOO: $title|;

..the import process adds it correctly:

FOO Circuit Réunion en Liberté

I'm really confused as to why this is happening. $country_name is just an uppercase version of the country (normal A-Z, no accented characters in it at all).

Can anyone give me any pointers? It's driving me nuts :(

TIA!

Andy

Re: Bizarre UTF8 issue with strings
by moritz (Cardinal) on Sep 17, 2010 at 16:20 UTC
    Is $country_name a decoded string? What about $title? Is use utf8; in scope? Is there any output layer involved?

    (By "decoded string" I mean one that has gone through Encode::decode() or related functions, or through an :encoding(UTF-8) or :utf8 IO layer.)

    See also: Encodings and Unicode in Perl.
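
To make the "decoded string" distinction concrete, here is a minimal sketch. The sample word and the variable names ($bytes, $decoded, $from_layer) are illustrative only and do not come from the original post; the in-memory filehandle merely stands in for a real file.

```perl
use strict;
use warnings;
use Encode qw( decode );

# "Réunion" as raw UTF-8 bytes, the way an undecoded read delivers it.
my $bytes = "R\xC3\xA9union";

# Route 1: decode explicitly with Encode::decode().
my $decoded = decode('UTF-8', $bytes);

# Route 2: read through an :encoding(UTF-8) IO layer
# (an in-memory file here, standing in for a real one).
open my $fh, '<:encoding(UTF-8)', \$bytes or die "open failed: $!";
my $from_layer = <$fh>;
close $fh;

# The undecoded string counts bytes; the decoded ones count characters.
printf "undecoded length: %d\n", length $bytes;       # 8, the accented letter is two bytes
printf "decoded length:   %d\n", length $decoded;     # 7
printf "layer length:     %d\n", length $from_layer;  # 7
```

Both routes give the same character string. Only the undecoded one still treats the accented letter as two separate bytes.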

    Perl 6 - links to (nearly) everything that is Perl 6.
      Hi,

      Thanks for the replies guys.

      The file being read is already in "UTF8 without BOM".

      The $country_name is a basic string (a-z0-9 etc)

      I tried doing:

      use Encode qw( encode );
      $country_name = encode('utf8', $country_name);


      ..and that seems to have done the trick.

      Man, I wish I had realised this hours ago.

      Thanks a ton for the replies - you've saved my remaining hair ;)

      Cheers

      Andy
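
One plausible explanation for why that fix works, which nothing in the thread confirms: $title holds undecoded UTF-8 bytes while $country_name is a character string (UTF8 flag on), so joining them upgrades the whole result. The sketch below simulates that with utf8::upgrade. It also assumes that whatever stores the string looks at Perl's internal buffer when the flag is set, which is a guess about the import process.

```perl
use strict;
use warnings;
use Encode qw( encode );

# Undecoded UTF-8 bytes: the accented letter is stored as the two bytes C3 A9.
my $title = "Circuit R\xC3\xA9union";

# Assumption: $country_name arrives as a character string (UTF8 flag on).
# utf8::upgrade simulates that here without changing the characters.
my $country_name = 'REUNION';
utf8::upgrade($country_name);

# Joining a flagged string with a byte string yields a flagged string.
my $mixed = "$country_name $title";

# The fix from the post: encode() returns a byte string, so the join stays bytes.
my $encoded_name = encode('utf8', $country_name);
my $fixed        = "$encoded_name $title";

# What a flag-sensitive consumer (hypothetical) would read for $mixed:
# the UTF-8 encoding of its characters, where C3 A9 has become C3 83 C2 A9.
my $internal = $mixed;
utf8::encode($internal);

printf "mixed is flagged: %s\n", utf8::is_utf8($mixed) ? 'yes' : 'no';
printf "fixed is flagged: %s\n", utf8::is_utf8($fixed) ? 'yes' : 'no';
printf "bytes seen for mixed: %d, for fixed: %d\n", length $internal, length $fixed;
```

If that assumption holds, the extra bytes in the flagged case are what show up as "Ã©". Decoding $title on input instead would be the more conventional cure.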
Re: Bizarre UTF8 issue with strings
by ikegami (Patriarch) on Sep 17, 2010 at 16:20 UTC

    Sounds like you're not encoding your text into bytes on output, so Perl has to guess what you want. It warns you about it with "Wide character in print" in one of the cases.

    binmode STDOUT, ':encoding(...)';
    print $title;

    or

    use Encode qw( encode );
    print encode('...', $title);

    Update: The lack of any character above 255 in the string posted by the OP means my explanation is not completely accurate. The solution is still pertinent, though.
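
The garbled output in the question is consistent with double encoding: the two bytes of "é" are each treated as a Latin-1 character and encoded again. Below is a rough sketch of that failure and of the decode-on-input, encode-on-output pattern suggested above. It assumes UTF-8 as the encoding (the replies leave it as '...'), and $raw_title and $country are made-up stand-ins for the OP's data.

```perl
use strict;
use warnings;
use Encode qw( decode encode );

# Undecoded UTF-8 bytes, as read from a file without an :encoding layer.
my $raw_title = "Circuit R\xC3\xA9union en Libert\xC3\xA9";

# Failure mode: encoding bytes that were never decoded. Each byte is taken
# as a Latin-1 character, so C3 A9 becomes C3 83 C2 A9 on output.
my $double = encode('UTF-8', $raw_title);

# Suggested pattern: decode on input, work with characters, encode once on output.
my $title   = decode('UTF-8', $raw_title);
my $country = 'REUNION';    # hypothetical stand-in for $country_name
$title = "$country $title";
my $correct = encode('UTF-8', $title);

# Both are byte strings now, so a plain print writes them unchanged.
# (binmode STDOUT, ':encoding(UTF-8)' plus print $title would replace the
# explicit encode() call.)
print "$double\n";
print "$correct\n";
```

On a UTF-8 terminal the first line should show the same kind of breakage as in the question, and the second should show the title intact.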