Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

We have a LOT of Russian members and one of them translates our newsletter into Russian for us. We go to our utility to send it and for some reason it goes into the database like this:
&#1054;&#1095;&#1077;&#1085;&#1100; &#1042;&#1072;&#1078;&#1085;&#1072;&#1103; &#1041;&#1080;&#1079;&#1085;&#1077;&#1089;-&#1064;&#1082;&#1086;&#1083;&#1072;
Is there a way for Perl to convert that back into Russian before it goes into the email? The body contains the same kind of text and is displayed just fine, but for some reason the subject is not: it is shown as above.

thanks,
Rich

Re: language problem with perl
by almut (Canon) on Jun 01, 2009 at 10:37 UTC
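    use HTML::Entities;

    my $encoded = "&#1054;&#1095;&#1077;&#1085;&#1100; &#1042;&#1072;&#1078;&#1085;&#1072;&#1103; &#1041;&#1080;&#1079;&#1085;&#1077;&#1089;-&#1064;&#1082;&#1086;&#1083;&#1072;";
    my $decoded = decode_entities($encoded);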

    This gives you a Perl text (unicode) string that you can then encode into whatever output encoding you need, e.g. UTF-8.

    Update: for background info on encoding non-ASCII text in email headers (like the Subject: line), see RFC 2047 — though most mail user agents should handle that for you, when configured properly.
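As a rough illustration of the two steps described above (decode the entities, then encode for the target), here is a minimal sketch. It assumes the stored subject is a string of numeric entities like the one in the question, and it uses the `MIME-Header` encoding that ships with Encode to produce an RFC 2047 encoded Subject. The variable names are made up for the example, and how the header is handed to your mailing utility is not shown because the thread does not say which one is used.

```perl
use strict;
use warnings;
use HTML::Entities qw(decode_entities);
use Encode qw(encode);

# Assumption: this is what comes back from the database (shortened here).
my $stored = '&#1054;&#1095;&#1077;&#1085;&#1100; '
           . '&#1042;&#1072;&#1078;&#1085;&#1072;&#1103;';

# Step 1: turn the numeric entities into a Perl character (Unicode) string.
my $subject = decode_entities($stored);

# Step 2a: for the Subject: header, produce RFC 2047 encoded-words (pure ASCII).
my $header = encode('MIME-Header', $subject);

# Step 2b: for a UTF-8 body, encode the characters to UTF-8 bytes instead.
my $body_bytes = encode('UTF-8', $subject);

print "Subject: $header\n";
```

If the mail module in use already encodes headers itself, pass it the decoded `$subject` rather than `$header`, otherwise the header would be encoded twice.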

Re: language problem with perl
by shmem (Chancellor) on Jun 01, 2009 at 10:40 UTC

    Use HTML::Entities:

    use HTML::Entities;

    # Avoid "Wide character in print" warnings when printing the decoded text
    binmode STDOUT, ':encoding(UTF-8)';

    my $str = <<EOH;
    &#1054;&#1095;&#1077;&#1085;&#1100; &#1042;&#1072;&#1078;&#1085;&#1072;&#1103; &#1041;&#1080;&#1079;&#1085;&#1077;&#1089;-&#1064;&#1082;&#1086;&#1083;&#1072;
    EOH
    print decode_entities($str);

    This should output the Russian text as Unicode.

Re: language problem with perl
by Perlbotics (Archbishop) on Jun 01, 2009 at 10:52 UTC

    Hi, since the e-mail body is displayed fine, the problem is most likely that the Subject: line is only handled as 7-bit ASCII by the mail server.

    Unfortunately, I am not aware of a module that transliterates Очень Важная Бизнес-Школа (hope that is nothing rude ;-) into something that is readable in ASCII... Maybe Lingua::RU::Charset?

    The easiest solution would be to provide the original subject - or one that is translated into English - as the subject line for e-mails.

    Update: OK, Babelfish says it means "Very important Business- school".
    Update 2: Do as almut said, plus use this module: Encode::MIME::Header

      Babelfish was correct.

      Some Russian speakers write email subjects in translit (Russian words written with Latin letters), to escape these problems altogether. Even if you solve the problem on your end, you have no guarantee that the recipient will be able to read your subject line. For example, my school had its Exchange server misconfigured in such a way that Cyrillic letters in subjects turned into question marks when forwarded to external addresses.

      Searching CPAN for 'Translit' yields several modules that look promising.
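For the translit approach mentioned above, a minimal sketch follows. It uses Text::Unidecode, which is my own pick for illustration and not necessarily one of the modules the CPAN search would turn up. It gives a rough ASCII approximation rather than a standard Russian transliteration scheme, so check the result with a native speaker before sending.

```perl
use strict;
use warnings;
use utf8;                            # the source below contains Cyrillic literals
use Text::Unidecode qw(unidecode);

# The subject from the question, as a Perl character string.
my $subject = 'Очень Важная Бизнес-Школа';

# Replace every non-ASCII character with an ASCII approximation.
my $ascii = unidecode($subject);

print "Subject: $ascii\n";
```

Because the result is plain ASCII, it survives mail servers and clients that mangle encoded headers, at the cost of being less pleasant to read than real Cyrillic.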