
At first glance, this looks OK, but then again, you don't tell me what's wrong with it.

As for which character set to use: I don't know. You need to know which character set your strings are in; that's the character set you then declare. Common charsets are UTF-8 or, for Cyrillic, KOI8-R.
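Here is a minimal sketch of what "use that charset" means in Perl. It assumes the raw subject arrives as bytes in a charset you already know. The core Encode module decodes those bytes into a Perl character string, and its MIME-Header encoding then produces the RFC 2047 encoded-word for the Subject header. The helper name `mime_subject` is hypothetical.

```perl
use strict;
use warnings;
use utf8;                      # this source file is saved as UTF-8
use Encode qw(encode decode);

# Hypothetical helper. Assumption: $bytes are in the known charset $charset
# (e.g. 'UTF-8', 'koi8-r' or 'cp1251').
sub mime_subject {
    my ($bytes, $charset) = @_;
    my $text = decode($charset, $bytes);   # bytes -> Perl character string
    return encode('MIME-Header', $text);   # -> =?UTF-8?B?...?= encoded-word(s)
}

# Usage: pretend the form delivered UTF-8 bytes.
my $bytes = encode('UTF-8', 'Привет');
print "Subject: ", mime_subject($bytes, 'UTF-8'), "\n";
```

The important point is the decode step. If you pass the wrong charset name, the header is perfectly valid MIME but contains the wrong characters.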

Re^6: Problem with russian / cyrillic in e-mail program.
by dbmathis (Scribe) on Apr 04, 2010 at 17:42 UTC

    First of all, thank you for being patient with a dummy. Moving along, I am pulling text from this site http://www.mindmachine.ru/ to test with. I have now tried UTF-8, KOI-8 and windows-1251.

    I'm assuming it's perfectly normal for the body of the e-mail to come through fine while the subject is mangled.

    Here's what's wrong with it: Screenshot

      Your string is (HTML) entity-encoded, which you would also have seen by printing it. Either the webmail host is doing this, or it is your program that sends the mail. I really wonder why you are trying to send email without knowing what encoding your text is in, and why you are scraping the text content from some external web site.
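A small debugging sketch for the "print it" step. Printing code points instead of raw text means your terminal's charset can't hide what the string really contains. A simple regex check flags HTML character references such as `&#1055;`. Both helper names are hypothetical, and the regex is a heuristic, not a full HTML parser.

```perl
use strict;
use warnings;

# Hypothetical helper: show every character as U+XXXX.
sub dump_codepoints {
    my ($s) = @_;
    return join ' ', map { sprintf 'U+%04X', ord } split //, $s;
}

# Hypothetical helper: 1 if the string contains something that looks like
# an HTML character reference (&#1055;, &#x41F;, &amp;, ...), else 0.
sub looks_entity_encoded {
    my ($s) = @_;
    return $s =~ /&(?:#[0-9]+|#[xX][0-9A-Fa-f]+|[A-Za-z][A-Za-z0-9]*);/ ? 1 : 0;
}

my $subject = '&#1055;&#1088;&#1080;&#1074;&#1077;&#1090;';   # sample input
print dump_codepoints($subject), "\n";
print looks_entity_encoded($subject) ? "entity-encoded\n" : "plain\n";
```

A clean Cyrillic subject shows up as code points in the U+0400 to U+04FF range. An entity-encoded one shows runs of U+0026 U+0023 (`&#`) followed by digits.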

      So, your first step is to make sure you know what encoding your subject string is in. If it contains HTML entities, HTML::Entities::decode can turn it into a Perl character string, which you can then encode as UTF-8 and Base64-encode for the MIME Subject header. In practice, though, it would be much better to take HTML out of the equation and set the subject yourself to some well-known characters in a well-known encoding, for example:

      my $subject = 'Hello World';

      except with Cyrillic characters, written either in UTF-8 or in KOI8-R:

      use utf8; my $subject = 'Hello World'; # except in cyrillic

      or

      use encoding 'koi8-r'; my $subject = 'Hello World'; # except in cyrillic

      if your source code editor supports KOI8-R.
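If you do end up with an entity-encoded subject, the decode-then-MIME-encode route described above might look like the sketch below. It deliberately swaps in a tiny core-only decoder that handles only numeric character references (`&#1055;`, `&#x41F;`), so it runs without extra modules. HTML::Entities::decode is the more complete tool, because it also handles named entities. The helper name `decode_numeric_entities` is hypothetical.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical, minimal stand-in for HTML::Entities::decode:
# only numeric references are decoded; named ones like &amp; are left alone.
sub decode_numeric_entities {
    my ($s) = @_;
    $s =~ s/&#([0-9]+);/chr($1)/ge;              # decimal: &#1055;
    $s =~ s/&#[xX]([0-9A-Fa-f]+);/chr(hex $1)/ge; # hex:     &#x41F;
    return $s;
}

my $raw    = '&#1055;&#1088;&#1080;&#1074;&#1077;&#1090;';   # sample input
my $text   = decode_numeric_entities($raw);   # Perl character string
my $header = encode('MIME-Header', $text);    # UTF-8 + Base64 encoded-word
print "Subject: $header\n";
```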

        I was scraping from that site because I needed Russian text. Someone complained that about 1% of the e-mails being sent out were wrong when Russian characters were entered into the form. I already tried printing the subject (with non-Cyrillic text), and the program itself is not introducing HTML entities. I have also been testing in Thunderbird and Gmail, and both show the odd subjects.

        From the sound of it, I need to figure out which charset people are using every time. I guess it could be any charset, considering that people around the globe use the form.
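One common heuristic when form input arrives as raw bytes of unknown charset is to try strict UTF-8 first, because non-UTF-8 text rarely happens to be valid UTF-8. If that fails, fall back to a single legacy charset you pick. The sketch below assumes cp1251 (windows-1251) as that fallback and uses a hypothetical helper name. It cannot tell cp1251 apart from KOI8-R, since any byte sequence decodes in both.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode form bytes, preferring strict UTF-8.
sub decode_form_bytes {
    my ($bytes, $fallback) = @_;
    $fallback //= 'cp1251';                 # assumed default legacy charset
    my $copy = $bytes;                      # decode() with a CHECK flag may modify its input
    my $text = eval { decode('UTF-8', $copy, Encode::FB_CROAK()) };
    return defined $text ? $text : decode($fallback, $bytes);
}
```

Making the form itself submit UTF-8 reduces how often the fallback is needed.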

        I'll keep debugging, and maybe I'll find something. Thanks for your help, and if you can think of anything else, let me know.

      The screenshot looks like a webmail client that can't properly handle Unicode (i.e. not your fault). Try using a "real" mail user agent instead, e.g. Thunderbird.

      Alexander

      --
      Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)