combraxis has asked for the wisdom of the Perl Monks concerning the following question:

I have a test HTML page with a form that uses sendmail to send an email containing the form data. The input includes characters like éêä £ €

The test page and the script it runs are on an Apache Linux web server.

If the HTML page has a character set of windows-1252 (the default for Expression Web), those characters come out correctly in the email.

If the HTML page has a character set of utf-8, those characters come out as garbled text (eg: £ €).

Here is the sendmail code:

    unless (open(LOG_FILE, ">test_sendmail_log.txt")) {
        die "Couldn't open output file test_sendmail_log.txt\n";
    }
    print LOG_FILE "$body";

    open (MERCHANT_MAIL, "|/usr/sbin/sendmail -oi -t") or die "Can't fork for sendmail: $!\n";
    print MERCHANT_MAIL "To: $send_addr\n";
    print MERCHANT_MAIL "From: $from_addr\n";
    print MERCHANT_MAIL "Subject: $subject\n\n";
    print MERCHANT_MAIL $body;
    close (MERCHANT_MAIL);
    close(LOG_FILE);

A log file saved at the same time displays all characters correctly, whichever charset the html page has.

The html pages for which I am testing have charset=utf8 - how can I get sendmail to send such characters correctly?

Thank you

Replies are listed 'Best First'.
Re: sendmail problem with utf-8 charset
by Corion (Patriarch) on Nov 22, 2014 at 12:21 UTC

    At the minimum, you will need to declare the encoding for the HTML body part you're sending. See (for example) MIME::Lite for how to specify the encoding of a MIME part of your mail. Also, personally, I wouldn't shell out to sendmail myself but instead I'd use the mail sending facilities of MIME::Lite directly. This also prevents attacks like Shellshock.
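For illustration, a minimal sketch of the MIME::Lite approach might look like the following. It assumes a plain-text body rather than HTML, that MIME::Lite is installed from CPAN, and that the form values have already been decoded into Perl character strings; if your script still holds the raw UTF-8 bytes from the form, the encode('UTF-8', ...) call on the body would double-encode and should be dropped. The addresses and text are made-up placeholders.

```perl
use strict;
use warnings;
use utf8;                       # this source file contains literal UTF-8 text
use Encode qw(encode);          # core module
use MIME::Lite;                 # CPAN module, not part of core Perl

# Hypothetical values standing in for the (already sanitized) form data.
my $send_addr = 'merchant@example.com';
my $from_addr = 'shop@example.com';
my $subject   = 'Order: éêä £ €';
my $body      = "Total: £10 or €12\n";

my $msg = MIME::Lite->new(
    From     => $from_addr,
    To       => $send_addr,
    Subject  => encode('MIME-Header', $subject),   # RFC 2047 encoded-word
    Type     => 'text/plain',
    Encoding => 'base64',                          # safe for any 8-bit content
    Data     => encode('UTF-8', $body),            # characters -> UTF-8 bytes
);

# Declare the charset of the body so mail clients decode it as UTF-8.
$msg->attr('content-type.charset' => 'UTF-8');

print $msg->as_string;          # inspect the generated message
# $msg->send;                   # uncomment to actually deliver it
```

Printing the message first makes it easy to check that the Content-Type header carries the charset before anything is sent.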

    As another note, I hope you have sanitized $send_addr, $from_addr and $subject. If for example they contain newlines, your script can easily be abused to send spam.
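As a rough sketch of that kind of check, using only core Perl: the helpers clean_header_value and clean_address are hypothetical names, and the address pattern is deliberately strict rather than a full address parser, so treat it as a starting point only.

```perl
use strict;
use warnings;

# Hypothetical helper: refuse any header value that contains CR or LF, since
# an embedded line break would let form input start a new header line.
sub clean_header_value {
    my ($value) = @_;
    die "Line break in header value\n" if $value =~ /[\r\n]/;
    return $value;
}

# Hypothetical helper: a deliberately strict single-address sanity check.
# It is not a full RFC 5322 parser.
sub clean_address {
    my ($addr) = @_;
    clean_header_value($addr);
    die "Suspicious address\n"
        unless $addr =~ /\A[^\s\@,;<>]+\@[^\s\@,;<>]+\z/;
    return $addr;
}

# A normal address passes; a subject carrying an injected Bcc line does not.
my $ok  = eval { clean_address('merchant@example.com') };
my $bad = eval { clean_header_value("Hello\nBcc: victim\@example.com") };

print defined $ok  ? "accepted: $ok\n" : "rejected\n";
print defined $bad ? "accepted\n"      : "rejected: $@";
```

Rejecting the request outright is a design choice here; silently stripping the line breaks would also stop the injection but hides the fact that someone tried it.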

      Thank you very much for your advice. With MIME::Lite and its instructions I managed to construct the correct encoding for the messages. They now appear as they should in my and other folks' email clients. Thank you.
Re: sendmail problem with utf-8 charset
by Anonymous Monk on Nov 22, 2014 at 13:31 UTC
    if the html page has character set of utf-8 those characters come out as garbled text (eg: £ €).
    Well, this is what UTF-8 looks like when displayed using Windows-1252. What does 'come out correctly' mean? 'Come out' where?
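The effect is easy to reproduce with the core Encode module. This small sketch encodes the two symbols as UTF-8 and then deliberately decodes the bytes as Windows-1252, which is what a viewer does when no charset is declared and it assumes Windows-1252.

```perl
use strict;
use warnings;
use utf8;                          # the literal below is written in UTF-8
use Encode qw(encode decode);      # core module

binmode STDOUT, ':encoding(UTF-8)';

my $text  = "£ €";
my $bytes = encode('UTF-8', $text);          # what a UTF-8 form submits

# Hex dump of the UTF-8 bytes.
my $hex = join ' ', map { sprintf '%02X', ord($_) } split //, $bytes;

# The same bytes misread as Windows-1252, one character per byte.
my $misread = decode('cp1252', $bytes);

print "UTF-8 bytes: $hex\n";
print "Read as Windows-1252: $misread\n";
```

On a UTF-8 terminal this prints the bytes C2 A3 20 E2 82 AC and the misread text Â£ â‚¬, so each symbol turns into two or three unrelated characters.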
    A log file saved at the same time displays all characters correctly, whichever charset the html page has.
    The log file is displayed by what?
      Anyway, I recall you need to specify 'Content-Type' as one of the headers of the message you pass to sendmail. Without it the receiving mail client falls back to a default, Latin-1 or something like that.
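If you want to keep piping to sendmail, the charset goes into the message headers rather than onto the sendmail command line. A minimal sketch using only core modules follows; build_utf8_message is a hypothetical helper, the addresses are placeholders, and it assumes the form values are decoded character strings and that the To and From values are plain ASCII and already sanitized.

```perl
use strict;
use warnings;
use utf8;                       # literals below are written in UTF-8
use Encode qw(encode);          # core module

# Hypothetical helper: build a complete message whose MIME headers declare
# the body as UTF-8 text.
sub build_utf8_message {
    my (%arg) = @_;
    my $headers = join '',
        "To: $arg{to}\n",
        "From: $arg{from}\n",
        'Subject: ' . encode('MIME-Header', $arg{subject}) . "\n",
        "MIME-Version: 1.0\n",
        "Content-Type: text/plain; charset=UTF-8\n",
        "Content-Transfer-Encoding: 8bit\n";
    # A blank line separates headers from the body; the body is sent as bytes.
    return $headers . "\n" . encode('UTF-8', $arg{body});
}

my $message = build_utf8_message(
    to      => 'merchant@example.com',
    from    => 'shop@example.com',
    subject => 'Order: éêä £ €',
    body    => "Total: £10 or €12\n",
);

print $message;                 # inspect before sending

# To deliver it, pipe the bytes to sendmail (left commented out on purpose):
# open(my $mail, '|-', '/usr/sbin/sendmail', '-oi', '-t')
#     or die "Can't fork for sendmail: $!\n";
# binmode $mail;                # write the bytes unchanged
# print {$mail} $message;
# close($mail) or die "sendmail failed: $!\n";
```

The 8bit transfer encoding is the simplest choice but assumes every hop accepts 8-bit mail and that lines stay short; base64 or quoted-printable is the more conservative option.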