sapphirecat has asked for the wisdom of the Perl Monks concerning the following question:
Update #2, solved: It turns out that Email::MIME is encoding-aware; I just missed it in my haste to use it like MIME::Lite. Also, I stumbled over Email::MIME::CreateHTML. I don't need it myself, but it looks like another interesting choice for the problem of constructing email.
There are a bunch of method pairs of the form foo and foo_str that do the same thing, except that foo takes an already-encoded string as-is, while foo_str takes a decoded string and encodes it for you (if you have a charset and encoding chosen).
Now, a code demonstration of where I've come to:
#!/usr/bin/perl -w
# For correct results, use on a terminal expecting utf-8
use strict;
use Email::MIME;

# Build one text part. body_str takes a decoded (character) string and
# encodes it according to the charset attribute.
sub new_part ($$) {
    my ($type, $body) = @_;
    my ($part) = Email::MIME->create(
        attributes => {
            content_type => $type,
            charset => 'utf-8',
            encoding => '8bit',
        },
        body_str => $body,
    );
    # Called with no values, header_set is used here to drop these
    # headers from the sub-part; they only belong on the top level.
    do { $part->header_set($_) } foreach qw/Date MIME-Version/;
    return $part;
}

# Normally, this would be actual utf-8 under "use utf8"
my $text = "Not latin: \x{30ab}\x{30bf}\x{30ab}\x{30ca}";
my $html = "<p><i>$text</i></p>";

my $m = Email::MIME->create(header_str => [
        To => 'a@example.com',
        From => 'b@example.com',
        Subject => 'Test',
    ],
    attributes => {
        content_type => 'multipart/alternative',
    },
    # multipart/alternative lists parts in increasing order of
    # preference, so the plain-text fallback goes first.
    parts => [
        new_part('text/plain', $text),
        new_part('text/html', $html)
    ]);

print $m->as_string;
Original question follows:
O Monks, I want to generate MIME email. I want to hand Unicode strings into the generator, and I want to get a byte string (an encoded string) back when I call as_string() or equivalent. I believe this is the only sensible thing to do, since the Content-Type defines a byte encoding of the data, and this byte encoding MAY vary by part, per each part's individual Content-Type specification. Thus, trying to encode a Unicode string returned from as_string() will produce the wrong result.
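To make the per-part argument concrete, here is a minimal sketch using only the core Encode module. The two "parts" are hypothetical stand-ins for MIME parts, not real Email::MIME objects; the point is only that encoding each part with its own charset and encoding the joined character string as UTF-8 give different bytes.

```perl
use strict;
use warnings;
use Encode qw(encode encode_utf8);

# Two hypothetical text parts of one message, each declaring its own charset.
my @parts = (
    { charset => 'iso-8859-1', text => "caf\x{e9}" },
    { charset => 'utf-8',      text => "\x{30ab}\x{30bf}\x{30ab}\x{30ca}" },
);

# What the generator should do: encode each part with the charset that
# its own Content-Type declares, then join the resulting bytes.
my $per_part = join '', map { encode($_->{charset}, $_->{text}) } @parts;

# What a caller is forced to do with a character-string as_string():
# one encoding for the whole message.
my $whole = encode_utf8(join '', map { $_->{text} } @parts);

printf "per-part: %d bytes, whole-message: %d bytes\n",
    length $per_part, length $whole;
# prints "per-part: 16 bytes, whole-message: 17 bytes"
```

The one-byte difference is the e-acute: the iso-8859-1 part needs the single byte 0xE9, but the whole-message encode writes it as two UTF-8 bytes, which no longer matches that part's declared charset.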
Neither MIME::Lite (basically deprecated now) nor Email::MIME seems to fit this desire. They issue a "wide char in print" warning on a :raw filehandle for my test with embedded katakana, which means they returned a non-encoded Unicode string, as I understand it. The documentation for MIME::Entity does not look promising either. Is there something else to try, or shall I give up and have callers lovingly byte-encode everything (and set its charset) on the way into Email::MIME? (I haven't even started looking at properly encoding headers like Subject yet; advice there would also be appreciated.)
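On the Subject question, one possible approach (a sketch, not necessarily what any of the modules above do internally) is the 'MIME-Header' encoding that ships with the core Encode module. It turns a character string into RFC 2047 encoded-words, which are plain ASCII and therefore safe to put in a header line. The subject text here is just an illustration.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical subject containing non-Latin characters.
my $subject = "Test: \x{30ab}\x{30bf}\x{30ab}\x{30ca}";

# 'MIME-Header' produces RFC 2047 encoded-words made only of ASCII bytes.
my $encoded = encode('MIME-Header', $subject);
print "Subject: $encoded\n";

# Decoding the header value should give the original characters back.
my $decoded = decode('MIME-Header', $encoded);
print "round trip ok\n" if $decoded eq $subject;
```

With Email::MIME, passing the Subject through header_str rather than header should make this manual step unnecessary.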
Some broader context about what I'm trying to achieve, in case I'm doing it beyond wrong: I want to either pass the email to encode_base64 for packing into an Amazon SES API call, or I want to give it to /usr/sbin/sendmail -oi, most likely via :raw filehandle, if SES is over quota. encode_base64 is only defined over byte strings, so I need a correctly-encoded byte string regardless. (I want to use the SES API over their SMTP support so that I can get better errors, and check the quota/rate limit in advance.)
Update: some code follows, per request by anonymous.
#!/usr/bin/perl -W
# vim:fileencoding=utf-8
# PuTTY option: Remote character set = UTF-8
# my locale: en_US.UTF-8 (LANG and all LC_* except LC_ALL="")
use warnings;
use strict;
use utf8;
use MIME::Lite;
use MIME::Base64;

my $m = MIME::Lite->new(To => 'a@example.com',
    From => 'b@example.net',
    Subject => 'Test',
    Type => 'TEXT',
    # Perlmonks safe encoding with same result
    Data => "Not latin: \x{30ab}\x{30bf}\x{30ab}\x{30ca}\n");
my $s = $m->as_string;
print "UTF-8 flag: ", utf8::is_utf8($s), "\n";
binmode(STDOUT, ':raw');
print $s; # warns: wide char in print
print encode_base64($s); # dies: wide char in sub entry
If I set stdout to ':utf8', then perl knows how to encode the Unicode string $s for printing, and the warning goes away. If I use Encode; print encode_base64(encode_utf8($s)); then that prevents encode_base64 from dying. However, this would improperly encode any text/* part that had a non-utf8 charset defined, including those parts which have the charset undefined, which is the default.
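The encode_base64 behaviour can be reproduced without any MIME module at all. This is a small sketch with core modules only, using the same test string; it shows the call dying on a character string and succeeding once the string has been encoded to bytes. UTF-8 is chosen here only because the example has a single body, which is exactly the assumption that breaks for multi-charset messages.

```perl
use strict;
use warnings;
use Encode qw(encode);
use MIME::Base64 qw(encode_base64 decode_base64);

my $chars = "Not latin: \x{30ab}\x{30bf}\x{30ab}\x{30ca}\n";

# encode_base64 only accepts byte strings; characters above 0xFF make it croak.
my $ok = eval { encode_base64($chars); 1 };
print "character string rejected: $@" unless $ok;

# Encode to bytes first, then base64 the bytes.
my $bytes = encode('UTF-8', $chars);
my $b64   = encode_base64($bytes);
print $b64;
```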
One last thing: when MIME::Lite talks about "this module will encode your message data for you" it means Content-Transfer-Encoding, binary/7bit/8bit etc. Nothing to do with character encoding.
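The two layers are easy to mix up, so here is a rough illustration with core modules (Encode, MIME::QuotedPrint, MIME::Base64). The sample text is made up. Step 1 is the character encoding (the charset parameter), which turns characters into bytes; step 2 is the Content-Transfer-Encoding, which turns those bytes into something safe for mail transport.

```perl
use strict;
use warnings;
use Encode qw(encode);
use MIME::Base64 qw(encode_base64);
use MIME::QuotedPrint qw(encode_qp);

my $text = "caf\x{e9}\n";

# Step 1, charset: characters -> bytes. The result depends on the charset.
my $latin1 = encode('iso-8859-1', $text);   # 5 bytes
my $utf8   = encode('UTF-8', $text);        # 6 bytes

# Step 2, Content-Transfer-Encoding: bytes -> 7-bit-safe text.
# It works on whatever bytes step 1 produced and knows nothing about charsets.
my $qp_latin1 = encode_qp($latin1);
my $qp_utf8   = encode_qp($utf8);
print $qp_latin1;             # caf=E9
print $qp_utf8;               # caf=C3=A9
print encode_base64($utf8);
```

The same text gives different quoted-printable output depending on the charset picked in step 1, which is why a module that only handles step 2 still leaves the charset problem to the caller.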
Replies are listed 'Best First'.

Re: Encoding/charset Aware MIME Email Generation
by Anonymous Monk on Jan 18, 2012 at 03:40 UTC

Re: Encoding/charset Aware MIME Email Generation
by Corion (Patriarch) on Jan 18, 2012 at 13:50 UTC
by sapphirecat (Acolyte) on Jan 18, 2012 at 20:11 UTC