ultranerds has asked for the wisdom of the Perl Monks concerning the following question:
my @emails = split /\n/, q|foo <andy@bar.com>
=?utf-8?B?UGF1c2UgRG9yw6ll?= <pausedoree@gggg.com>
=?UTF-8?Q?Village_Bambous_=2D_Chambre_d=27_H=C3=B4tes?= <village.bambous@ddd.com>
=?utf-8?B?YmVybmFyZCB2ZXJpdMOp?= <naturedetente@fdd.fr>
=?ISO-8859-1?B?TGHrdGl0aWE=?= Picot <villagabrielle@ffsdfsd.net>
=?iso-8859-1?Q?Ancie_chambres_d=27h=F4tes?= <ancie.ha@dfdd.fr>|;

use utf8;
use Encode qw(encode decode);

foreach (@emails) {
    $_ = decode('MIME-Header', $_);
    print "FOO: $_\n";
    print $IN->header;
    use Data::Dumper;
    print Dumper($_);
    print "FOO: " . utf8::is_utf8($_) . "\n";
    if (utf8::is_utf8($_)) {
        print "content..\n";
        $_ =~ s/([\200-\377]+)/from_utf8({ -string => $1, -charset => 'ISO-8859-1'})/eg;
    }
    print Dumper($_);
    print "FOO: " . utf8::is_utf8($_) . "\n";
    print "\n\n";
}
I'm a bit confused about what encoding the string is in now, though, as utf8::is_utf8($_) still returns true after the conversion, which suggests it is still a UTF-8 string?

FOO: foo <andy@bar.com>
Content-type: text/html; charset=iso-8859-1
$VAR1 = 'foo <andy@bar.com>';
FOO: 
$VAR1 = 'foo <andy@bar.com>';
FOO: 

FOO: Pause Dorée <pausedoree@gggg.com>
$VAR1 = "Pause Dor\x{e9}e <pausedoree\@gggg.com>";
FOO: 1
convert..
$VAR1 = "Pause Dor\x{e9}e <pausedoree\@gggg.com>";
FOO: 1

FOO: Village Bambous - Chambre d' Hôtes <village.bambous@ddd.com>
$VAR1 = "Village Bambous - Chambre d' H\x{f4}tes <village.bambous\@ddd.com>";
FOO: 1
convert..
$VAR1 = "Village Bambous - Chambre d' H\x{f4}tes <village.bambous\@ddd.com>";
FOO: 1

FOO: bernard verité <naturedetente@fdd.fr>
$VAR1 = "bernard verit\x{e9} <naturedetente\@fdd.fr>";
FOO: 1
convert..
$VAR1 = "bernard verit\x{e9} <naturedetente\@fdd.fr>";
FOO: 1

FOO: Laëtitia Picot <villagabrielle@ffsdfsd.net>
$VAR1 = "La\x{eb}titia Picot <villagabrielle\@ffsdfsd.net>";
FOO: 1
convert..
$VAR1 = "La\x{eb}titia Picot <villagabrielle\@ffsdfsd.net>";
FOO: 1

FOO: Ancie chambres d'hôtes <ancie.ha@dfdd.fr>
$VAR1 = "Ancie chambres d'h\x{f4}tes <ancie.ha\@dfdd.fr>";
FOO: 1
convert..
$VAR1 = "Ancie chambres d'h\x{f4}tes <ancie.ha\@dfdd.fr>";
FOO: 1
$_ = decode('MIME-Header', $_);
That seems to do it. Does that look OK? I just don't want to bugger it up :)

if (utf8::is_utf8($the_from)) {
    $the_from = encode('iso-8859-1', $_);
}
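
For anyone following along, here is a minimal, self-contained sketch of the decode-then-encode round trip discussed above. It is not taken from the production code. It assumes only the core Encode module, and the variable names ($raw, $chars, $latin1) are made up for illustration. Note that utf8::is_utf8 reports Perl's internal storage flag for a string, not the encoding of the data it came from, which is likely why it keeps returning true on the decoded value.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Sample header copied from the list in the question (hypothetical variable name).
my $raw = '=?utf-8?B?UGF1c2UgRG9yw6ll?= <pausedoree@gggg.com>';

# decode() returns a Perl character string; it is no longer "in" any encoding.
my $chars = decode('MIME-Header', $raw);

# encode() turns those characters into ISO-8859-1 octets. With the default
# CHECK value, characters that Latin-1 cannot represent become '?'.
my $latin1 = encode('iso-8859-1', $chars);

# Show the internal flag before and after; the flag is an implementation
# detail, so it is printed here for inspection only.
printf "chars:  length=%d is_utf8=%d\n", length($chars),  utf8::is_utf8($chars)  ? 1 : 0;
printf "latin1: length=%d is_utf8=%d\n", length($latin1), utf8::is_utf8($latin1) ? 1 : 0;
```

In this sketch the encode() call is unconditional. Encode documents that the octets it returns never carry the UTF8 flag, so testing utf8::is_utf8 first should not be needed for this step, though that is a design choice rather than a requirement.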