seekay has asked for the wisdom of the Perl Monks concerning the following question:
Hello there!
Whilst not being new to developing I am very new to Perl. I have "inherited" a project, and needed to extend it to allow for utf-8 data. I have pretty much completed this task, but there is one last remaining problem.
The software sends email to a list of users whose names may contain accented or other European or Asian characters. The user information is read from a MySQL db.
The administrators of the software create an email template and use place holders like %NAME% which the code then replaces with the real name of the user. This is where my problem seems to be.
I use Encode to encode the address and subject for the email, and that works:
$addr="$name <$email>";
use Encode qw/encode/;
$to = encode('MIME-Q', $to);
$subject = encode('MIME-Q', $subject);
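For reference, here is a minimal, self-contained sketch of MIME-Q header encoding with Encode. The sample name, address and subject are hypothetical stand-ins for the database values. It encodes only the display name and leaves the address in angle brackets as plain ASCII, since encoded words are not meant to be used inside the address itself. It is an illustration under those assumptions, not the project's actual code.

```perl
use strict;
use warnings;
use utf8;                          # this snippet contains literal UTF-8 text
use Encode qw(encode decode);

# Hypothetical sample data; in the real program these come from the database.
my $name    = "bélé";
my $email   = 'user@example.com';
my $subject = "Grüße";

# Encode only the display name; the address part stays plain ASCII.
my $to        = encode('MIME-Q', $name) . " <$email>";
my $subject_h = encode('MIME-Q', $subject);

# Both header values are now pure ASCII and safe to put in mail headers.
print "To: $to\n";
print "Subject: $subject_h\n";
```

Decoding the result with decode('MIME-Header', $to) should give back the original name and address, which is a handy way to check the header by hand.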
The body of the email is displayed as utf-8 without problems. The name is substituted into the template via:
$tmpbody=~s/\%NAME\%/$name/sg;
The problem is that the name, when displayed in the body of the email, contains the familiar question marks or diamonds where the utf-8 multibyte characters should be.
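One pattern often recommended for this kind of substitution is to decode every input once, work on character strings, and encode once just before the bytes leave the program. The sketch below illustrates that pattern with hypothetical sample data (the byte strings stand in for the template file and the database value). Whether a missing or doubled step of this kind is the cause here is an assumption, not something established by the description above.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical inputs as raw UTF-8 bytes, the way they might arrive from a
# file or from a database handle that does not decode for you.
my $name_bytes     = "b\xC3\xA9l\xC3\xA9";              # "bélé"
my $template_bytes = "Dear %NAME%, gr\xC3\xBC\xC3\x9Fe"; # "... grüße"

# 1. Decode every input exactly once, so both are character strings.
my $name     = decode('UTF-8', $name_bytes);
my $template = decode('UTF-8', $template_bytes);

# 2. Substitute while working on characters.
(my $body = $template) =~ s/%NAME%/$name/g;

# 3. Encode exactly once, right before the bytes are handed to the mailer.
my $body_bytes = encode('UTF-8', $body);

printf "%d characters, %d bytes\n", length($body), length($body_bytes);
```

If a value is already a decoded character string (for example because the database driver decodes it), step 1 would be skipped for that value, since decoding twice is as harmful as not decoding at all.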
Some additional background, gathered after MANY trawls through the web: the db table's default charset is utf-8; I have checked the file that holds the email template and verified it is utf-8 (using file file_name at a Linux command prompt); and I have set $dbh->{'mysql_enable_utf8'} = 1; on the database connection. For good measure I also call decode_utf8($name) before using it in either the address or the string replace line.
Adding some additional debug info to the body of the email using:
$extra1 = DBI::data_string_desc($name);
$extra2 = DBI::data_string_desc($body);
$tmpbody=~s/\%NAME\%/$name."<br>".$extra1."<br>".$extra2/sg;
shows that both the name and the body have the UTF8 flag on:
Dear b�l�."
".UTF8 on, non-ASCII, 4 characters 6 bytes."
".UTF8 on, non-ASCII, 606 characters 619 bytes
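A similar summary can be produced without DBI, and listing the code points makes it easier to see whether a string holds the characters you expect. The helper below, describe_string, is a hypothetical name introduced only for this sketch. It reports the internal UTF8 flag, the character count, the size of the string when encoded as UTF-8, and each code point.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: summarize a string for debugging purposes.
sub describe_string {
    my ($s) = @_;
    my $chars = length $s;                          # number of characters
    my $bytes = length encode('UTF-8', $s);         # size once encoded as UTF-8
    my $flag  = utf8::is_utf8($s) ? 'on' : 'off';   # internal flag only
    my $codes = join ' ', map { sprintf('U+%04X', ord $_) } split //, $s;
    return "UTF8 $flag, $chars characters, $bytes bytes as UTF-8: $codes";
}

# Sample value: "bélé" written with escapes so the snippet is pure ASCII.
my $name = "b\x{E9}l\x{E9}";
print describe_string($name), "\n";
```

The flag only describes how Perl stores the string internally. The code points are the more useful part: U+00E9 for each accented letter means the string is properly decoded, while a pair such as U+00C3 U+00A9 in its place would mean the string still holds undecoded UTF-8 bytes.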
I have tried lots of different remedies on the above from using $name = pack "U0C*", unpack "C*", $name; to using decode or encode in lots of different combinations, but once I reported the info above I stopped and scratched my head as it *looked* like everything should have worked!
Any help will be very much appreciated. I've spent a LOT of hours trying to figure this out!
Re: strange utf-8 (I think) behaviour...
by moritz (Cardinal) on Oct 21, 2008 at 09:28 UTC
by seekay (Initiate) on Oct 21, 2008 at 10:29 UTC
by moritz (Cardinal) on Oct 21, 2008 at 11:54 UTC
by seekay (Initiate) on Oct 24, 2008 at 02:11 UTC