
Hello there!

Whilst not being new to developing I am very new to Perl. I have "inherited" a project, and needed to extend it to allow for utf-8 data. I have pretty much completed this task, but there is one last remaining problem.

The software sends email to a list of users whose names could contain accented or other European or Asian characters. The user information is read from a MySQL db.

The administrators of the software create an email template and use place holders like %NAME% which the code then replaces with the real name of the user. This is where my problem seems to be.

I use Encode to encode the address and subject for the email, and this works:

use Encode qw/encode/;
$to = "$name <$email>";
$to = encode('MIME-Q', $to);
$subject = encode('MIME-Q', $subject);
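For anyone who wants to try the header step in isolation, here is a minimal sketch. The sample values for `$name` and `$email` are made up, and encoding only the display name (leaving the angle-bracketed address literal) is an assumption about what a mail client expects, not necessarily what the inherited code does.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical sample data; in the real code these come from the database.
my $name  = "b\x{e9}l\x{e9}";    # character string, 4 characters
my $email = 'user@example.com';

# Encode only the display name as a MIME encoded-word (assumption: the
# address part is plain ASCII and should stay unencoded).
my $encoded_name = encode('MIME-Q', $name);
my $to           = "$encoded_name <$email>";

# Decoding the encoded-word again should give back the original characters.
my $decoded_name = decode('MIME-Header', $encoded_name);

print "$to\n";
```

The resulting header is pure ASCII, so it can be printed or handed to a mailer without any further encoding.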

The body of the email is presented in utf-8 with no problem. The name is substituted into the template via:

$tmpbody=~s/\%NAME\%/$name/sg;
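As a point of comparison, here is a minimal sketch of the substitution on its own. It assumes both the template and the name arrive as raw UTF-8 bytes; `$template_bytes` and `$name_bytes` are hypothetical stand-ins for the template file contents and the database value. Both inputs are decoded to character strings first, and the result is encoded exactly once on the way out.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical stand-ins for the template file and the DB value,
# both given here as raw UTF-8 bytes.
my $template_bytes = "Dear %NAME%,\n";
my $name_bytes     = "b\xC3\xA9l\xC3\xA9";

# Decode both inputs to character strings before combining them.
my $tmpbody = decode('UTF-8', $template_bytes);
my $name    = decode('UTF-8', $name_bytes);

# Same substitution as in the post; both sides are now character strings.
$tmpbody =~ s/%NAME%/$name/g;

# Encode exactly once, at the point where the body leaves the program.
my $body_bytes = encode('UTF-8', $tmpbody);
```

If the real code mixes a decoded string with an undecoded one, or encodes twice (or not at all) on output, the result will differ from this sketch, which may help narrow down where things go wrong.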

The problem is that the name when displayed in the body of the email contains the familiar question marks or diamonds where the utf-8 multibyte characters should be.

Some additional background, gathered after MANY trawls through the web: the db table's default charset is utf-8; I have checked the file used to hold the email template and verified it is utf-8 (using file file_name at a Linux command prompt); and I have set $dbh->{'mysql_enable_utf8'} = 1; on the database connection. For good measure I also call decode_utf8($name) before using it in either the address or the string replace line.

I added some debug info to the body of the email using:

$extra1 = DBI::data_string_desc($name);
$extra2 = DBI::data_string_desc($body);
$tmpbody=~s/\%NAME\%/$name."<br>".$extra1."<br>".$extra2/sg;

This shows that both the name and the body have the UTF8 flag on:
Dear b�l�." ".UTF8 on, non-ASCII, 4 characters 6 bytes." ".UTF8 on, non-ASCII, 606 characters 619 bytes
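Since the UTF8 flag alone does not say whether the characters are the right ones, a dump of the actual code points may be more telling. The sketch below uses only core Perl and Encode; `describe_string` is a hypothetical helper written for this post, not part of DBI or Encode, and the sample name is made up.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: describe a string by its UTF8 flag, character
# count, UTF-8 byte count, and the code point of every character.
sub describe_string {
    my ($str) = @_;
    my $flag   = utf8::is_utf8($str) ? 'on' : 'off';
    my $chars  = length $str;
    my $bytes  = length encode('UTF-8', $str);
    my $points = join ' ', map { sprintf 'U+%04X', ord } split //, $str;
    return "UTF8 $flag, $chars characters, $bytes bytes as UTF-8: $points";
}

# Sample value standing in for a correctly decoded name.
my $name = "b\x{e9}l\x{e9}";
utf8::upgrade($name);    # force the UTF8 flag on, as in the DBI output above

print describe_string($name), "\n";
# prints: UTF8 on, 4 characters, 6 bytes as UTF-8: U+0062 U+00E9 U+006C U+00E9
```

A correctly decoded é shows up as a single U+00E9. A name that was decoded from the wrong encoding or encoded twice would instead show the pair U+00C3 U+00A9 in its place.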

I have tried lots of different remedies, from $name = pack "U0C*", unpack "C*", $name; to decode or encode in lots of different combinations. Once I had the info above, though, I stopped and scratched my head, as it *looked* like everything should have worked!

Any help will be very much appreciated. I've spent a LOT of hours trying to figure this out!

In reply to strange utf-8 (I think) behaviour... by seekay
