Hello there!

Whilst I am not new to development, I am very new to Perl. I have "inherited" a project and needed to extend it to handle UTF-8 data. I have almost completed this task, but one problem remains.

The software sends email to a list of users whose names may contain accented letters or other European or Asian characters. The user information is read from a MySQL database.

The administrators of the software create an email template containing placeholders such as %NAME%, which the code then replaces with the user's real name. This is where my problem seems to be.

I use Encode to encode the address and the subject of the email, and this works correctly:

    use Encode qw(encode);
    # MIME-Q encode the To: value and the subject (both character strings)
    $to      = encode('MIME-Q', "$name <$email>");
    $subject = encode('MIME-Q', $subject);
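RFC 2047 does not allow an encoded-word inside the address itself (the addr-spec), so a common pattern is to MIME-Q encode only the display name and append the address unencoded. This is a minimal sketch, assuming the name and subject are already decoded character strings; the helper format_address and the sample values are hypothetical, not part of the project.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: MIME-Q encode only the display name (a decoded
# character string) and leave the addr-spec itself unencoded, because
# RFC 2047 does not permit encoded-words inside the address.
sub format_address {
    my ($name, $email) = @_;
    my $encoded_name = encode('MIME-Q', $name);
    return "$encoded_name <$email>";
}

# Made-up sample values; "\x{e9}" is e-acute, "\x{fc}" is u-umlaut,
# "\x{df}" is sharp s.
my $to      = format_address("Jos\x{e9} M\x{fc}ller", 'jose@example.com');
my $subject = encode('MIME-Q', "Gr\x{fc}\x{df}e");
```

With these inputs $to should look roughly like =?UTF-8?Q?...?= <jose@example.com>, though the exact encoded-word (and any folding of long names) depends on the installed Encode version.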

The body of the email displays correctly as UTF-8. The name is substituted into the template with:

    $tmpbody =~ s/%NAME%/$name/g;
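A minimal sketch of the substitution step, assuming the template and every value are already decoded character strings and that the message headers declare charset=UTF-8. The helper fill_template and the sample template are hypothetical, not the project's code; the idea shown is that substitution works on characters and the finished body is converted to UTF-8 octets exactly once, just before it is sent.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: replace %KEY% placeholders with values from a hash.
# The template and the values are expected to be decoded character strings;
# placeholders with no matching key are left untouched.
sub fill_template {
    my ($template, $values) = @_;
    (my $filled = $template) =~
        s/%(\w+)%/exists $values->{$1} ? $values->{$1} : "%$1%"/ge;
    return $filled;
}

# Made-up sample; "\x{f6}" is o-umlaut (U+00F6).
my $template = "Dear %NAME%,\nyour account is ready.\n";
my $body     = fill_template($template, { NAME => "J\x{f6}rg" });

# Convert characters to UTF-8 octets once, right before handing the body
# to the mailer (assumed to send it with charset=UTF-8).
my $octets = encode('UTF-8', $body);
```

Keeping decoding at the input boundary and encoding at the output boundary means the substitution itself never has to care about bytes.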

The problem is that, when the name is displayed in the body of the email, the familiar question marks or diamonds appear where the multibyte UTF-8 characters should be.
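One common way such diamonds appear is that the right characters are written out as ISO-8859-1 (Latin-1) octets rather than UTF-8 octets, and the mail client then reads them as UTF-8; for example, Perl writes a character string to a filehandle without an :encoding layer as Latin-1 octets when every character is below U+0100. Whether that is the cause in this program is an assumption. This minimal sketch, using a made-up name, shows both byte sequences and what a UTF-8 decoder makes of the Latin-1 one.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Made-up sample name; "\x{f6}" is o-umlaut (U+00F6).
my $name = "J\x{f6}rg";

# The same characters serialised two different ways.
my $as_utf8   = encode('UTF-8', $name);       # "J\xC3\xB6rg": 5 octets
my $as_latin1 = encode('ISO-8859-1', $name);  # "J\xF6rg": 4 octets

# A reader that expects UTF-8 cannot decode the lone 0xF6 octet, so with the
# default CHECK it substitutes U+FFFD, the "diamond" replacement character.
my $seen_by_utf8_reader = decode('UTF-8', $as_latin1);
```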

Some additional background: after MANY trawls through the web, I have confirmed that the database table's default charset is UTF-8, checked that the file holding the email template is UTF-8 (using "file file_name" at a Linux command prompt), and set $dbh->{'mysql_enable_utf8'} = 1; on the database connection. For good measure, I also call decode_utf8($name) before using the name in either the address or the substitution line.

I added some debug information to the body of the email using:

    $extra1 = DBI::data_string_desc($name);
    $extra2 = DBI::data_string_desc($body);
    $tmpbody =~ s/\%NAME\%/$name."<br>".$extra1."<br>".$extra2/sg;

The output shows that both the name and the body have the UTF8 flag on:
    Dear b�l�." ".UTF8 on, non-ASCII, 4 characters 6 bytes." ".UTF8 on, non-ASCII, 606 characters 619 bytes
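DBI::data_string_desc reports the UTF8 flag, the character count and the number of bytes Perl uses to store the string; for a flagged string, "4 characters 6 bytes" is what four characters including two 2-byte ones would show. The function describe here is a hypothetical, rough pure-Perl stand-in (not DBI's implementation) that makes the difference between raw octets and decoded characters visible, using a made-up name.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical, rough stand-in for the counts DBI::data_string_desc reports.
# For a string with the UTF8 flag on, Perl's internal byte length equals the
# length of the string's UTF-8 encoding.
sub describe {
    my ($s) = @_;
    my $flag  = utf8::is_utf8($s) ? 'UTF8 on' : 'UTF8 off';
    my $chars = length $s;
    my $bytes = utf8::is_utf8($s) ? length(encode('UTF-8', $s)) : length $s;
    return "$flag, $chars characters $bytes bytes";
}

# Made-up sample: the same name as raw UTF-8 octets and as decoded characters.
my $raw     = "J\xC3\xB6rg";            # 5 octets, flag off
my $decoded = decode('UTF-8', $raw);    # 4 characters, flag on
```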

I have tried many different remedies, from $name = pack "U0C*", unpack "C*", $name; to calling decode and encode in many different combinations. But once I saw the debug output above, I stopped and scratched my head, because it *looked* as though everything should have worked!

Any help will be very much appreciated; I've spent a LOT of hours trying to figure this out!


In reply to strange utf-8 (I think) behaviour... by seekay
