gepebril69 has asked for the wisdom of the Perl Monks concerning the following question:

Hi there

I'm trying to automate emails by using HTML templates and parsing dynamic data into them. This works well, except that characters like é, ë, ï display incorrectly in some web email programs. It seems I have to convert these special characters to HTML entity names.

I found HTML::Entities, but it does not seem to give the result I expect. I only want to convert the special characters, not HTML markup like <font>. When I run

my $TestStr = 'ï'; print encode_entities($TestStr);
it returns &Atilde;&macr; instead of &iuml;

Is this a bug in the module, or am I misunderstanding the module?

yours sincerely,

Re: Encode string to HTML
by Corion (Patriarch) on Nov 01, 2013 at 14:43 UTC

    You need to (find out and) tell Perl what encoding your i-with-double-dots letter is in. Then you need to Encode::decode it and then pass it to HTML::Entities for output, or as an alternative, tell the mail client in the headers what output encoding your mail uses.

    Likely of help is perlunitut.
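To make Corion's point concrete, here is a minimal sketch of why the undecoded string produces two entities. It assumes the input arrives as UTF-8 bytes; the bytes are written as escapes so the example does not depend on how the script file itself is saved. The variable names are illustrative only.

```perl
use strict;
use warnings;
use Encode qw(decode);
use HTML::Entities qw(encode_entities);

# The two UTF-8 bytes that represent "ï" (U+00EF).
my $bytes = "\xC3\xAF";

# Without decoding, each byte is treated as its own Latin-1 character
# (0xC3 is A-tilde, 0xAF is the macron), so two entities come out.
my $wrong = encode_entities($bytes);

# After decoding, the string holds the single character U+00EF.
my $right = encode_entities(decode('UTF-8', $bytes));

print "$wrong\n$right\n";
```

The first line printed is the unexpected result from the question, the second is the one that was wanted.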

Re: Encode string to HTML
by hippo (Archbishop) on Nov 01, 2013 at 14:46 UTC

    You aren't decoding your source string first.

    use strict;
    use warnings;
    use HTML::Entities;
    use Encode;

    my $TestStr = 'ï';
    print encode_entities($TestStr) . "\n";
    print encode_entities(decode ('utf-8', $TestStr)) . "\n";

    Have a read of perlunitut if you haven't already. It'll explain the basics.

      Decoding from UTF-8 only helps if the source code is actually encoded as UTF-8. This may or may not be the case.

      At least according to Wikipedia, other likely encodings are ISO 8859-3, ISO 8859-9 or Windows-1254, assuming that &iuml; is supposed to depict a Turkish letter.

        Indeed so - it is nigh on impossible to determine the encoding of a document from a single character, so the actual encoding of the source will only be known by gepebril69. UTF-8 seemed a reasonable first guess in this instance and it does produce the desired output for that one character.
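When the encoding really is unknown, one common fallback (not something proposed in this thread) is to attempt a strict UTF-8 decode and fall back to a single-byte encoding if that fails. The sketch below assumes ISO-8859-1 as the fallback, and `decode_guess` is a hypothetical helper name. It is only a heuristic: it cannot tell ISO-8859-1 apart from ISO 8859-9 or Windows-1254.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: try strict UTF-8 first, then fall back to ISO-8859-1.
sub decode_guess {
    my ($bytes) = @_;
    my $text = eval {
        # FB_CROAK makes decode die on malformed input;
        # LEAVE_SRC keeps decode from modifying $bytes.
        decode('UTF-8', $bytes, Encode::FB_CROAK() | Encode::LEAVE_SRC());
    };
    return defined $text ? $text : decode('ISO-8859-1', $bytes);
}

my $from_utf8   = decode_guess("\xC3\xAF");  # valid UTF-8 for U+00EF
my $from_latin1 = decode_guess("\xEF");      # malformed as UTF-8, so Latin-1 is used

printf "U+%04X U+%04X\n", ord($from_utf8), ord($from_latin1);
```

Knowing the real encoding, as gepebril69 does for the template, is always preferable to guessing.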

        I've checked the template file and it is

        text/html; charset=utf-8

        Now I understand why I had a similar problem in the past with parsing files. Perl doesn't seem to auto-detect the encoding. There is probably a logical reason for that, I guess.
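Since Perl does not detect the encoding by itself, one way to handle a UTF-8 template is to declare the encoding when opening the file. The sketch below is an assumption about how the template might be read; it writes a small stand-in file with File::Temp so that it is self-contained, and the file contents are made up for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use HTML::Entities qw(encode_entities);

# Create a stand-in template file holding the UTF-8 bytes for "café".
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;                          # write raw bytes, no encoding layer
print {$out} "caf\xC3\xA9\n";
close $out or die "close failed: $!";

# The :encoding(UTF-8) layer decodes bytes into characters while reading.
open my $in, '<:encoding(UTF-8)', $path or die "Cannot open $path: $!";
my $line = <$in>;
close $in;
chomp $line;

# $line is already decoded, so no separate Encode::decode call is needed.
my $html = encode_entities($line);
print "$html\n";
```

With the layer in place, everything read from the handle is character data and can go straight to encode_entities.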

      Thanks hippo

      That explains a lot. So in my case, when I want to define unsafe characters, I have to use a similar method.

      my $UnsafeChar = 'ïé';
      print encode_entities(decode ('utf-8', $TestStr), decode ('utf-8', $UnsafeChar)) . "\n";
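An alternative to listing each accented letter is to pass a negated character class as the second argument, so that everything outside printable ASCII is treated as unsafe while markup such as <font> is left alone. This is a sketch under the assumption that the text has already been decoded; the sample string is made up.

```perl
use strict;
use warnings;
use HTML::Entities qw(encode_entities);

# Already-decoded text: "\x{EF}" is "ï" and "\x{E9}" is "é".
my $text = "<font color=\"red\">na\x{EF}ve caf\x{E9}</font>";

# The second argument is used as a character class. The leading ^ negates it,
# so anything that is NOT newline, CR, tab or printable ASCII gets encoded.
my $html = encode_entities($text, '^\n\r\t\x20-\x7e');

print "$html\n";
```

Because <, >, & and the quote characters fall inside the printable ASCII range, the tags and attributes pass through unchanged.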
Re: Encode string to HTML
by Your Mother (Archbishop) on Nov 01, 2013 at 16:31 UTC

    Also, note that your code needs to know its own encoding.

    use strict;
    use HTML::Entities;

    {
        my $TestStr = 'ï';
        print encode_entities($TestStr), $/;
    }
    {
        use utf8;
        my $TestStr = 'ï';
        print encode_entities($TestStr), $/;
    }
    __END__
    &Atilde;&macr;
    &iuml;
Re: Encode string to HTML
by locked_user sundialsvc4 (Abbot) on Nov 01, 2013 at 15:03 UTC

      Thanks!

      I had to change this line of code to make my mail look OK in the web browsers I've tested:

      $Mail{'content-type'} = 'text/html; charset="iso-8859-1"'; # Old, not correct
      $Mail{'content-type'} = 'text/html; charset="utf-8"';

      Thanks all for pointing me in the right direction.
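The header fix works because the declared charset now matches the bytes actually sent. As a rough sketch of that idea, assuming the body is held as decoded text and must be turned back into bytes before sending: the %mail hash and its keys below are hypothetical, since the thread does not say which mail module is in use.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Decoded body text; "\x{E9}" is "é".
my $body = "<p>caf\x{E9}</p>";

# Hypothetical message hash; real key names depend on the mail module used.
my %mail = (
    'content-type' => 'text/html; charset="utf-8"',
    # Encode to the same charset the header declares.
    body           => encode('UTF-8', $body),
);

printf "%d characters became %d bytes\n", length($body), length($mail{body});
```

If the body contains only ASCII plus entities such as &eacute;, the declared charset matters much less, because the bytes are the same in UTF-8 and ISO-8859-1.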