Re: Encode string to HTML

You aren't decoding your source string first.

use strict;
use warnings;

use HTML::Entities;
use Encode;

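# Assumes this file is saved as UTF-8 and "use utf8" is not in effect,
# so the literal below holds the raw bytes 0xC3 0xAF, not one character.
my $TestStr = 'ï';
# Each byte is encoded on its own: &Atilde;&macr;
print encode_entities($TestStr) . "\n";
# Decoded to a single character first: &iuml;
print encode_entities(decode ('utf-8', $TestStr)) . "\n";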

Have a read of perlunitut if you haven't already. It'll explain the basics.

Re^2: Encode string to HTML by Corion (Patriarch) on Nov 01, 2013 at 14:50 UTC
`decode`ing from `utf-8` only helps if the source code is actually encoded as UTF-8. This may or may not be the case. At least according to Wikipedia, likely encodings are also ISO 8859-3, ISO 8859-9 or Windows-1254, if guessing that `&iuml` is supposed to depict a Turkish letter.
Re^3: Encode string to HTML by hippo (Archbishop) on Nov 01, 2013 at 15:06 UTC
Indeed so - it is nigh on impossible to determine the encoding of a document from a single character, so the actual encoding of the source will only be known by gepebril69. UTF-8 seemed a reasonable first guess in this instance and it does produce the desired output for that one character.
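
If it helps, here is a minimal sketch of how you might test the UTF-8 guess against a longer sample instead of trusting it. It asks `decode` to die on malformed input via `FB_CROAK`. The helper name `try_utf8` and the sample byte strings are made up for illustration. A failure shows the bytes are not UTF-8; a success only makes UTF-8 likely, it does not prove it.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);

# Hypothetical helper: returns the decoded string, or undef when the
# bytes are not well-formed UTF-8.
sub try_utf8 {
    my ($bytes) = @_;    # work on a copy; decode may modify its input when a CHECK is given
    my $chars = eval { decode('UTF-8', $bytes, FB_CROAK) };
    return $chars;
}

my $utf8_bytes   = "na\xC3\xAFve";    # "naïve" as UTF-8 bytes
my $latin1_bytes = "na\xEFve";        # "naïve" as ISO 8859-1 bytes

for my $sample ($utf8_bytes, $latin1_bytes) {
    print defined try_utf8($sample)
        ? "decodes cleanly as UTF-8\n"
        : "not valid UTF-8\n";
}
```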
Re^3: Encode string to HTML by gepebril69 (Scribe) on Nov 01, 2013 at 15:27 UTC
I've checked the template file and it is `text/html; charset=utf-8`. Now I understand why I had a similar problem in the past with parsing files. Perl doesn't seem to auto-detect this encoding. There is probably a logical reason for that, I guess.
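
For a file that is known to be UTF-8, one common approach is to tell Perl so when opening it, using an `:encoding(UTF-8)` layer, so the data arrives already decoded. A rough sketch follows; the temp file only stands in for the real template, whose name and contents I don't know.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Stand-in for the template: write the UTF-8 bytes for "ï" to a temp file.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;
print {$out} "\xC3\xAF\n";
close $out or die "Cannot close $path: $!";

# The :encoding(UTF-8) layer decodes bytes into characters while reading.
open my $in, '<:encoding(UTF-8)', $path or die "Cannot open $path: $!";
my $line = <$in>;
close $in;
chomp $line;

# One character with code point U+00EF, rather than two separate bytes.
printf "%d character(s), code point U+%04X\n", length $line, ord $line;
```

A string read this way can be passed to `encode_entities` directly, without a separate `decode` call.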
Re^2: Encode string to HTML by gepebril69 (Scribe) on Nov 01, 2013 at 15:22 UTC
Thanks hippo. That explains a lot. So in my case, when I want to define unsafe characters, I have to use a similar method: `my $UnsafeChar = 'ïé'; print encode_entities(decode ('utf-8', $TestStr), decode ('utf-8', $UnsafeChar)) . "\n";`
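
A self-contained version of that idea, as a sketch: the sample text is made up, and the bytes are written with `\x` escapes so the result does not depend on how the script file itself is saved. Note that once you pass your own unsafe set, only those characters are encoded, so `<` and `>` are left alone here.

```perl
use strict;
use warnings;
use Encode qw(decode);
use HTML::Entities qw(encode_entities);

# Both the text and the unsafe set start out as UTF-8 bytes and are decoded first.
my $text   = decode('UTF-8', "caf\xC3\xA9 na\xC3\xAFve <b>");    # "café naïve <b>"
my $unsafe = decode('UTF-8', "\xC3\xAF\xC3\xA9");                # "ïé"

# The second argument is used as a character class of characters to encode.
my $result = encode_entities($text, $unsafe);
print "$result\n";    # caf&eacute; na&iuml;ve <b>
```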