wfsp has asked for the wisdom of the Perl Monks concerning the following question:

I've got into a tangle using HTML::Entities v 1.27

Using WinXP, Activestate 5.8.6 build 811

Any ideas on what I'm doing wrong?

#!/usr/bin/perl
use strict;
use warnings;
use HTML::Entities;

my $string = "&rsquo;";
print "string: $string\n";                      # &rsquo
decode_entities($string);
print "de entitied: $string\n";                 # ’
print "ord of string:" . ord($string) . "\n";   # 226
encode_entities($string);
print "re entitied: $string\n";                 # â€&#153
my $character = encode_entities(chr(226));
print "character 226: $character\n";            # &acirc
$character = encode_entities(chr(8217));
print "character 8217: $character\n";           # &rsquo

I think I need decode_entities to return chr(8217) and not chr(226).

Thanks in advance

Re: Need advice on HTML entities
by Tanktalus (Canon) on Feb 07, 2005 at 17:24 UTC

    Works better here, although I'm getting the warning "Wide character in print at z line 10." Because my perl converts this to UTF-8 successfully, the ord comes out as 8217. The 226 you're getting is simply the first byte of a multi-byte character, because your perl is treating the multi-byte character as a series of single-byte characters.

    You may try something like this:

    C:\> set LANG=en_US.UTF-8
    C:\> perl html_entities_test.pl
    I'm not sure how well it'll work for you, though.
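
To make the 226 vs. 8217 point concrete, here is a minimal sketch using only the core Encode module (it does not touch HTML::Entities, and the variable names are just for illustration). It shows that character 8217 becomes three bytes when encoded as UTF-8, and that the first of those bytes is 226.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode decode);

# U+2019 RIGHT SINGLE QUOTATION MARK, the character &rsquo; stands for.
my $char = chr(8217);

# Encode the single character into its UTF-8 byte sequence.
my $bytes = encode('UTF-8', $char);

# Look at the individual byte values of the encoded form.
my @byte_values = unpack('C*', $bytes);

print "length as a character string: ", length($char),  "\n";
print "length as a byte string:      ", length($bytes), "\n";
print "byte values: @byte_values\n";

# Decoding the bytes again restores the single character.
my $decoded = decode('UTF-8', $bytes);
print "ord after decode: ", ord($decoded), "\n";
```

So a perl that hands back the undecoded bytes will report ord 226, while one that hands back the decoded character will report 8217.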

Re: Need advice on HTML entities
by borisz (Canon) on Feb 07, 2005 at 17:39 UTC
    You could use
    HTML::Entities::decode_entities_old($string)
    that's the old perl fallback. Or make your string utf8 before you pass it to decode_entities.
    my $string = "&rsquo;";
    chop( $string .= chr(0x1234) );
    print "string: $string\n";   # &rsquo
    decode_entities($string);
    And since we want to print a utf8 char, put STDOUT into utf8 mode too.
    # at the top of the program
    binmode STDOUT, ":utf8";
    UPDATE: or, even better, update HTML::Parser and it works out of the box. Tested with 3.43.
    Boris
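
For anyone reading along, the two workarounds above can be sketched with core Perl only. This is an illustration under assumptions, not a test against the HTML::Parser version from this thread: utf8::upgrade is used here as a more readable stand-in for the chop/chr(0x1234) trick (both switch the string to Perl's internal UTF-8 representation), and an in-memory filehandle stands in for STDOUT so the bytes produced by the ":utf8" layer can be inspected.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $string = "&rsquo;";

# A plain ASCII literal starts out without Perl's internal UTF8 flag.
my $flag_before = utf8::is_utf8($string) ? 1 : 0;

# utf8::upgrade changes the internal representation to UTF-8
# without changing the characters the string contains.
utf8::upgrade($string);
my $flag_after = utf8::is_utf8($string) ? 1 : 0;

print "flag before: $flag_before, after: $flag_after\n";
print "string is still: $string\n";

# Printing through a :utf8 layer avoids "Wide character in print".
# The in-memory handle is a stand-in for STDOUT (an assumption made
# here so the written bytes can be examined).
my $buffer = '';
open(my $out, '>:utf8', \$buffer) or die "Cannot open in-memory handle: $!";
print {$out} chr(8217);
close($out);

print "bytes written: ", join(' ', unpack('C*', $buffer)), "\n";
```

In a real script the layer would be applied with binmode STDOUT, ":utf8"; as shown above, and upgrading HTML::Parser remains the simpler fix.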
      ...update HTML::Parser...

      That did it! Many thanks,
      John