Mandor has asked for the wisdom of the Perl Monks concerning the following question:

This is basically a follow-up to my earlier node, "Parsing special chars from a XML file".

As jaldhar suggested, I tried to use HTML::Entities to overcome my problem,
but I have already stumbled into a new one. After parsing the XML file (WITH special characters in it),
I apply encode_entities from HTML::Entities to the desired strings and then save them into an HTML file using HTML::Template.
But instead of getting the right entity codes I am getting completely different ones.
For example, instead of getting &auml; (ä) I get &Atilde; (Ã).

I then tried the following: I made a file and put some ä chars in it (the literal char, not its entity code).
When I opened that file in Perl and printed it to the screen, I saw the Ã char.
But when I saved the string back to the file and opened it, the chars were correct again (ä).

This leads me to the idea that encode_entities translated Ã to &Atilde; instead of ä to &auml;
because the internal representation of the char somehow got broken and held Ã instead of ä.
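
One way to test this idea is the minimal sketch below. It rests on two assumptions that are not confirmed anywhere in this thread: that the file is saved as UTF-8 (so ä is stored as the two bytes C3 A4), and that Perl 5.8 or later is available for the Encode module. The variable names are only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);
use HTML::Entities qw(encode_entities);

# Assumption: the input is UTF-8, so "ä" arrives as the two bytes C3 A4.
my $raw = "\xC3\xA4";

# Encoding the raw bytes treats each byte as its own Latin-1 character.
my $wrong = encode_entities($raw);

# Decoding first turns the two bytes into the single character U+00E4.
my $right = encode_entities(decode('UTF-8', $raw));

print "without decoding: $wrong\n";
print "with decoding:    $right\n";
```

If the UTF-8 assumption holds, the first line should show &Atilde;&curren; and the second &auml;, which would match what I am seeing.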

Any ideas what I could do?

PS: I am on Win32.

Re: Internal special char differ from output special char
by jaldhar (Vicar) on Dec 09, 2001 at 23:01 UTC

    Hrmm... that's odd. Could it be that the font you're using is not Unicode but ISO-somethingortheother? (And Windows has some even funkier character sets.) You can display the mapping tables like this:

    #!/usr/bin/perl -w
    use strict;
    use HTML::Entities qw(%char2entity %entity2char);

    # Print the character-to-entity table.
    foreach (keys %char2entity) {
        print "$_ = $char2entity{$_}\n";
    }

    # Print the entity-to-character table.
    foreach (keys %entity2char) {
        print "$_ = $entity2char{$_}\n";
    }

    On my Linux box everything seems to have the right values.

    Update: s/HTML:Entities/HTML::Entities/;

      The mapping table seems to be OK. (I looked into Entities.pm manually, since your code above doesn't work for me: it prints nothing, even after I corrected HTML:Entities to HTML::Entities.)
      I also checked the character map under Windows, and my system font seems to be Unicode.

      I am thankful for any further ideas.

        mandor, why don't you post some of your code? That might give us a clue as to what could be going wrong. Also give the version numbers of perl and HTML::Entities, in case that makes a difference.
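
        A quick way to collect both version numbers is something like the sketch below. It only assumes HTML::Entities is installed where this perl can find it; the variable names are made up for the example.

        ```perl
        #!/usr/bin/perl
        use strict;
        use warnings;
        use HTML::Entities ();

        # $] holds the numeric version of the running perl interpreter.
        my $perl_version = $];

        # VERSION is the universal method that returns the module's $VERSION.
        my $entities_version = HTML::Entities->VERSION;

        print "perl $perl_version\n";
        print "HTML::Entities $entities_version\n";
        ```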