in reply to Do I have a unicode problem, or is this something else?

Hi All,

So that went very smoothly. Thanks to ikegami and Graff for pointing me in the right direction. However, it also revealed that I have a similar problem with my file I/O.

I have used the Wx::RichTextCtrl SaveFile() method, which saves in XML. Characters with accents are saved in what looks like an octal format (e.g. Title, or Título, is saved as T&#237;tulo). I tried use open ':encoding(utf8)'; which I thought would solve all my problems, but it didn't. I guess &#237; is not UTF-8, since it doesn't look the same. Does anyone know what it is and how I should deal with it?

Regards

Steve.

Re^2: Do I have a unicode problem, or is this something else?
by ikegami (Patriarch) on Jun 10, 2010 at 18:20 UTC
    Unicode character 237 (decimal, not octal) = U+00ED = LATIN SMALL LETTER I WITH ACUTE = what you want = no problem.
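To make the decimal/hex relationship concrete, here is a small sketch in core Perl. The variable names are only for illustration. It shows that 237 decimal and ED hex name the same code point, and what that character looks like once encoded as UTF-8 bytes.

```perl
use strict;
use warnings;
use Encode qw( encode );

my $dec  = 237;                      # the number inside &#237;
my $hex  = sprintf('%04X', $dec);    # the same code point written in hex
my $char = chr($dec);                # the character itself (LATIN SMALL LETTER I WITH ACUTE)
my $utf8 = encode('UTF-8', $char);   # that character encoded as UTF-8 bytes

printf "U+%s\n", $hex;
printf "same as \\x{ED}: %s\n", ($char eq "\x{ED}" ? 'yes' : 'no');
printf "UTF-8 bytes: %s\n",
    join ' ', map { sprintf('%02X', ord($_)) } split //, $utf8;
```

So the reference in the XML file and the UTF-8 bytes you expected are two spellings of the same character, which is why the encoding layer made no visible difference.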

      Hi ikegami,

      Thanks for that. So I understand that this is a decimal code, although I'm not sure what U+00ED means.

      a) Is there a function like the decode function which will parse a variable and replace these strings with the correct unicode characters?

      b) What is this style of encoding called so I can do a google on it.

      Regards

      Steve

        although I'm not sure what U+00ED means.

        Unicode character 00ED hex.

        What is this style of encoding called so I can do a google on it.

        XML. Specifically, it's an XML numeric character reference (often loosely called a numeric entity).

        Is there a function like the decode function which will parse a variable and replace these strings with the correct unicode characters?

        It is the correct unicode character.

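        But if you wish to expand these references, an easy way is to round-trip the document through XML::LibXML. When it serializes, it writes characters literally and only falls back to character references when the output encoding can't represent them.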

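        use strict;
        use warnings;

        use XML::LibXML qw( );

        my $xml = '<?xml version="1.0"?><root>&#237;</root>';

        my $parser = XML::LibXML->new();
        my $doc    = $parser->parse_string($xml);

        # With a UTF-8 output encoding, the character is serialized literally.
        $doc->setEncoding('UTF-8');

        # toString returns encoded bytes, so write them without an encoding layer.
        open(my $fh, '>:bytes', 'xml') or die "Can't create xml: $!";
        print($fh $doc->toString);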
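If all you have is a string rather than a whole document, a rough alternative is to expand the numeric references yourself. The sketch below is not from the thread: expand_numeric_refs is a hypothetical helper name, and it assumes you only care about decimal and hex numeric references. It leaves named entities such as &amp; untouched and knows nothing about CDATA sections, so a real XML parser remains the safer route.

```perl
use strict;
use warnings;

# Hypothetical helper: expands only numeric character references
# (&#237; and &#xED; style) in a single pass over the string.
sub expand_numeric_refs {
    my ($text) = @_;
    $text =~ s/&#(?:x([0-9A-Fa-f]+)|([0-9]+));/chr(defined $1 ? hex($1) : $2)/ge;
    return $text;
}

my $title = expand_numeric_refs('T&#237;tulo');

# The result is a string of characters; encode it when printing.
binmode(STDOUT, ':encoding(UTF-8)');
print "$title\n";
```

Doing both forms in one substitution avoids expanding text a second time when the output of one replacement happens to look like another reference.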