
Unicode character 237 (decimal, not octal) = U+00ED = LATIN SMALL LETTER I WITH ACUTE = what you want = no problem.
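
To see how the decimal number, the U+ notation and the character name line up, here is a minimal sketch using the core `charnames` module. The helper name `describe_codepoint` is made up for illustration and is not from the thread.

```perl
use strict;
use warnings;
use charnames ();

# Hypothetical helper: describe a code point given as a decimal number.
sub describe_codepoint {
    my ($decimal) = @_;

    # charnames::viacode() returns the official Unicode name, or undef
    # when the code point has no name.
    my $name = charnames::viacode($decimal) // '(unnamed)';

    # %04X prints the same number in hex, which is what follows "U+".
    return sprintf('%d decimal = U+%04X = %s', $decimal, $decimal, $name);
}

print describe_codepoint(237), "\n";
# prints: 237 decimal = U+00ED = LATIN SMALL LETTER I WITH ACUTE
```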

Re^3: Do I have a unicode problem, or is this something else?
by Steve_BZ (Chaplain) on Jun 10, 2010 at 21:41 UTC

    Hi ikegami,

    Thanks for that. So I understand that this is a decimal code, although I'm not sure what U+00ED means.

    a) Is there a function like the decode function which will parse a variable and replace these strings with the correct unicode characters?

    b) What is this style of encoding called, so I can google it?

    Regards

    Steve

      although I'm not sure what U+00ED means.

      Unicode character 00ED hex.

      What is this style of encoding called so I can do a google on it.

      XML. Specifically, it's an XML numeric character reference (often loosely called a numeric entity).

      Is there a function like the decode function which will parse a variable and replace these strings with the correct unicode characters?

      It is the correct unicode character.

      But if you wish to expand the entities, an easy way is to use XML::LibXML, since it doesn't write characters as entities on output unless the encoding requires it.

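      use strict;
      use warnings;

      use XML::LibXML qw( );

      my $xml = '<?xml version="1.0"?><root>&#237;</root>';

      my $parser = XML::LibXML->new();
      my $doc = $parser->parse_string($xml);

      # Serialize as UTF-8 so U+00ED is written as a character, not as &#237;.
      $doc->setEncoding('UTF-8');

      # toString returns encoded bytes, hence the :bytes layer.
      open(my $fh, '>:bytes', 'xml') or die "Can't open 'xml': $!";
      print($fh $doc->toString);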

        Hi ikegami,

        Thanks very much for your reply. I didn't have XML::LibXML installed on my PC (Kubuntu), so I went into cpan and installed it, but cpan is complaining:

        It says there is no Makefile, and it's right. So I went into the directory. There is a Makefile.PL, so I executed it and I got "Makefile.PL: command not found":

        root@steve-desktop:~/.cpan/build/XML-LibXML-1.70-XzsnvX# dir
        Av_CharPtrPtr.c Changes docs dom.h lib LibXML.pod LICENSE MANIFEST perl-libxml-mm.c perl-libxml-sax.c ppport.h t TODO xpath.c xpath.h
        Av_CharPtrPtr.h debian dom.c example LibXML.pm LibXML.xs Makefile.PL META.yml perl-libxml-mm.h perl-libxml-sax.h README test typemap xpathcontext.h
        root@steve-desktop:~/.cpan/build/XML-LibXML-1.70-XzsnvX# Makefile.PL
        Makefile.PL: command not found
        root@steve-desktop:~/.cpan/build/XML-LibXML-1.70-XzsnvX#

        I'm now looking for another parser. Maybe I could just use a regular expression?

        Update: I've tried this regular expression and it seems to work.

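        #!/usr/bin/perl -w
        use strict;
        use warnings;

        my $xml = '<?xml version="1.0"?><root>&#237;</root>';
        print($xml, "\n");

        # Replace each decimal character reference (&#NNN;) with the character itself.
        # \d+ rather than \d* so that a bare "&#;" is left alone.
        $xml =~ s/&#(\d+);/chr($1)/ge;
        print($xml, "\n");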

        So thanks again for pointing me in the right direction, ikegami, as always.

        Regards

        Steve
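
For anyone taking the regular-expression route, here is a slightly more general sketch. It makes assumptions that go beyond the thread: the input may also contain hexadecimal references such as `&#xED;`, and the output should be UTF-8. The helper name `expand_char_refs` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: expand decimal (&#237;) and hexadecimal (&#xED;)
# character references. Named entities such as &amp; are left untouched.
sub expand_char_refs {
    my ($text) = @_;

    # $1 is set for the hex form, $2 for the decimal form.
    $text =~ s/&#(?:x([0-9A-Fa-f]+)|([0-9]+));/chr(defined $1 ? hex($1) : $2)/ge;

    return $text;
}

# Encode on output so that characters above U+00FF print without
# "Wide character" warnings.
binmode(STDOUT, ':encoding(UTF-8)');

print expand_char_refs('<root>&#237; &#xED; &amp;</root>'), "\n";
```

One caveat: a reference such as `&#60;` expands to a literal `<`, so the result should be treated as plain text rather than fed back to an XML parser. A real parser such as XML::LibXML handles that case properly.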