in reply to Re^4: Do I have a unicode problem, or is this something else?
in thread Do I have a unicode problem, or is this something else?

Hi ikegami,

Thanks very much for your reply. I didn't have XML::LibXML installed on my PC (Kubuntu), so I went into cpan and tried to install it, but cpan is complaining:

cpan[1]> install XML::LibXML
CPAN: Storable loaded ok (v2.18)
Going to read '/home/steve/.cpan/Metadata'
  Database was generated on Fri, 11 Jun 2010 14:29:21 GMT
CPAN: YAML loaded ok (v0.71)
Going to read 11 yaml files from /home/steve/.cpan/build/
CPAN: Time::HiRes loaded ok (v1.9711)
DONE
Restored the state of none (in 0.1883 secs)
Running install for module 'XML::LibXML'
Running make for P/PA/PAJAS/XML-LibXML-1.70.tar.gz
CPAN: Digest::SHA loaded ok (v5.45)
Checksum was ok
Scanning cache /home/steve/.cpan/build for sizes
............................................................................DONE
CPAN: Compress::Zlib loaded ok (v2.02)
CPAN: Archive::Tar loaded ok (v1.38)
Will not use Archive::Tar, need 1.00
XML-LibXML-1.70/
XML-LibXML-1.70/lib/
XML-LibXML-1.70/lib/XML/
XML-LibXML-1.70/lib/XML/LibXML/
XML-LibXML-1.70/lib/XML/LibXML/DOM.pod
XML-LibXML-1.70/lib/XML/LibXML/Reader.pm
XML-LibXML-1.70/lib/XML/LibXML/InputCallback.pod
XML-LibXML-1.70/lib/XML/LibXML/SAX/
XML-LibXML-1.70/lib/XML/LibXML/SAX/Builder.pod
XML-LibXML-1.70/lib/XML/LibXML/SAX/Parser.pm
XML-LibXML-1.70/lib/XML/LibXML/SAX/Builder.pm
XML-LibXML-1.70/lib/XML/LibXML/SAX/Generator.pm
XML-LibXML-1.70/lib/XML/LibXML/Common.pod
XML-LibXML-1.70/lib/XML/LibXML/XPathExpression.pod
XML-LibXML-1.70/lib/XML/LibXML/Parser.pod
XML-LibXML-1.70/lib/XML/LibXML/Text.pod
XML-LibXML-1.70/lib/XML/LibXML/RegExp.pod
XML-LibXML-1.70/lib/XML/LibXML/ErrNo.pod
XML-LibXML-1.70/lib/XML/LibXML/Document.pod
XML-LibXML-1.70/lib/XML/LibXML/CDATASection.pod
XML-LibXML-1.70/lib/XML/LibXML/Reader.pod
XML-LibXML-1.70/lib/XML/LibXML/Comment.pod
XML-LibXML-1.70/lib/XML/LibXML/Number.pm
XML-LibXML-1.70/lib/XML/LibXML/Node.pod
XML-LibXML-1.70/lib/XML/LibXML/SAX.pod
XML-LibXML-1.70/lib/XML/LibXML/XPathContext.pm
XML-LibXML-1.70/lib/XML/LibXML/Boolean.pm
XML-LibXML-1.70/lib/XML/LibXML/Schema.pod
XML-LibXML-1.70/lib/XML/LibXML/Namespace.pod
XML-LibXML-1.70/lib/XML/LibXML/ErrNo.pm
XML-LibXML-1.70/lib/XML/LibXML/PI.pod
XML-LibXML-1.70/lib/XML/LibXML/Error.pod
XML-LibXML-1.70/lib/XML/LibXML/Dtd.pod
XML-LibXML-1.70/lib/XML/LibXML/Common.pm
XML-LibXML-1.70/lib/XML/LibXML/Error.pm
XML-LibXML-1.70/lib/XML/LibXML/NodeList.pm
XML-LibXML-1.70/lib/XML/LibXML/DocumentFragment.pod
XML-LibXML-1.70/lib/XML/LibXML/XPathContext.pod
XML-LibXML-1.70/lib/XML/LibXML/Attr.pod
XML-LibXML-1.70/lib/XML/LibXML/RelaxNG.pod
XML-LibXML-1.70/lib/XML/LibXML/SAX.pm
XML-LibXML-1.70/lib/XML/LibXML/Literal.pm
XML-LibXML-1.70/lib/XML/LibXML/Element.pod
XML-LibXML-1.70/lib/XML/LibXML/Pattern.pod
XML-LibXML-1.70/Changes
XML-LibXML-1.70/example/
XML-LibXML-1.70/example/test.html
XML-LibXML-1.70/example/complex/
XML-LibXML-1.70/example/complex/dtd/
XML-LibXML-1.70/example/complex/dtd/g.dtd
XML-LibXML-1.70/example/complex/dtd/f.dtd
XML-LibXML-1.70/example/complex/complex2.xml
XML-LibXML-1.70/example/complex/complex.xml
XML-LibXML-1.70/example/complex/complex.dtd
XML-LibXML-1.70/example/test.dtd
XML-LibXML-1.70/example/article_internal_bad.xml
XML-LibXML-1.70/example/dtd.xml
XML-LibXML-1.70/example/utf-16-1.html
XML-LibXML-1.70/example/enc2_latin2.html
XML-LibXML-1.70/example/ns.xml
XML-LibXML-1.70/example/article_external_bad.xml
XML-LibXML-1.70/example/xmlns/
XML-LibXML-1.70/example/xmlns/badguy.xml
XML-LibXML-1.70/example/xmlns/goodguy.xml
XML-LibXML-1.70/example/article.xml
XML-LibXML-1.70/example/xmllibxmldocs.pl
XML-LibXML-1.70/example/test4.xml
XML-LibXML-1.70/example/enc_latin2.html
XML-LibXML-1.70/example/test3.xml
XML-LibXML-1.70/example/ext_ent.dtd
XML-LibXML-1.70/example/article_internal.xml
XML-LibXML-1.70/example/catalog.xml
XML-LibXML-1.70/example/xpath.pl
XML-LibXML-1.70/example/utf-16-2.html
XML-LibXML-1.70/example/cb_example.pl
XML-LibXML-1.70/example/test.xhtml
XML-LibXML-1.70/example/test.xml
XML-LibXML-1.70/example/bad.dtd
XML-LibXML-1.70/example/bad.xml
XML-LibXML-1.70/example/article_bad.xml
XML-LibXML-1.70/example/dromeds.xml
XML-LibXML-1.70/example/test2.xml
XML-LibXML-1.70/example/utf-16-2.xml
XML-LibXML-1.70/test/
XML-LibXML-1.70/test/schema/
XML-LibXML-1.70/test/schema/demo.xml
XML-LibXML-1.70/test/schema/badschema.xsd
XML-LibXML-1.70/test/schema/schema.xsd
XML-LibXML-1.70/test/schema/invaliddemo.xml
XML-LibXML-1.70/test/relaxng/
XML-LibXML-1.70/test/relaxng/invaliddemo.xml
XML-LibXML-1.70/test/relaxng/demo.rng
XML-LibXML-1.70/test/relaxng/badschema.rng
XML-LibXML-1.70/test/relaxng/demo.xml
XML-LibXML-1.70/test/relaxng/schema.rng
XML-LibXML-1.70/test/relaxng/demo3.rng
XML-LibXML-1.70/test/relaxng/demo2.rng
XML-LibXML-1.70/test/relaxng/demo4.rng
XML-LibXML-1.70/test/xinclude/
XML-LibXML-1.70/test/xinclude/xinclude.xml
XML-LibXML-1.70/test/xinclude/entity.txt
XML-LibXML-1.70/test/xinclude/test.xml
XML-LibXML-1.70/test/textReader/
XML-LibXML-1.70/test/textReader/countries.xml
XML-LibXML-1.70/debian/
XML-LibXML-1.70/debian/rules
XML-LibXML-1.70/debian/copyright
XML-LibXML-1.70/debian/libxml-libxml-perl.postinst
XML-LibXML-1.70/debian/control
XML-LibXML-1.70/debian/libxml-libxml-perl.install
XML-LibXML-1.70/debian/changelog
XML-LibXML-1.70/debian/libxml-libxml-perl.examples
XML-LibXML-1.70/debian/libxml-libxml-perl.docs
XML-LibXML-1.70/debian/libxml-libxml-perl.prerm
XML-LibXML-1.70/debian/compat
XML-LibXML-1.70/t/
XML-LibXML-1.70/t/25relaxng.t
XML-LibXML-1.70/t/40reader.t
XML-LibXML-1.70/t/32xpc_variables.t
XML-LibXML-1.70/t/10ns.t
XML-LibXML-1.70/t/14sax.t
XML-LibXML-1.70/t/06elements.t
XML-LibXML-1.70/t/20extras.t
XML-LibXML-1.70/t/04node.t
XML-LibXML-1.70/t/03doc.t
XML-LibXML-1.70/t/18docfree.t
XML-LibXML-1.70/t/07dtd.t
XML-LibXML-1.70/t/45regex.t
XML-LibXML-1.70/t/24c14n.t
XML-LibXML-1.70/t/23rawfunctions.t
XML-LibXML-1.70/t/12html.t
XML-LibXML-1.70/t/19encoding.t
XML-LibXML-1.70/t/11memory.t
XML-LibXML-1.70/t/43options.t
XML-LibXML-1.70/t/27new_callbacks_simple.t
XML-LibXML-1.70/t/01basic.t
XML-LibXML-1.70/t/80registryleak.t
XML-LibXML-1.70/t/17callbacks.t
XML-LibXML-1.70/t/26schema.t
XML-LibXML-1.70/t/61error.t
XML-LibXML-1.70/t/44extent.t
XML-LibXML-1.70/t/13dtd.t
XML-LibXML-1.70/t/21catalog.t
XML-LibXML-1.70/t/30xpathcontext.t
XML-LibXML-1.70/t/16docnodes.t
XML-LibXML-1.70/t/08findnodes.t
XML-LibXML-1.70/t/42common.t
XML-LibXML-1.70/t/31xpc_functions.t
XML-LibXML-1.70/t/60struct_error.t
XML-LibXML-1.70/t/41xinclude.t
XML-LibXML-1.70/t/02parse.t
XML-LibXML-1.70/t/05text.t
XML-LibXML-1.70/t/15nodelist.t
XML-LibXML-1.70/t/28new_callbacks_multiple.t
XML-LibXML-1.70/t/90threads.t
XML-LibXML-1.70/t/09xpath.t
XML-LibXML-1.70/t/29id.t
XML-LibXML-1.70/xpath.c
XML-LibXML-1.70/ppport.h
XML-LibXML-1.70/xpath.h
XML-LibXML-1.70/README
XML-LibXML-1.70/perl-libxml-sax.c
XML-LibXML-1.70/dom.c
XML-LibXML-1.70/LibXML.pod
XML-LibXML-1.70/LibXML.pm
XML-LibXML-1.70/perl-libxml-sax.h
XML-LibXML-1.70/Makefile.PL
XML-LibXML-1.70/perl-libxml-mm.h
XML-LibXML-1.70/Av_CharPtrPtr.h
XML-LibXML-1.70/MANIFEST
XML-LibXML-1.70/TODO
XML-LibXML-1.70/typemap
XML-LibXML-1.70/xpathcontext.h
XML-LibXML-1.70/Av_CharPtrPtr.c
XML-LibXML-1.70/dom.h
XML-LibXML-1.70/docs/
XML-LibXML-1.70/docs/libxml.dbk
XML-LibXML-1.70/LICENSE
XML-LibXML-1.70/perl-libxml-mm.c
XML-LibXML-1.70/LibXML.xs
XML-LibXML-1.70/META.yml
CPAN: File::Temp loaded ok (v0.21)
No 'Makefile' created , won't make
Running make test
  Make had some problems, won't test
Running make install
  Make had some problems, won't install

It says no Makefile was created, and it's right. So I went into the build directory. There is a Makefile.PL, so I tried to execute it and got "Makefile.PL: command not found":

root@steve-desktop:~/.cpan/build/XML-LibXML-1.70-XzsnvX# dir
Av_CharPtrPtr.c  Changes  docs   dom.h    lib        LibXML.pod  LICENSE      MANIFEST  perl-libxml-mm.c  perl-libxml-sax.c  ppport.h  t     TODO     xpath.c         xpath.h
Av_CharPtrPtr.h  debian   dom.c  example  LibXML.pm  LibXML.xs   Makefile.PL  META.yml  perl-libxml-mm.h  perl-libxml-sax.h  README    test  typemap  xpathcontext.h
root@steve-desktop:~/.cpan/build/XML-LibXML-1.70-XzsnvX# Makefile.PL
Makefile.PL: command not found
root@steve-desktop:~/.cpan/build/XML-LibXML-1.70-XzsnvX#

I'm now looking for another parser. Maybe I could just use a regular expression?

Update: I've tried this regular expression and it seems to work.

#!/usr/bin/perl -w
use strict;
use warnings;

my $xml = '<?xml version="1.0"?><root>&#237;</root>';
print($xml,"\n");
$xml =~ s/\&\#(\d*);/chr($1)/gse;
print($xml,"\n");

So thanks again for pointing me in the right direction, ikegami, as always.

Regards

Steve

Re^6: Do I have a unicode problem, or is this something else?
by ikegami (Patriarch) on Jun 11, 2010 at 18:41 UTC

    Two major problems:

    • You didn't encode the character using the XML's encoding before inserting it in the XML. You didn't even check if the XML's encoding could encode the character.
    • You're decoding entities that were encoded because the character they represent would break the XML if present. (e.g. &#38;).

    Your solution also has some potential bugs.

    • You convert what appear to be entities in CDATA sections. (Your XML generator might not produce these.)
    • You don't expand &iacute;. (Your XML generator might not produce these.)
    • You don't expand &#xED;. (Your XML generator might not produce these.)
    • The XML is encoded, but you try to match against it as if it were text. (You're matching ASCII characters, and your XML generator might always use an ASCII-derived encoding.)

    On the stylistic side,

    • \d is way too encompassing. You want [0-9]. (I'm listing this as stylistic since it won't be an issue with valid XML.)
    • You have a useless modifier on your match operator.
    • You have useless escapes in your pattern.

    Update: Added second major problem.
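As a rough illustration of several of the points above, here is a minimal sketch of a more careful substitution. It is not a replacement for a real parser. The helper name expand_numeric_refs is hypothetical, and the sketch assumes the document is UTF-8 encoded and contains no CDATA sections, comments, or named entities such as &iacute; that need handling.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw( encode );

# Code points that must stay escaped, because the literal character
# would change the markup: " & ' < >
my %keep_escaped = map { $_ => 1 } ( 34, 38, 39, 60, 62 );

# Hypothetical helper (not from the thread): expand decimal and hex
# numeric character references in a string of UTF-8 encoded XML.
# ASSUMPTIONS: UTF-8 document, no CDATA sections, no named entities.
sub expand_numeric_refs {
    my ($xml) = @_;
    $xml =~ s{(&#(?:x([0-9A-Fa-f]+)|([0-9]+));)}{
        my ( $ref, $hex, $dec ) = ( $1, $2, $3 );
        my $cp = defined $hex ? hex $hex : $dec;
        # Leave markup-significant characters escaped; encode the rest
        # into the document's (assumed) encoding before inserting them.
        $keep_escaped{$cp} ? $ref : encode( 'UTF-8', chr $cp );
    }ge;
    return $xml;
}

my $xml = '<?xml version="1.0"?><root>&#237; &#xED; &#38;</root>';
print expand_numeric_refs($xml), "\n";
```

The pattern uses [0-9] rather than \d, drops the unneeded /s modifier and escapes, and leaves &#38; and similar references alone. It still treats anything that looks like a reference as one, so CDATA content would be altered.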

      Hi ikegami,

      Thanks for this. I'll start with the end first.

      The style points: I'm never quite sure which characters need escape sequences and which don't, so thanks for the clarifications there. All good points and I'll incorporate them.

      Potential bugs: it's true. Not to mention '& amp' etc. But I don't expect to see these here, and if I do, they'll come up in testing. I'm not sure I understand the last point.

      Intro: I don't have any control over the generation. It's done for me, so I can't do it any other way (unless you think I can).

      Have a good day.

      Regards

      Steve

        Not to mention '& amp' etc

        Which brings up another major bug: you're decoding entities you shouldn't. &#38;, among others.

        I don't have any control over the generation. It's done for me, so I can't do it any other way

        I know. That's not relevant to anything I said. Unless you know the generator behaves in the specified ways, *your* code has the specified bugs.
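For comparison, a minimal sketch of letting a parser do the decoding instead, assuming XML::LibXML and the underlying libxml2 are installed. The helper name root_text is hypothetical; parse_string, documentElement, and textContent are XML::LibXML methods.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper (not from the thread): parse an XML string and
# return the text content of its root element as decoded characters.
sub root_text {
    my ($xml) = @_;
    my $doc = XML::LibXML->new->parse_string($xml);
    return $doc->documentElement->textContent;
}

my $xml = '<?xml version="1.0"?><root>&#237; &#xED; &#38;</root>';
my $text = root_text($xml);

# $text holds characters, not bytes, so encode it on the way out.
binmode STDOUT, ':encoding(UTF-8)';
print $text, "\n";
```

Here the parser handles the document's encoding, decimal and hex references, and CDATA sections, and the & only appears as a literal character after the markup has been parsed away.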

Re^6: Do I have a unicode problem, or is this something else?
by Corion (Patriarch) on Jun 11, 2010 at 18:27 UTC

    Most likely, Makefile.PL is not executable. You are supposed to run it like this:

    perl -w Makefile.PL

      Hi Corion,

      Nice to hear from you. I think that ikegami is probably right and I need to do this XML thing properly. I tried the command line you suggested and it gave me this error:

      steve@steve-desktop:~/.cpan/build/XML-LibXML-1.70-XzsnvX$ perl -w Makefile.PL
      Name "main::is_win32" used only once: possible typo at Makefile.PL line 263.
      enable native perl UTF8
      running xml2-config...
      using fallback values for LIBS and INC
      options:
        LIBS='-L/usr/local/lib -L/usr/lib -lxml2 -lm'
        INC='-I/usr/local/include -I/usr/include'
      If this is wrong, Re-run as:
        $ /usr/bin/perl Makefile.PL LIBS='-L/path/to/lib' INC='-I/path/to/include'

      looking for -lxml2... no
      looking for -llibxml2... no
      libxml2 not found
      Try setting LIBS and INC values on the command line
      Or get libxml2 from
        http://xmlsoft.org/
      If you install via RPMs, make sure you also install the -devel
      RPMs, as this is where the headers (.h files) are.

      Also, you may try to run perl Makefile.PL with the DEBUG=1 parameter
      to see the exact reason why the detection of libxml2 installation
      failed or why Makefile.PL was not able to compile a test program.

      Maybe this only runs on Windows? What do you think?

      Regards

      Steve

        You will need to have the XML library header files installed. Either install the "development" package through your OS package manager, or do as the output of Makefile.PL suggests and download and compile the libraries yourself. Also:

        Also, you may try to run perl Makefile.PL with the DEBUG=1 parameter to see the exact reason why the detection of libxml2 installation failed or why Makefile.PL was not able to compile a test program.

        You need to install the underlying library (libxml2) first. Your distro's repo should have it. Be sure to install any associated -dev/-devel package. In Debian, the packages are named libxml2 and libxml2-dev.
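Before re-running Makefile.PL, one quick sanity check is whether xml2-config, the program the Makefile.PL output shows it trying to run, can be found on the PATH. The sketch below is only an illustration: find_program is a hypothetical helper, and the assumption that a missing xml2-config means the development package is absent may not hold on every distribution.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper (not from the thread): return the full path of
# $program if an executable file of that name exists in one of @dirs,
# or undef otherwise.
sub find_program {
    my ( $program, @dirs ) = @_;
    for my $dir (@dirs) {
        my $candidate = File::Spec->catfile( $dir, $program );
        return $candidate if -f $candidate && -x _;
    }
    return;
}

# File::Spec->path returns the directories listed in $ENV{PATH}.
my $found = find_program( 'xml2-config', File::Spec->path );
print defined $found
    ? "xml2-config found at $found\n"
    : "xml2-config not found; the libxml2 development package may be missing\n";
```

If it reports that xml2-config was not found, installing the development package and then running perl Makefile.PL again is the next thing to try.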