http://qs1969.pair.com?node_id=254272


in reply to Latin-1 characters and XML

I'm using XML::Parser with Perl 5.005_03, and I've got both Latin-1 and UTF-8 (for double-byte characters) going through it. (This is all in a production system.)

------
We are the carpenters and bricklayers of the Information Age.

Don't go borrowing trouble. For programmers, this means: worry only about what you need to implement.

Please remember that I'm crufty and crotchety. All opinions are purely mine and all code is untested, unless otherwise specified.

Re: Re: Latin-1 characters and XML
by kilinrax (Deacon) on Apr 30, 2003 at 21:13 UTC
    And what do you use to output non-ASCII characters as XML-escaped numeric entities (e.g. 'ô' -> '&#244;')?
    Is there an XML::Parser method to do this (if there is, it's completely undocumented afaict), or do you use a separate module?
      What reader method do I use to write XML-escaped entities?!? Think about that for a second. XML::Parser reads the XML. It doesn't write it.

      You want some XML writer, of which there are many. And, yes, they will work with Latin-1. Another option is to use something like Unicode::String.
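      A minimal sketch of the escaping step without Unicode::String, assuming Perl 5.8 or later (the core Encode module and character semantics don't exist in 5.005_03). The idea: decode the input bytes into a character string first, then replace every code point above U+007F with a decimal numeric character reference. The helper name escape_non_ascii is hypothetical, not part of any module.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: escape every character above U+007F as a decimal
# numeric character reference. It expects a *character* string (already
# decoded), so one code point becomes exactly one entity, whatever the
# original encoding of the bytes was.
sub escape_non_ascii {
    my ($text) = @_;
    $text =~ s/([^\x00-\x7F])/sprintf('&#%d;', ord($1))/ge;
    return $text;
}

# 'hôtel €' as UTF-8 bytes: ô is C3 B4, the euro sign is E2 82 AC.
my $from_utf8   = decode('UTF-8',      "h\xC3\xB4tel \xE2\x82\xAC");
# 'hôtel' as ISO-8859-1 bytes: ô is the single byte F4.
my $from_latin1 = decode('ISO-8859-1', "h\xF4tel");

print escape_non_ascii($from_utf8),   "\n";   # h&#244;tel &#8364;
print escape_non_ascii($from_latin1), "\n";   # h&#244;tel
```

      Because the substitution works on decoded characters rather than bytes, the same helper covers Latin-1 and UTF-8 sources, including characters outside the Latin-1 range such as the euro sign.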

        It doesn't seem entirely implausible to me that a module that can convert escaped Latin-1 characters into Unicode could also manage the reverse process. It's a fairly reasonable thing to want to do, after all.
        Incidentally, I did try looking at XML::Writer (which I'd assume is one of the many), and it didn't seem to have a method to do this either.
        As it happened, I ended up using Unicode::String and a regex ( 's|([\200-\377])|sprintf("&#%i;", ord($1))|ge' ), which works, afaict.
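        A self-contained sketch of that substitution, wrapped in a hypothetical helper named latin1_to_entities. The octal class [\200-\377] is bytes 128 to 255, so the regex is only correct when every byte is one Latin-1 character. The second call shows what happens, under that assumption, if raw UTF-8 bytes are fed in without decoding first: each byte of a multi-byte character gets its own (wrong) entity.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical wrapper around the substitution from the post: every byte
# in octal range \200-\377 (decimal 128-255) becomes a decimal numeric
# character reference. Correct only when each byte IS one Latin-1 char.
sub latin1_to_entities {
    my ($bytes) = @_;
    (my $out = $bytes) =~ s|([\200-\377])|sprintf("&#%i;", ord($1))|ge;
    return $out;
}

# 'hôtel' as ISO-8859-1 bytes: ô is the single byte 0xF4 (244).
print latin1_to_entities("h\xF4tel"), "\n";       # h&#244;tel

# The same word as raw UTF-8 bytes: ô is two bytes (0xC3 0xB4), so it
# comes out as two wrong entities instead of &#244;.
print latin1_to_entities("h\xC3\xB4tel"), "\n";   # h&#195;&#180;tel
```

        For UTF-8 input, the usual fix on Perl 5.8+ is to decode to a character string first (e.g. with Encode's decode) and match code points above 127 rather than bytes.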