
What are you expecting XML to be in?

by John M. Dlugosz (Monsignor)
on Jun 03, 2001 at 21:46 UTC

in reply to Converting character encodings

I read in a book that XML is always in Unicode; specifically, that the encoding is always UTF-8 or UTF-16. Has this changed since that book was printed, or do people just use other encodings anyway because the encoding attribute is there?

In any case, converting from UTF-8 (the script's internal representation) to whatever encoding the caller wants is a fairly general problem.
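A minimal sketch of that general conversion, written in Python rather than Perl and assuming the script holds its text as Unicode strings (Python's `str` stands in for the script's internal UTF-8 here). The function name `to_caller_encoding` is hypothetical. It uses Python's standard `xmlcharrefreplace` error handler, so characters the target encoding lacks become numeric character references instead of raising an error:

```python
def to_caller_encoding(text: str, encoding: str) -> bytes:
    """Encode internal Unicode text into the encoding the caller asked for.

    Characters that `encoding` cannot represent are emitted as XML numeric
    character references (for example, &#8364; for the euro sign), so the
    result is still usable as XML text content.
    """
    return text.encode(encoding, errors="xmlcharrefreplace")
```

For example, `to_caller_encoding("café €", "iso-8859-1")` yields `b"caf\xe9 &#8364;"`. Character references are only legal in text content and attribute values, so this fallback does not help for element names or comments.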

Re: What are you expecting XML to be in?
by merlyn (Sage) on Jun 03, 2001 at 21:56 UTC
      That's a proper subset of UTF-8, so not really necessary. Can a particular XML file be represented in, say, ISO-8859-6 or JIS-X, and still be standard? I don't like this because it means that a file can't be read unless the parser knows that character set.
        7-bit ISO-8859-1 (also called "ASCII" {grin}) is a proper subset of UTF-8, but not 8-bit ISO-8859-1. So yes, you'd need to declare the file as ISO-8859-1 if you wanted to have any "second half" characters, but otherwise you can let it default to UTF-8.
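A quick way to see the 7-bit versus 8-bit distinction, sketched in Python (the helper `is_valid_utf8` is hypothetical): pure ASCII bytes decode identically as ISO-8859-1 and as UTF-8, while a lone Latin-1 "second half" byte such as 0xE9 (é) is not a valid UTF-8 sequence.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

ascii_bytes = b"plain text"
latin1_bytes = "café".encode("iso-8859-1")  # b"caf\xe9"

# 7-bit bytes mean the same thing in ISO-8859-1 and in UTF-8.
print(ascii_bytes.decode("iso-8859-1") == ascii_bytes.decode("utf-8"))  # True
# An 8-bit ISO-8859-1 byte on its own is not valid UTF-8.
print(is_valid_utf8(latin1_bytes))  # False
```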

        -- Randal L. Schwartz, Perl hacker

Re: What are you expecting XML to be in?
by mirod (Canon) on Jun 03, 2001 at 22:29 UTC

    Actually XML uses UTF-8 or UTF-16 by default (and has ways to figure out which one is used), but allows any encoding, as long as it is specified in the XML declaration (as <?xml version="1.0" encoding="whatever"?>). The parser then has to deal with the encoding.
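The "ways to figure out which one is used" are described in the non-normative Appendix F of the XML 1.0 specification. Below is a simplified Python sketch of that idea: it sniffs byte order marks and a UTF-16 `<?`, otherwise reads `encoding="..."` from an ASCII-compatible XML declaration, and falls back to UTF-8. It deliberately skips UCS-4 and EBCDIC, and `sniff_xml_encoding` is a hypothetical name, not a library function:

```python
import re

# Matches encoding="..." (or '...') inside an ASCII-compatible XML declaration.
_ENCODING_DECL = re.compile(
    rb"""^<\?xml[^>]*?\sencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)

def sniff_xml_encoding(data: bytes) -> str:
    """Guess a document's encoding from its first bytes (simplified Appendix F)."""
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8"  # UTF-8 byte order mark
    if data.startswith(b"\xfe\xff") or data.startswith(b"\x00<\x00?"):
        return "utf-16-be"  # big-endian BOM, or '<?' encoded as UTF-16BE
    if data.startswith(b"\xff\xfe") or data.startswith(b"<\x00?\x00"):
        return "utf-16-le"  # little-endian BOM, or '<?' encoded as UTF-16LE
    match = _ENCODING_DECL.match(data)
    if match:
        return match.group(1).decode("ascii").lower()  # explicitly declared
    return "utf-8"  # no BOM and no declaration: the XML default
```

A real parser also has to check that the declared encoding is consistent with what the first bytes suggest; this sketch trusts the declaration.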

    It is an implementation choice in expat (and therefore in XML::Parser, which is built on expat) that all strings are passed to the handlers in UTF-8, but I don't think the XML spec mandates this choice.
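Python's `xml.parsers.expat` module wraps the same expat library and makes a comparable choice: handlers always receive decoded Unicode strings, whatever encoding the document declared. A small sketch (the helper `collect_text` is hypothetical) that parses the same content once as ISO-8859-1 and once as UTF-8:

```python
import xml.parsers.expat

def collect_text(data: bytes) -> str:
    """Parse an XML document given as raw bytes and return its character data."""
    parser = xml.parsers.expat.ParserCreate()
    chunks = []
    # Character data reaches the handler as a Python str, already decoded
    # from whatever encoding the document declared.
    parser.CharacterDataHandler = chunks.append
    parser.Parse(data, True)
    return "".join(chunks)

latin1_doc = '<?xml version="1.0" encoding="ISO-8859-1"?><p>café</p>'.encode("iso-8859-1")
utf8_doc = '<?xml version="1.0" encoding="UTF-8"?><p>café</p>'.encode("utf-8")

# Different bytes on disk, identical strings in the handlers.
print(collect_text(latin1_doc) == collect_text(utf8_doc))  # True
```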

    And because the environment in which the XML is used often does not support UTF-8, but rather Latin-1 or Shift_JIS or whatever, it is often very important (and painful!) to convert all strings back to their original encoding.
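A sketch of that round trip back, again using Python's `xml.parsers.expat`: an `XmlDeclHandler` records the encoding named in the XML declaration, and the collected text is re-encoded with it. The function name is hypothetical, and it assumes UTF-8 when no encoding is declared (a UTF-16 document without a declaration would need BOM sniffing instead):

```python
import xml.parsers.expat

def text_in_original_encoding(data: bytes) -> bytes:
    """Parse `data`, then re-encode its character data in the declared encoding."""
    parser = xml.parsers.expat.ParserCreate()
    declared = None
    chunks = []

    def on_xml_decl(version, encoding, standalone):
        nonlocal declared
        declared = encoding  # None if the declaration has no encoding="..."

    parser.XmlDeclHandler = on_xml_decl
    parser.CharacterDataHandler = chunks.append  # receives decoded str
    parser.Parse(data, True)

    # Assumption: fall back to UTF-8 when nothing was declared.
    return "".join(chunks).encode(declared or "utf-8")
```

Because the final `encode` is strict, this raises `UnicodeEncodeError` when the document used a numeric character reference (such as `&#8364;` in a Latin-1 file) for a character the original encoding cannot hold, which is exactly where the conversion gets painful.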
