
Hi,

Perhaps this is not the right forum to ask this (it depends on how strict you are), but I still think it's useful for a Perl programmer to know how to do this without Perl.

The question is: is there any unix/linux command that tells me the encoding format of an xml file?

I've got xml files that don't claim any particular encoding (<?xml version="1.0" ?>). They are in UCS-2LE but I need to have them in UTF-8 or ANSI.

This time I could see the encoding by opening them in an editor, but it would be much handier to check from the command line. The "file" command only tells me "XML document text".

Cheers and thanks a lot!

xinelo
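One way to check this from the command line is a small Perl sniffer. This is a minimal sketch, assuming each file starts with either a byte order mark or the XML declaration. It follows the autodetection table in Appendix F of the XML 1.0 specification, deliberately ignores UTF-32 and EBCDIC, and `sniff_xml_encoding` is a hypothetical name of my own. Note that UCS-2LE is reported as UTF-16LE, since the two look identical in the first bytes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Guess an XML file's encoding from its first bytes, roughly following
# XML 1.0 Appendix F. sniff_xml_encoding() is a hypothetical helper,
# not an existing CPAN API. UTF-32 and EBCDIC are not handled.
sub sniff_xml_encoding {
    my ($bytes) = @_;

    # 1. Byte order marks
    return 'UTF-8'    if $bytes =~ /\A\xEF\xBB\xBF/;
    return 'UTF-16LE' if $bytes =~ /\A\xFF\xFE/;    # UCS-2LE looks the same
    return 'UTF-16BE' if $bytes =~ /\A\xFE\xFF/;

    # 2. No BOM: "<?" encoded as 16-bit code units
    return 'UTF-16LE' if $bytes =~ /\A<\x00\?\x00/;
    return 'UTF-16BE' if $bytes =~ /\A\x00<\x00\?/;

    # 3. ASCII-compatible: use the encoding pseudo-attribute, if declared
    if ($bytes =~ /\A<\?xml\s[^>]*?\bencoding\s*=\s*["']([A-Za-z][\w.-]*)["']/) {
        return $1;
    }
    return 'UTF-8';    # XML's default when nothing else says otherwise
}

# Report each file named on the command line
for my $file (@ARGV) {
    open my $fh, '<:raw', $file or die "Cannot open $file: $!";
    read $fh, my $head, 512;    # the BOM/declaration sit at the very start
    close $fh;
    printf "%s: %s\n", $file, sniff_xml_encoding($head // '');
}
```

Saved as, say, `sniff.pl` (a hypothetical file name), `perl sniff.pl *.xml` prints one `file: encoding` line per file.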


Re^3: How to check the encoding format of an XML
by bart (Canon) on May 02, 2010 at 21:09 UTC
    but I need to have them in UTF-8 or ANSI.
    Assuming the XML file is valid (and since you posted this in a thread where I complained that people often produce invalid XML, that's not necessarily true), I think you can use XSLT with an identity transform to produce XML in any encoding you like.
    The question is: is there any unix/linux command that tells me the encoding format of an xml file?
    Uh? Do you still need it, then? Anyway, if you don't mind a solution involving Perl, then Encode::Guess might do the trick.
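A rough sketch of the Encode::Guess route, which ships with the core Encode distribution. Per its documentation it checks only ascii, utf8 and UTF-16/32 with a BOM by default, so other candidates must be passed as extra suspects. `guess_name` is a hypothetical wrapper; a failed guess comes back as a message string rather than an encoding object, which is what the `ref` check catches.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode::Guess;    # exports guess_encoding()

# Hypothetical wrapper: return the guessed encoding's name, or die with
# Encode::Guess's explanation when it cannot decide.
sub guess_name {
    my ($bytes, @suspects) = @_;
    my $enc = guess_encoding($bytes, @suspects);
    ref $enc or die "Cannot guess: $enc\n";    # failures are plain strings
    return $enc->name;
}

# Report each file named on the command line
for my $file (@ARGV) {
    open my $fh, '<:raw', $file or die "Cannot open $file: $!";
    my $bytes = do { local $/; <$fh> };    # slurp the raw bytes
    close $fh;
    print "$file: ", guess_name($bytes), "\n";
}
```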
Re^3: How to check the encoding format of an XML
by ikegami (Patriarch) on May 03, 2010 at 00:12 UTC

    They are in UCS-2LE but I need to have them in UTF-8 or ANSI.

    perl -pe' BEGIN { binmode STDIN, ":raw:perlio:encoding(UTF-16le)"; binmode STDOUT, ":raw:perlio:encoding(UTF-8)"; } ' < file.xml > file.utf8.xml

    (UTF-16le is a superset of UCS-2le, so it's safer to use it when decoding.)
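The same conversion can also be done in memory with the core Encode module. A sketch, assuming the whole file fits in memory; `ucs2le_to_utf8` is a hypothetical name, and dropping a leading U+FEFF is my own choice (a UTF-8 BOM is legal in XML, but some tools trip over it).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode FB_CROAK);

# Hypothetical helper: UCS-2LE/UTF-16LE bytes in, UTF-8 bytes out.
sub ucs2le_to_utf8 {
    my ($bytes) = @_;
    # Decode as UTF-16LE (a superset of UCS-2LE); croak on malformed input
    my $text = decode('UTF-16LE', $bytes, FB_CROAK);
    $text =~ s/\A\x{FEFF}//;    # drop a byte order mark, if present
    return encode('UTF-8', $text);
}

# Usage: perl thisscript.pl file.xml > file.utf8.xml
if (@ARGV) {
    open my $in, '<:raw', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    my $bytes = do { local $/; <$in> };
    close $in;
    binmode STDOUT, ':raw';
    print ucs2le_to_utf8($bytes);
}
```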

    Update: Note that this doesn't fix the encoding= attribute of the <?xml?> directive. But it sounds like you're trying to make the encoding match it anyway.
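If the declaration should name the new encoding explicitly, a small regex pass over the declaration can add or update the encoding pseudo-attribute. For files with no encoding declaration, as here, UTF-8 is already what a parser assumes, so this step is optional. A rough sketch, assuming the declaration is the very first thing in the file (as XML requires, optionally after a UTF-8 BOM); `set_xml_encoding` is hypothetical, and a proper XML toolchain is the safer route for anything unusual.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: make the XML declaration name $enc, replacing an
# existing encoding="..." or adding one after the version pseudo-attribute.
sub set_xml_encoding {
    my ($xml, $enc) = @_;
    # Only look at a declaration at the very start (optional UTF-8 BOM)
    if ($xml =~ /\A((?:\xEF\xBB\xBF)?<\?xml\s[^>]*?\?>)/) {
        my $decl = $1;
        my $new  = $decl;
        # Replace an existing value, keeping its quote style...
        unless ($new =~ s/(\bencoding\s*=\s*)(["'])[^"']*\2/$1$2$enc$2/) {
            # ...or insert one right after version="..."
            $new =~ s/(\bversion\s*=\s*(["'])[^"']*\2)/$1 encoding="$enc"/;
        }
        substr($xml, 0, length $decl, $new);
    }
    return $xml;
}

# Usage: perl thisscript.pl file.utf8.xml > fixed.xml
if (@ARGV) {
    open my $fh, '<:raw', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    my $xml = do { local $/; <$fh> };
    close $fh;
    binmode STDOUT, ':raw';
    print set_xml_encoding($xml, 'UTF-8');
}
```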

      Or using the open pragma (quoted so the shell doesn't choke on the parentheses):
      perl '-Mopen=:std,IN,:raw:perlio:encoding(UTF-16le),OUT,:raw:perlio:encoding(UTF-8)' -pe 1 < file.xml > file.utf8.xml