in reply to Need some encoding help

jalewis2:

XML::Simple's XMLin will also take a string of XML instead of a filename or filehandle, so you could slurp in the file, do a quick patch of the XML header to correct the encoding declaration, and then pass the text to XML::Simple.
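Here is a minimal sketch of that approach. It assumes the whole file really is ISO-8859-1 and fits in memory. The helper names `slurp_bytes` and `fix_xml_encoding` are hypothetical and are not part of XML::Simple. The regexes only handle a declaration at the very start of the text, with no BOM in front of it.

```perl
use strict;
use warnings;

# Read a whole file as raw bytes, with no PerlIO decoding layer.
sub slurp_bytes {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!\n";
    local $/;                  # slurp mode: read everything in one go
    my $bytes = <$fh>;
    close $fh;
    return $bytes;
}

# Make the XML declaration state $encoding. The document bytes are left untouched.
sub fix_xml_encoding {
    my ($xml, $encoding) = @_;
    if ($xml =~ /\A<\?xml[^>]*?encoding\s*=/) {
        # Declaration already has an encoding attribute: swap its value,
        # keeping whichever quote character was used.
        $xml =~ s/\A(<\?xml[^>]*?encoding\s*=\s*)(["'])[^"']*\2/$1$2$encoding$2/;
    }
    elsif ($xml =~ /\A<\?xml/) {
        # Declaration without an encoding: insert one right after the version.
        $xml =~ s/\A(<\?xml\s+version\s*=\s*(["'])[^"']*\2)/$1 encoding=$2$encoding$2/;
    }
    else {
        # No declaration at all: prepend one.
        $xml = qq{<?xml version="1.0" encoding="$encoding"?>\n} . $xml;
    }
    return $xml;
}
```

You would then call something like `my $data = XMLin(fix_xml_encoding(slurp_bytes('feed.xml'), 'ISO-8859-1'));`, where the filename is a placeholder. XMLin treats a first argument containing `<` as literal XML rather than a filename, so the parser should see the rewritten declaration instead of the bogus UTF-8 one.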

...roboticus

When your only tool is a hammer, all problems look like your thumb.

Re^2: Need some encoding help
by jalewis2 (Monk) on Dec 03, 2010 at 01:07 UTC

    The issue is that the XML claims to be UTF-8, but it contains ISO-8859-1 characters. I've successfully converted it from the command line, but when I try to chain the gzip, conversion and processing steps together, it breaks.

    It didn't dawn on me that changing the XML header might tell XML::Simple to read the file differently. Is that what you're suggesting?
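    Here is a minimal sketch of chaining the decompression and conversion in memory. It assumes the entire document is ISO-8859-1, which is what a command-line conversion such as `iconv -f ISO-8859-1 -t UTF-8` would assume. It uses the core modules IO::Uncompress::Gunzip and Encode. The function name `gunzip_latin1_xml_to_utf8` is hypothetical.

```perl
use strict;
use warnings;
use Encode qw(from_to);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Assumption: every byte in the document is ISO-8859-1, even though the
# XML declaration says UTF-8.
sub gunzip_latin1_xml_to_utf8 {
    my ($gz_path) = @_;
    my $bytes;
    # Decompress the .gz file straight into a scalar; no temp file needed.
    gunzip($gz_path => \$bytes)
        or die "gunzip failed on $gz_path: $GunzipError\n";
    # Transcode the octets in place from Latin-1 to UTF-8.
    # from_to returns undef on failure.
    defined from_to($bytes, 'iso-8859-1', 'utf-8')
        or die "transcoding $gz_path failed\n";
    return $bytes;
}
```

    After this step the bytes really are UTF-8, so the existing declaration is accurate and the result can go straight to `XMLin($xml)` without the header trick. If the file is actually a mix of valid UTF-8 and stray Latin-1 bytes, converting everything this way would double-encode the characters that were already UTF-8.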

      jalewis2:

      Exactly. I'm thinking that you may be able to update the XML header to add the proper encoding attribute, something like <?xml version='1.0' encoding='ISO-8859-1' ?>, as described in Section 4.3, "Parsed Entities", of the XML specification. I'm not very knowledgeable about XML and have never used this trick, so I don't know if it will work. But hopefully, it'll get you past your hurdle.

      Another link I use when debugging XML problems is the W3C XML Markup Validation Service. That's where I point people when they tell me I'm wrong about their file being incorrect.

      ...roboticus

      When your only tool is a hammer^H^H^H^H^H^HJava, all problems look like your thumb^H^H^H^H^HXML.