kathys39 has asked for the wisdom of the Perl Monks concerning the following question:

I'm parsing an XML file with libxml. It has lots of special characters in it. The line at the top declares its encoding: encoding="ISO-8859-1". I figured I would try to convert it to UTF-8 before I parse it and put it into a MySQL db, but this is not working. My code is below. What am I doing wrong?
$converter = Text::Iconv->new("ISO-8859-1", "UTF-8");
$confile = $converter->convert($file);
open(INFILE, $confile).....
..... parse the file...
I still end up with errors on the special chars - ending tag mismatch, error parsing attribute name, etc.

Re: parse xml file using libxml - and Text:Iconv
by ikegami (Patriarch) on May 29, 2009 at 19:17 UTC

    I don't see where you fixed the encoding declaration. You're telling your parser the UTF-8 document is encoded using iso-8859-1. Errors are to be expected.

    I figured I would try and convert it to utf-8 before I parse

    I don't see why you would want to do this extra work — you're now doing "decode-encode-decode" instead of just "decode" — and make needless assumptions about the original encoding.
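
A minimal sketch of the point above, assuming XML::LibXML and the core Encode module are installed. The tiny document and the variable names are made up for illustration. It first parses ISO-8859-1 bytes directly, letting the parser honor the declaration, and then shows what happens when the bytes are converted to UTF-8 while the declaration still says ISO-8859-1.

```perl
use strict;
use warnings;
use Encode qw(decode encode);
use XML::LibXML;

# Hypothetical sample: raw ISO-8859-1 bytes, where "\xE9" is e-acute.
my $latin1_bytes = qq{<?xml version="1.0" encoding="ISO-8859-1"?>\n}
                 . qq{<NOTE>caf\xE9</NOTE>\n};

my $parser = XML::LibXML->new();

# 1. Just parse: the parser reads the declaration and decodes for us.
my $good = $parser->parse_string($latin1_bytes)->documentElement->textContent;

# 2. Convert the bytes to UTF-8 first but leave the declaration alone:
#    the parser now reads the two UTF-8 bytes of e-acute as two
#    separate ISO-8859-1 characters.
my $utf8_bytes = encode('UTF-8', decode('ISO-8859-1', $latin1_bytes));
my $bad = $parser->parse_string($utf8_bytes)->documentElement->textContent;

# Encode explicitly on output so the terminal or database gets UTF-8 bytes.
binmode STDOUT, ':encoding(UTF-8)';
printf "direct parse:    %s (%d characters)\n", $good, length $good;
printf "converted first: %s (%d characters)\n", $bad,  length $bad;
```

The direct parse yields a 4-character string. The converted-first parse yields 5 characters, with the single é replaced by the two-character sequence Ã©. In this small case the mismatch corrupts the text silently instead of raising a parse error, which is one more reason to leave the bytes alone and let the parser do the decoding.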

Re: parse xml file using libxml - and Text:Iconv
by Jenda (Abbot) on May 29, 2009 at 22:48 UTC

    You should show us an example of the XML. The errors do not look like they are related to the encoding.

    Jenda
    Enoch was right!
    Enjoy the last years of Rome.

      OK, I understand the other comments. I am new to the encoding/decoding world. I tried just using decode, but it still did not work. Here is a sample of the XML. I have NO control over how this stuff is output: my client gives me the XML and I need to parse it. My errors are on the "^Ls3weNp2Y" input, when I initially try to decode it:
      $confile = decode("iso-8859-1", $file);
      my $parser = XML::LibXML->new();
      my $doc = $parser->parse_file($confile);
      Here is the XML file...
      <?xml version="1.0" encoding="ISO-8859-1" standalone="yes" ?>
      <?xml-stylesheet type="text/xsl" href="notes.xsl"?>
      <UPDATE>
      <COMMANDPARMS> </COMMANDPARMS>
      <TIMING>
      <CUSTID> 192343-13</CUSTID>
      <STARTDATE>20090414</STARTDATE><STARTTIME>10:36:46</STARTTIME>
      </TIMING>
      <TMMR><DATE>20090414</DATE><TIME>18:40:39</TIME><CEO_ID>12943-56A</CEO_ID>
      <CEO_ID_SORTED>217.064.098.066</CEO_ID_SORTED><MANUID>80</MANUID><STATUS>Open</STATUS>
      <NOTES><![CDATA[Last checked on 14Aug ]]></NOTES>
      <SERVICE_INFO><![CDATA[:^Ls3weNp2Y ]]></SERVICE_INFO>
      <CONTENT1><![CDATA[]]></CONTENT1><CONTENT2><![CDATA[]]></CONTENT2></TMMR>

        If I add the missing </UPDATE> at the end, the XML parses fine:

        use XML::Simple qw(XMLin);
        $data = XMLin(\*DATA);
        use Data::Dumper;
        print Dumper($data);
        __DATA__
        <?xml version="1.0" encoding="ISO-8859-1" standalone="yes" ?>
        <?xml-stylesheet type="text/xsl" href="notes.xsl"?>
        <UPDATE>
        <COMMANDPARMS> </COMMANDPARMS>
        <TIMING>
        <CUSTID> 192343-13</CUSTID>
        <STARTDATE>20090414</STARTDATE><STARTTIME>10:36:46</STARTTIME>
        </TIMING>
        <TMMR><DATE>20090414</DATE><TIME>18:40:39</TIME><CEO_ID>12943-56A</CEO_ID>
        <CEO_ID_SORTED>217.064.098.066</CEO_ID_SORTED><MANUID>80</MANUID><STATUS>Open</STATUS>
        <NOTES><![CDATA[Last checked on 14Aug ]]></NOTES>
        <SERVICE_INFO><![CDATA[:^Ls3weNp2Y ]]></SERVICE_INFO>
        <CONTENT1><![CDATA[]]></CONTENT1><CONTENT2><![CDATA[]]></CONTENT2></TMMR>
        </UPDATE>
        Stop trying to encode, decode or whatever! And make sure you do not pass a string containing the XML to a method that expects the path to the file containing the XML, or vice versa.

        Jenda
        Enoch was right!
        Enjoy the last years of Rome.
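
To illustrate that last point, here is a small sketch of the two entry points, assuming XML::LibXML plus the core File::Temp module. The sample document and the temporary file are hypothetical. parse_file takes a path, parse_string takes the XML itself, and both are handed the original bytes with no manual decoding.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use XML::LibXML;

# Hypothetical sample document, kept as raw ISO-8859-1 bytes.
my $xml = qq{<?xml version="1.0" encoding="ISO-8859-1"?>\n}
        . qq{<UPDATE><STATUS>Open</STATUS><NOTES>caf\xE9</NOTES></UPDATE>\n};

# Write the bytes to a temporary file so there is a real path to parse.
my ($fh, $path) = tempfile(UNLINK => 1);
binmode $fh;                      # raw bytes, no encoding layer
print {$fh} $xml;
close $fh or die "Cannot close $path: $!";

my $parser = XML::LibXML->new();

# parse_file() expects a path, not the XML text.
my $from_file = $parser->parse_file($path);

# parse_string() expects the XML text, not a path.
my $from_string = $parser->parse_string($xml);

# Mixing them up fails: the XML text is not a usable file name.
my $ok = eval { $parser->parse_file($xml); 1 };

printf "from file:   %s\n", $from_file->findvalue('/UPDATE/STATUS');
printf "from string: %s\n", $from_string->findvalue('/UPDATE/STATUS');
print  "passing XML text to parse_file failed as expected\n" unless $ok;
```

In the code from the question, the result of decode() is the document's content, so handing it to parse_file() is the mixed-up case shown at the end of the sketch.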