I've been using XML::Fast to process XML files successfully for some time. However, the code was moved to a newer machine and has stopped working in some circumstances. One difference between the machines is the XML::Fast version: 0.11 on the original machine (working) and 0.17 on the new machine (not working). If I upgrade to 0.17 on the old machine and change nothing else, it stops working there too.

The error I'm getting is:

Failed to encode 2017-9-21T08-49-17.XML to JSON for indexing - malformed or illegal unicode character in string [�ndby IF], cannot convert to JSON at xx.pm line 1827.

The XML file comes from a 3rd party and is ISO-8859-1 encoded. The part it is complaining about is <Value>Br<F8>ndby IF</Value>, where <F8> is the single byte 0xF8 (ø in ISO-8859-1). A cut-down version of the XML which fails is:

<?xml version="1.0" encoding="ISO-8859-1"?>
<xx feedtype="delta"><Timestamp CreatedTime="2017-09-21T06:49:17" TimeZone="GMT"/><Value>Brøndby IF</Value></xx>

The code which is now failing is:

use Cpanel::JSON::XS;
use XML::Fast;

sub esIndexFile2 {
    my ($self, $file) = @_;

    my $xml = do {
        local $/ = undef;
        open (my $fh, "<:encoding(ISO-8859-1)", $file)
            or die "Failed to open $file - $!";
        <$fh>;
    };
    $xml =~ s/^(?:.*\n)//;    # remove first line - the encoding line

    my $hash;
    eval {
        $hash = xml2hash $xml;
    };
    if (my $ev = $@) {
        warn("Failed to parse file $file for indexing - $@ - SKIPPING");
        return;
    }

    my $json = eval {
        encode_json($hash);    # <------------ fails here
    };
    if (my $ev = $@) {
        $self->logwarn("Failed to encode $file to JSON for indexing - $@ - SKIPPING");
        return;
    }
    return 1;
}

The changes file for XML::Fast is not too helpful. I have discovered that adding utf8decode => 1 to the xml2hash call makes it work now, but I don't really understand why. Am I doing anything wrong here? What might have changed in XML::Fast to cause this to happen?
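For reference, here is a minimal sketch of what I expect the round trip to look like when no XML parser is involved. It uses only core modules, with JSON::PP standing in for Cpanel::JSON::XS and a hard-coded byte string standing in for the file contents (both are assumptions made for illustration; this does not reproduce the XML::Fast behaviour). It shows that encode_json takes Perl character strings and returns UTF-8 encoded bytes:

```perl
use strict;
use warnings;
use Encode qw(decode);
use JSON::PP qw(encode_json);   # core stand-in for Cpanel::JSON::XS (assumption)

# The value as raw ISO-8859-1 bytes, as it sits in the file (0xF8 is "ø").
my $latin1_bytes = "Br\xF8ndby IF";

# Decode the bytes into Perl characters; "ø" becomes the single character U+00F8.
my $chars = decode('ISO-8859-1', $latin1_bytes);

# encode_json expects characters and returns a UTF-8 encoded byte string.
my $json = encode_json({ Value => $chars });

# In UTF-8, U+00F8 is the two bytes 0xC3 0xB8.
print $json =~ /Br\xC3\xB8ndby IF/ ? "encoded as UTF-8\n" : "unexpected encoding\n";
```

If the string handed to encode_json is flagged as UTF-8 internally but still holds the lone byte 0xF8, that would fit the "malformed or illegal unicode character" message, though I have not confirmed that this is what 0.17 does.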


In reply to Problem upgrading XML::Fast from 0.11 to 0.17 by mje
