Woulfe has asked for the wisdom of the Perl Monks concerning the following question:

This script:
use XML::Simple;
my $p = XML::Simple->new();
my $doc = $p->XMLin(shift);
... will fail given certain characters in the XML document. That in itself is not a problem. The problem is that when it fails, it keeps printing the same error message over and over until the process is stopped. Here is a sample XML file that will fail:
http://www.blahfoo.com/stuff/podtech.rss
The error message is:
utf8 "\xE2" does not map to Unicode at /usr/lib/perl5/site_perl/5.8.6/XML/SAX/PurePerl/Reader/Stream.pm line 37
and that line of code is the read() call. After reading as much of the utf8/Unicode documentation as I can stomach, I'm certain the error message itself is due to a bad mapping into Unicode. However, I can't figure out what causes it to repeat over and over. I just need to get around it. I have read about this same issue (with a different hex code for the offending "character") elsewhere, and no one seemed to have anything to offer other than preprocessing the file, which is not really an option. Ideas?
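For anyone trying to reproduce this, it may help to first confirm where the input stops being valid UTF-8. The following is only a minimal diagnostic sketch using the core Encode module; the helper name first_bad_utf8_offset and the script name are made up for illustration and do not appear in XML::Simple or XML::SAX. It does not fix the parser, and it does not explain the repeated message.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper (not part of any XML module): return the byte offset of
# the first sequence that is not valid UTF-8, or undef if everything decodes.
sub first_bad_utf8_offset {
    my ($bytes) = @_;
    my $rest = $bytes;    # decode() consumes its source when a CHECK value is given
    Encode::decode('UTF-8', $rest, Encode::FB_QUIET());
    # Whatever is left in $rest starts at the first byte that failed to decode.
    return length($rest) ? length($bytes) - length($rest) : undef;
}

# Usage (script name is an assumption): perl check_utf8.pl podtech.rss
if (@ARGV) {
    my $file = shift @ARGV;
    open my $fh, '<:raw', $file or die "Cannot open $file: $!";
    my $bytes = do { local $/; <$fh> };
    close $fh;
    my $offset = first_bad_utf8_offset($bytes);
    print defined $offset
        ? "First invalid UTF-8 byte at offset $offset\n"
        : "File decodes cleanly as UTF-8\n";
}
```

If the file decodes cleanly, the complaint about "\xE2" is more likely to come from how the parser reads the stream than from the document itself, which would point back at the installed modules.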

Re: Infinite loop in XML Parser?
by Joost (Canon) on Feb 15, 2007 at 20:26 UTC
    I just get this:
    utf8 "\xE2" does not map to Unicode at /usr/local/lib/perl5/site_perl/5.8.5/XML/SAX/PurePerl/Reader/Stream.pm line 54.
    utf8 "\x80" does not map to Unicode at /usr/local/lib/perl5/site_perl/5.8.5/XML/SAX/PurePerl/Reader/Stream.pm line 54.
    and then the program stops.

    That's with perl 5.8.8, XML::Simple 2.12 and this input (taken from your URL; it might have changed in the meantime, of course):

      My version of XML::Simple is 2.16, though my perl is 5.8.6. Does anyone have any idea why mine would hang on that first error message and never finish? Clearly the setup in the previous reply is working properly, so there must be something odd about my setup that is making this happen.
        Update XML::SAX::PurePerl
Re: Infinite loop in XML Parser?
by Anonymous Monk on Feb 15, 2007 at 21:31 UTC
    D:\>wget http://www.blahfoo.com/stuff/podtech.rss
    --13:21:16--  http://www.blahfoo.com/stuff/podtech.rss
               => `podtech.rss'
    Resolving www.blahfoo.com... 216.75.2.95
    Connecting to www.blahfoo.com|216.75.2.95|:80... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 85,882 (84K) [text/xml]

    100%[====================================>] 85,882  5.99K/s  ETA 00:00

    13:21:38 (4.60 KB/s) - `podtech.rss' saved [85882/85882]

    D:\>perl -MXML::Simple -e" XML::Simple->new->XMLin(shift)" podtech.rss
    not well-formed (invalid token) at line 866, column 134, byte 73867 at D:/Perl/site/lib/XML/Simple.pm line 287
      CPAN tells me that XML::SAX::PurePerl is up to date. When I run the previous command:
      perl -MXML::Simple -e" XML::Simple->new->XMLin(shift)" podtech.rss
      ... I get a repeating sequence as I described before. Clearly I have something out of date, or some odd combination of things installed, but I am not clear on how to go about figuring this out. Any other ideas?
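
One way to start figuring this out is to compare exactly which XML modules each machine has. The "not well-formed (invalid token)" message above is the style of error that XML::Parser (expat) produces, while the repeating message comes from XML::SAX::PurePerl, so the two machines are probably not using the same parser backend. Below is a minimal sketch that prints the installed version of each module that might be involved; the helper name module_version and the choice of modules are assumptions for illustration, not something taken from the XML::Simple documentation.

```perl
use strict;
use warnings;

# Hypothetical helper: return the installed version of a module as a string,
# 'unknown' if it loads but declares no $VERSION, or undef if it cannot be loaded.
sub module_version {
    my ($module) = @_;
    return undef unless $module =~ /\A\w+(?:::\w+)*\z/;    # accept plain module names only
    return undef unless eval "require $module; 1";
    my $version = eval { $module->VERSION };
    return defined $version ? "$version" : 'unknown';
}

# Modules that may take part in parsing; this list is a guess, adjust as needed.
for my $module (qw(XML::Simple XML::SAX XML::SAX::PurePerl XML::SAX::Expat XML::Parser XML::LibXML)) {
    my $version = module_version($module);
    printf "%-20s %s\n", $module, defined $version ? $version : 'not installed';
}
printf "%-20s %s\n", 'perl', $];
```

If XML::Parser turns out to be installed, the XML::Simple documentation describes a $XML::Simple::PREFERRED_PARSER variable that can be set to 'XML::Parser' before calling XMLin(). That might sidestep the PurePerl reader entirely, though the document would presumably still be rejected as not well-formed, as it is in the reply above.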