Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I am getting this response:

Unable to recognise encoding of this document at /usr/lib/perl5/site_perl/5.8.8/XML/SAX/PurePerl/EncodingDetect.pm line 100.

using this code:
use Data::Dumper;
use XML::Simple;
my $xml = XMLin($_data);
Does anyone know what may be causing that?

Thanks,
Richard

Replies are listed 'Best First'.
Re: error returned with XML::Simple or Data::Dumper
by roboticus (Chancellor) on Jul 17, 2010 at 18:04 UTC

    Perhaps since $_data is undefined, the encoding detection software isn't happy with it? Why not try setting $_data to some XML and see what it does.

    ...roboticus

      Sorry, this is in a subroutine, and $_data is passed to the subroutine, so it is not undefined.

      I have it write to a debug file. Here is an example of the output; I only changed the private data:

      Data Received: " <?xml version = "1.0"?> <response> <status>success</status> <cardnumber>4141414141414141</cardnumber> <balance>1872.39</balance> </response> "
      XML Parser Parsed it into: $VAR1 = { 'balance' => '1872.39', 'cardnumber' => '4141414141414141', 'status' => 'success' };
      So, I know it is getting data passed to it.

      Richard

        Your $_data contains data in a character encoding that XML::SAX::PurePerl knows nothing about. Where are you getting $_data from? Is it UTF-8? ASCII? Some other charset? Is the charset being mangled somewhere along the way? I couldn't see how to specifically tell XMLin which charset to use.

        -derby
Re: error returned with XML::Simple or Data::Dumper
by grantm (Parson) on Jul 19, 2010 at 01:14 UTC

    That message is a warning (rather than an error) which comes from the XML::SAX::PurePerl parser module. The EncodingDetect.pm file contains a routine to guess what encoding the source document uses.

    The routine will only be invoked if your source document does not start with an XML declaration that declares the encoding. So if you can get an encoding declaration added when the document is generated, the warning will go away.
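
    To make "an XML declaration that declares the encoding" concrete, here is a rough sketch that compares two declarations. declares_encoding() is a hypothetical helper written for this post, not something XML::Simple or XML::SAX provides, and its regex is a simplification rather than a full XML grammar check. The first sample reuses the declaration from the debug output earlier in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the document opens with an XML declaration
# that carries an encoding="..." pseudo-attribute. Simplified on purpose.
sub declares_encoding {
    my ($xml) = @_;
    return 0 unless defined $xml;
    return $xml =~ /\A<\?xml\s[^>]*?\bencoding\s*=\s*(["'])[A-Za-z][A-Za-z0-9._-]*\1/
        ? 1
        : 0;
}

# Declaration as it appears in the logged data: version only, no encoding.
my $without = '<?xml version = "1.0"?>' . "\n<response/>";

# The same document with the encoding stated explicitly.
my $with = '<?xml version="1.0" encoding="UTF-8"?>' . "\n<response/>";

for my $doc ($without, $with) {
    my ($first_line) = split /\n/, $doc;
    print "$first_line => ",
        declares_encoding($doc) ? 'declares' : 'does not declare',
        " an encoding\n";
}
```

    Note that the declaration has to be the very first thing in the document for a check like this to match, so leading whitespace before "<?xml" counts as "no declaration" here.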

    The encoding detection routine has very simple logic. It starts by looking at the first few bytes of the file to see if they form a 'Byte Order Mark' (BOM). If a BOM is present, the encoding is taken from it.

    If there is no BOM but the first four bytes are ASCII "<?xm" then UTF-8 encoding is assumed.

    If the first non-whitespace byte is ASCII "<" then UTF-8 encoding is assumed.

    Finally a check is done to see if the bytes look like EBCDIC.
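
    The order of those checks can be sketched roughly as follows. This is an illustration of the sequence described above, not the code in EncodingDetect.pm: guess_encoding() is a made-up name, the list of BOMs is my assumption about what a detector would typically look for, and the EBCDIC test uses the byte pattern for "<?xm" given in the XML specification's autodetection appendix, which may not be what the module actually checks.

```perl
use strict;
use warnings;

# Hypothetical sketch of the detection order; returns an encoding name,
# or nothing (undef in scalar context) if no check succeeds.
sub guess_encoding {
    my ($bytes) = @_;
    return unless defined $bytes && length $bytes;

    # 1. Byte Order Mark. The 4-byte UTF-32 marks are tested before the
    #    2-byte UTF-16 marks because the UTF-32LE BOM begins with the
    #    UTF-16LE BOM.
    return 'UTF-8'    if $bytes =~ /\A\xEF\xBB\xBF/;
    return 'UTF-32BE' if $bytes =~ /\A\x00\x00\xFE\xFF/;
    return 'UTF-32LE' if $bytes =~ /\A\xFF\xFE\x00\x00/;
    return 'UTF-16BE' if $bytes =~ /\A\xFE\xFF/;
    return 'UTF-16LE' if $bytes =~ /\A\xFF\xFE/;

    # 2. No BOM, but the first four bytes are ASCII "<?xm".
    return 'UTF-8' if $bytes =~ /\A<\?xm/;

    # 3. The first non-whitespace byte is ASCII "<".
    return 'UTF-8' if $bytes =~ /\A[\x20\t\r\n]*</;

    # 4. "<?xm" encoded in EBCDIC (assumed pattern, see note above).
    return 'EBCDIC' if $bytes =~ /\A\x4C\x6F\xA7\x94/;

    return;    # nothing matched: this is where the warning would appear
}

my @samples = (
    [ 'UTF-8 BOM'       => "\xEF\xBB\xBF<response/>" ],
    [ 'XML declaration' => '<?xml version = "1.0"?><response/>' ],
    [ 'leading space'   => ' <response/>' ],
    [ 'empty string'    => '' ],
    [ 'undefined'       => undef ],
);

for my $sample (@samples) {
    my ($label, $bytes) = @$sample;
    my $guess = guess_encoding($bytes);
    printf "%-16s %s\n", $label, defined $guess ? $guess : 'not recognised';
}
```

    In this sketch the logged document from earlier in the thread (a space followed by "<?xml") is caught by the third check, while an empty or undefined value falls through every check.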

    If all these checks fail (as is happening in your case), the warning is emitted, UTF-8 encoding is assumed and parsing continues. However, it seems very unlikely that you have a valid XML document if none of those checks were successful. The most likely scenario is that the input XML is either undefined or an empty string. I recommend you go back and throw in a 'print' to confirm you really do have some XML.
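
    Something along these lines would do for that check. It is only a sketch: xml_input_problem() is a hypothetical helper name, and it tests nothing beyond the two cases mentioned above (undefined, or empty/whitespace-only).

```perl
use strict;
use warnings;

# Hypothetical helper: return a short description of what is wrong with
# the input, or nothing (undef in scalar context) if it is non-empty.
sub xml_input_problem {
    my ($data) = @_;
    return 'input is undefined'               unless defined $data;
    return 'input is empty or whitespace only' unless $data =~ /\S/;
    return;
}

for my $sample (undef, '', '   ', '<response/>') {
    my $problem = xml_input_problem($sample);
    my $shown   = defined $sample ? "'$sample'" : 'undef';
    print "$shown: ", defined $problem ? $problem : 'no obvious problem', "\n";
}
```

    In the subroutine from the original post, the idea would be to call the helper on $_data immediately before XMLin($_data) and log or die with the returned message, so the check sees exactly the value the parser is about to receive rather than what was logged somewhere else.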