rubric has asked for the wisdom of the Perl Monks concerning the following question:

I am running the XML::Checker::Parser example code from its doc page line for line (except that I add a few prints where it's missing some code), and I am using a known valid XML file. I get the error: Can't coerce array into hash at blah/Parser.pm line 187. What could this be? Here's the program:
    use XML::Checker::Parser;

    my %expat_options = (KeepCDATA => 1,
                         Handlers => [ Unparsed => \&my_Unparsed_handler ]);
    my $parser = new XML::Checker::Parser (%expat_options);

    eval {
        local $XML::Checker::FAIL = \&my_fail;
        $parser->parsefile ("valid.xml");
    };
    if ($@) {
        print "Either XML::Parser (expat) threw an exception or my_fail() died.";
    }

    # Throws an exception (with die) when an error is encountered, this
    # will stop the parsing process.
    # Don't die if a warning or info message is encountered, just print a message.
    sub my_fail {
        my $code = shift;
        die XML::Checker::error_string ($code, @_) if $code < 200;
        XML::Checker::print_error ($code, @_);
    }

    sub my_Unparsed_handler {
        print "!";
    }

Replies are listed 'Best First'.
Re: XML::Checker::Parser
by mirod (Canon) on Nov 22, 2002 at 19:01 UTC

    The problem is that Handlers => [ Unparsed => \&my_Unparsed_handler ] should be Handlers => { Unparsed => \&my_Unparsed_handler }. Note the { ... } instead of [ ... ]: the Handlers parameter should be a hash ref, not an array ref (hence the error message when Perl tries to force an array into a hash).
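To make the difference concrete, here is a minimal sketch in core Perl only. The handler_names function is a hypothetical stand-in for a constructor that expects Handlers as a hash ref; it is not part of XML::Checker::Parser, and the error text it produces is my own, not the module's.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a constructor that expects Handlers to be
# a hash ref (an assumption for illustration, not the module's code).
sub handler_names {
    my (%opts) = @_;
    my $handlers = $opts{Handlers};
    die "Handlers must be a hash ref, got "
        . (ref($handlers) || 'a plain scalar') . "\n"
        unless ref($handlers) eq 'HASH';
    return sort keys %$handlers;
}

sub my_Unparsed_handler { print "!" }

# Braces build an anonymous hash and return a reference to it.
my @ok = handler_names(Handlers => { Unparsed => \&my_Unparsed_handler });
print "registered: @ok\n";

# Brackets build an anonymous array instead, so the check above dies.
my @bad = eval { handler_names(Handlers => [ Unparsed => \&my_Unparsed_handler ]) };
print "rejected: $@" if $@;
```

Calling ref() on the argument is a cheap way to catch this kind of mix-up early, with a clearer message than the one Perl produces deep inside the module.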

      Thanks. That does make a difference. That is an error in the example code of XML::Checker::Parser.

        You should at least send a bug report, or even better a doc patch, to the maintainer then (that's T.J. Mather; the current revision is 0.13, not the first one that shows up on search.cpan.org).

Re: XML::Checker::Parser
by vek (Prior) on Nov 22, 2002 at 18:57 UTC
    I haven't been on the Perl-XML mailing list in a while but I seem to recall XML::Checker::Parser to be somewhat buggy. Don't know whether that's still the case as I see it is at least being actively maintained again. I'm assuming you're using XML::Checker::Parser because you want to validate XML against a DTD. If you're open to another opinion I might suggest using XML::LibXML instead. I use it for my DTD validation and IMHO - it rocks.
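For reference, DTD validation with XML::LibXML can be sketched roughly as follows. This assumes XML::LibXML is installed; the note DTD, the sample documents and the check_against_dtd helper are made up for illustration and are not from this thread.

```perl
use strict;
use warnings;
use XML::LibXML;

# A tiny illustrative DTD (hypothetical, not from the original post).
my $dtd = XML::LibXML::Dtd->parse_string(<<'DTD');
<!ELEMENT note (to, body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT body (#PCDATA)>
DTD

# Returns a short status string for an XML document held in a string.
sub check_against_dtd {
    my ($xml) = @_;

    # load_xml dies on documents that are not well-formed, so trap that.
    my $doc = eval { XML::LibXML->load_xml(string => $xml) };
    return "not well-formed: $@" unless defined $doc;

    # is_valid returns a boolean; validate() would die with details instead.
    return $doc->is_valid($dtd) ? "valid" : "invalid";
}

print check_against_dtd('<note><to>a</to><body>b</body></note>'), "\n";
print check_against_dtd('<note><to>a</to></note>'), "\n";
```

Well-formedness and validity are reported separately here because they fail at different stages: the first when parsing, the second when checking the parsed tree against the DTD.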

    -- vek --
      I'll check out XML::LibXML. I'm actually just kicking the tires on different parsers right now. For the project I'm working on, I have a problem in that I can't rely on all the files being valid XML. Sometimes people make mistakes in their XML files and my code just needs to be able to do its best with what they provide. No crapping out allowed. I'm in a hard place because I know my own parser will never be up to snuff with the existing ones, and I can't seem to find a parser that will survive an XML file with a mistake.

        XML::LibXML is actually a better choice than XML::Checker::Parser: it is faster, better maintained and SAX compliant. It also has an HTML parser, which might help you if the malformed XML you receive happens to be some sort of HTML.

        In general though, you are going down a dangerous path. There is a reason why the XML spec requires a conforming XML processor to stop normal processing once it detects a fatal error: "Once a fatal error is detected, however, the processor must not continue normal processing (i.e., it must not continue to pass character data and information about the document's logical structure to the application in the normal way)." See 1.2 Terminology in the annotated XML spec; Tim Bray's comment about it is also instructive.

        By accepting non-conformant XML in the system you will create all sorts of problems down the line, most of which are impossible to fix programmatically. I know it is not always easy to tell customers, or other departments of your company, that you can't accept what they send you, but the XML spec is there to back you up, and get them (and you!) to do The Right Thing (tm).

        If you really have to accept non-conformant XML, you should not expect an XML parser to deal with it (they won't!). Try to code a pre-processing step, which won't rely on XML tools, to convert the data to well-formed XML. From there you can then use XML tools to convert it to valid (i.e., conformant to your DTD) XML. Check the data after this pre-processing and build the rest of your process with XML tools. Writing the pre-processing step will be Hell but it will at least isolate the, pardon my French, crap they send you from your XML process.
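As one example of what such a pre-processing step might contain, here is a sketch in plain Perl with no XML tools that repairs a single common mistake: bare ampersands that are not part of an entity or character reference. The function name is hypothetical, and I am only guessing that this is the kind of error you receive; the real list of repairs depends entirely on your data.

```perl
use strict;
use warnings;

# Escape every '&' that does not start something shaped like an entity
# reference (&name;) or a character reference (&#38; or &#x26;).
# Caveat: it would also rewrite ampersands inside CDATA sections, which
# are legal there, so treat this as a sketch rather than a complete fix.
sub escape_bare_ampersands {
    my ($text) = @_;
    $text =~ s/&(?!(?:[A-Za-z_][\w.-]*|#[0-9]+|#x[0-9A-Fa-f]+);)/&amp;/g;
    return $text;
}

print escape_bare_ampersands('<p>Fish & chips &amp; peas &#38; salt</p>'), "\n";
```

Each repair of this kind can live in its own small function, so that the pre-processing stage stays a pipeline of narrow text fixes whose output is then handed to a real XML parser for checking.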

        Good luck!