
XML::LibXML is actually a better choice than XML::Checker::Parser: it is faster, better maintained, and SAX compliant. It also has an HTML parser, which might help you if the malformed XML you receive happens to be some sort of HTML.

In general though, you are going down a dangerous path. There is a reason why the XML spec forbids a conforming XML processor from carrying on after a fatal error: "Once a fatal error is detected, however, the processor must not continue normal processing (i.e., it must not continue to pass character data and information about the document's logical structure to the application in the normal way)". See 1.2 Terminology in the annotated XML spec; Tim Bray's comment about it is also instructive.
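To see what that rule means in practice, here is a minimal sketch. It is in Python rather than Perl only because Python's standard library ships an expat-based parser, so nothing needs installing; the helper name try_parse and the sample documents are made up for illustration. A conforming parser hands back a tree for well-formed input and nothing but an error for anything else:

```python
import xml.etree.ElementTree as ET

def try_parse(text):
    """Return (root_tag, None) on success, or (None, error_message) on a fatal error.

    try_parse is a hypothetical helper, not part of any library.
    """
    try:
        root = ET.fromstring(text)
    except ET.ParseError as err:
        # The parser stops here: no partial tree is handed to the application.
        return None, str(err)
    return root.tag, None

good = "<order><item>widget</item></order>"
bad = "<order><item>widget</order>"  # <item> is never closed

print(try_parse(good))
print(try_parse(bad))
```

The second call gives you an error message and no data at all, which is exactly the behaviour the spec asks for.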

By accepting non-conformant XML into the system you will create all sorts of problems down the line, most of which are impossible to fix programmatically. I know it is not always easy to tell customers, or other departments of your company, that you can't accept what they send you, but the XML spec is there to back you up, and to get them (and you!) to do The Right Thing (tm).

If you really have to accept non-conformant XML, you should not expect an XML parser to deal with it (it won't!). Try to code a pre-processing step, which won't rely on XML tools, to convert the data to well-formed XML. From there you can then use XML tools to convert it to valid (i.e. conformant to your DTD) XML. Check the data after this pre-processing and build the rest of your process with XML tools. Writing the pre-processing step will be Hell, but it will at least isolate the, pardon my French, crap they send you from your XML process.
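A rough sketch of that separation, again in Python with the standard library only. I am assuming, purely for illustration, that the one defect in the incoming data is unescaped ampersands; your data will have its own problems, and the names BARE_AMP, preprocess and load are made up. The point is the shape: plain text fix-up first, a strict XML parser second.

```python
import re
import xml.etree.ElementTree as ET

# Matches '&' that does not start an entity or character reference.
# Assumed defect for this example: senders forget to escape ampersands.
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z_][\w.-]*|#[0-9]+|#x[0-9A-Fa-f]+);)")

def preprocess(raw):
    """Plain text clean-up; deliberately uses no XML tools."""
    return BARE_AMP.sub("&amp;", raw)

def load(raw):
    """Clean the raw text, then hand it to a strict parser.

    Raises ET.ParseError if the cleaned text is still not well-formed,
    so nothing broken gets past this point.
    """
    return ET.fromstring(preprocess(raw))

raw = "<customer><name>Smith & Sons</name><note>A &amp; B</note></customer>"
root = load(raw)
print(root.find("name").text)
```

Everything after load() can rely on having a well-formed document. Validation against your DTD is not shown here, since the Python standard library does not do it; in Perl that step would be a job for XML::LibXML.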

Good luck!