in reply to Seeking for advice: XML parsing with special requirements [Solved]

So far, I have tried four different parsers (XML::SAX, XML::LibXML::SAX, XML::Parser, XML::LibXML::Reader) and read a lot about other parsers I possibly could use, but all failed or seem inappropriate in one respect or another.

AFAIK, XML::Parser fits your requirements for sure.

Re^2: Seeking for advice: XML parsing with special requirements
by Nocturnus (Scribe) on Apr 22, 2012 at 14:04 UTC

    Thank you very much for bothering!

    I had some problems with XML::Parser:

    If it sees unresolvable entities (which I admit is formally an error in the XML document), it calls the default handler regardless of which handlers you have installed. This makes things more difficult, but I could live with it (I had already changed my code accordingly).

    The disqualifier is this: in a handler, you get the original (unparsed) string by calling one of the following on the underlying expat instance:

    $_[0] -> original_string

    or

    $_[0] -> recognized_string

    That would be nice and easy, but in some cases the returned string contains only rubbish; this is true for nearly all of the declaration blocks (for example, doctype declarations and attribute declarations). The expat documentation explicitly confirms this observation; unfortunately, that is something I can't live with.
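
    For illustration, here is a minimal sketch of the pattern described above. The sample document and the @seen list are made up for the example; only the handler names and original_string come from XML::Parser itself. It covers only non-declaration handlers, where the returned strings should be usable.

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical sample document, made up for this sketch.
my $xml = '<doc a="1">text<!-- note --></doc>';

# Collects [handler name, verbatim source string] pairs.
my @seen;

my $parser = XML::Parser->new(
    Handlers => {
        # In every handler, $_[0] is the underlying XML::Parser::Expat object.
        Start   => sub { push @seen, [ Start   => $_[0]->original_string ] },
        End     => sub { push @seen, [ End     => $_[0]->original_string ] },
        Char    => sub { push @seen, [ Char    => $_[0]->original_string ] },
        Comment => sub { push @seen, [ Comment => $_[0]->original_string ] },
    },
);

$parser->parse($xml);

# Print each event next to the piece of source text that triggered it.
printf "%-8s %s\n", @$_ for @seen;
```

    Adding a Doctype or Attlist handler to the same sketch and calling original_string there is where the rubbish shows up, so the approach cannot be extended to declarations.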

    As far as I know, XML::Parser is always based on expat, but perhaps I have misunderstood something. If so, I would be grateful if somebody could show me how to use XML::Parser with a different underlying parser.

    Thank you very much,

    Nocturnus