in reply to XML::Simple cannot parse Simple XML file

Re^2: XML::Simple cannot parse Simple XML file
by davorg (Chancellor) on Jun 13, 2006 at 09:07 UTC

    The problem is that XML::Simple is supposed to parse XML (I believe that it uses XML::Parser to do that). The rules for what constitutes well-formed XML are very strict. The rules also say that when an XML parser encounters a document that isn't well-formed, it shouldn't try to recover; it should just die with an appropriate error message.

    What you are trying to parse isn't XML. Therefore trying to parse it with an XML parser isn't going to work.
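
    As an illustration, here is a minimal sketch of how a conforming parser reacts to a document that breaks the rules. It uses Python's xml.etree.ElementTree as a stand-in for XML::Simple (XML::Parser is built on the expat library, and so is ElementTree's default parser). The try_parse helper and both sample strings are made up for this example; they are not the original poster's data.

```python
import xml.etree.ElementTree as ET


def try_parse(text):
    """Return (True, root tag) for well-formed input, else (False, error text)."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError as err:
        # The parser reports the error and stops; it does not try to recover.
        return False, str(err)
    return True, root.tag


# Hypothetical sample documents (assumptions for this sketch).
good = "<config><name>example</name></config>"
bad = "<config><name>example</config>"  # <name> is never closed

print(try_parse(good))
print(try_parse(bad))
```

    The second call fails with a parse error instead of returning a partial tree, which is the behaviour described above.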

    You really have two options. You can fix your document so that it is valid XML, which will allow you to use all the XML tools to process it. Or you can stay with your current "almost-XML" and write your own set of tools to process it.

    I know which way I'd go :-)
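
    For the first option, one common way to keep a document well-formed is to generate it with an XML library instead of gluing strings together, so that characters such as & and < are escaped automatically. The sketch below again uses Python's xml.etree.ElementTree as a stand-in; the build_record function, the element names and the sample values are hypothetical, since the thread does not show how the original file was produced.

```python
import xml.etree.ElementTree as ET


def build_record(name, note):
    """Build a small document through the library so special characters are escaped."""
    root = ET.Element("record")
    ET.SubElement(root, "name").text = name
    ET.SubElement(root, "note").text = note
    # encoding="unicode" returns a str rather than bytes
    return ET.tostring(root, encoding="unicode")


# Hypothetical values containing characters that are illegal if left unescaped.
xml_text = build_record("Fish & Chips", "price < 5")
print(xml_text)

# The result parses cleanly and round-trips to the original text.
parsed = ET.fromstring(xml_text)
print(parsed.find("name").text)
```

    Any serializer that escapes text content would do; the point is only that the producer, not the parser, is the place to fix the problem.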

    --
    <http://dave.org.uk>

    "The first rule of Perl club is you do not talk about Perl club."
    -- Chip Salzenberg

      And just in case you have to process XML that you don't produce and is invalid, I recommend XML::Liberal. For an example, Plagger uses it to parse broken feeds.
        And just in case you have to process XML that you don't produce and is invalid

        Terminology is important here. There is no such thing as "invalid XML". If your data doesn't follow the XML specs then it is not XML.

        When presented with data that is supposed to be XML but isn't, the best action is to insist that it be replaced by something that is XML. Remember how the web was taken over by invalid HTML because browsers were too lenient? Let's not allow XML to go the same way.

        --
        <http://dave.org.uk>

        "The first rule of Perl club is you do not talk about Perl club."
        -- Chip Salzenberg

        I'm not convinced that the name of the module is very sensible. If it can parse 'invalid XML' without an error, then it isn't a compliant XML processor; the specification is quite clear about this:

        Validating and non-validating processors alike MUST report violations of this specification's well-formedness constraints in the content of the document entity and any other parsed entities that they read.
        The emphasis is from the specification. Obviously it's a bit late now, but the module would have been better named something like 'Text::XMLish' or 'Text::Tagged::AngleBrackets' to make this clear. There is no such thing as a 'Liberal' XML parser in terms of the well-formedness constraints.

        /J\

Re^2: XML::Simple cannot parse Simple XML file
by grinder (Bishop) on Jun 13, 2006 at 10:03 UTC
    if somebody knows how to make XML::Simple more lenient it would help me a lot

    No it wouldn't. You do not understand what you are asking for. As soon as you tried parsing with a different parser, you would be back to square one. Do you really want to take on the world, one XML parser at a time?

    XML is really quite a simple format. It's also very flexible. It's also designed to be used in all sorts of environments and contexts. To make sure it achieves this goal, you have to play by the rules. And these are that you can't just make stuff up and expect it to work.

    In theory, the idea behind XML was that writing a parser should be easy. It didn't quite pan out as well as everyone expected, but part of this quest for simplicity meant that the parser didn't have to attempt to deal with documents that try to bend the rules. So no-one is going to add extra stuff to their parser to deal with bogus documents, since the standard already states that they do not have to.

    XML, while it is simple, is also a royal pain to work with. It's bad enough as it is. If it turned out that we had several classes of parsers, some of which were able to parse document X and not document Y, some could parse both and some could parse neither, the situation would become well nigh impossible. Does "Well it looks ok with my parser!" sound familiar?

    • another intruder with the mooring in the heart of the Perl