in reply to Re: XML::Simple cannot parse Simple XML file
in thread XML::Simple cannot parse Simple XML file

The problem is that XML::Simple is supposed to parse XML (I believe that it uses XML::Parser to do that). The rules for what constitutes valid (strictly speaking, well-formed) XML are very strict. The rules also say that when an XML parser encounters a document that isn't valid XML, it shouldn't try to recover; it should just die with an appropriate error message.
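
To illustrate the "die rather than recover" behaviour, here is a minimal sketch. It is written in Python rather than Perl only because Python's standard library parser is built on expat, the same underlying library that XML::Parser wraps. The `try_parse` helper and the two sample documents are hypothetical and not taken from the original question.

```python
import xml.etree.ElementTree as ET


def try_parse(text):
    """Return (True, root tag) if text is well-formed XML,
    otherwise (False, parser error message)."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError as err:
        # A conforming parser reports the error instead of guessing a repair.
        return False, str(err)
    return True, root.tag


# Hypothetical sample documents, for illustration only.
good = "<config><item>1</item></config>"
bad = "<config><item>1</config>"  # <item> is never closed

print(try_parse(good))
print(try_parse(bad))
```

The second call does not return a partial tree; the parser stops at the first well-formedness error, which is the same kind of failure XML::Simple reports when its underlying parser dies.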

What you are trying to parse isn't XML. Therefore trying to parse it with an XML parser isn't going to work.

You really have two options. You can fix your document so that it is valid XML, which will allow you to use all the XML tools to process it. Or you can stay with your current "almost-XML" and write your own set of tools to process it.

I know which way I'd go :-)

--
<http://dave.org.uk>

"The first rule of Perl club is you do not talk about Perl club."
-- Chip Salzenberg

Re^3: XML::Simple cannot parse Simple XML file
by wolv (Pilgrim) on Jun 13, 2006 at 23:55 UTC
    And just in case you have to process XML that you don't produce and is invalid, I recommend XML::Liberal. For example, Plagger uses it to parse broken feeds.
      And just in case you have to process XML that you don't produce and is invalid

      Terminology is important here. There is no such thing as "invalid XML". If your data doesn't follow the XML specs then it is not XML.

      When presented with data that is supposed to be XML but isn't, the best action is to insist that it be replaced by something that is XML. Remember how the web was taken over by invalid HTML because browsers were too lenient? Let's not allow XML to go the same way.

        There is no such thing as "invalid XML".

        So if you make a typo that's a syntax error in Perl, it's no longer Perl? Although I understand what you're saying, I don't agree with how you said it.

        "He's not a patient person." = "He's an impatient person." "This is not valid XML." = "This is invalid XML."

        It's like, say you define a "table" as "a thing for sitting stuff on that has four legs". Then you find another thing that you could sit stuff on, except one of the legs is missing. Wouldn't you say that it's a broken table rather than say that broken tables don't exist?

      I'm not convinced that the name of the module is very sensible. If it can parse 'invalid XML' without an error then it isn't a compliant XML Processor. The specification is quite clear about this:

      Validating and non-validating processors alike MUST report violations of this specification's well-formedness constraints in the content of the document entity and any other parsed entities that they read.
      The emphasis is from the specification. Obviously it's a bit late now, but the module would have been better named something like 'Text::XMLish' or 'Text::Tagged::AngleBrackets' to make it clear. There is no such thing as a 'Liberal' XML parser in terms of the well-formedness constraints.

      /J\