in reply to Re: XML::Simple giving a non-specific error
in thread XML::Simple giving a non-specific error

You seem to be saying that being unable to determine that an error occurred before having reached the end of file means it can't be reported accurately. That's not the case, as seen in the update to my post.

By the way, the error wasn't reported at the end of the file, it was reported when the closing tag of the parent element (</ROOT>) was found.

Re^3: XML::Simple giving a non-specific error
by almut (Canon) on Mar 12, 2010 at 00:41 UTC
    By the way, the error wasn't reported at the end of the file

    Judging by the byte position (377), it was (the closing angle bracket of </ROOT> is byte 375(*)).  I don't know why the line number is reported one less than it should be — maybe the <?xml ...?> header isn't being counted.

    As for your other point, I think you're right, provided the parser keeps track of the starting positions of all tags that are still unclosed.

    ___

    (*) assuming unix newlines, which I did after having seen i386-linux-thread-multi in the OP's error message.

      If a guy catches the baseball at the edge of the outfield, it's not the edge of the outfield that caught the ball. Aside from the fact that the error really was found before EOF (at least the last newline and the EOF remained unparsed), the point was that the error could have been caught earlier, and would have been caught earlier in a case such as <ROOT><BODY><ERROR><ERROR></BODY></ROOT>.

      Besides, the following indicates the reported byte pos for me:

      </ROOT> ^ |
Re^3: XML::Simple giving a non-specific error
by Anonymous Monk on Mar 12, 2010 at 00:23 UTC
    That's not the case, as seen in the update to my post.

    Your update shows how LibXML, a parser which builds a tree (takes more memory), can provide better error messages than a simpler parser like expat.

      You seem to be implying that the fact that it builds a tree is relevant (if XML::LibXML::SAX even builds a tree). It's not. To be able to provide the error message it already provides, the parser needs a list of unclosed elements.
      my @unclosed = ( 'ROOT', 'ERROR', 'ERROR', );
      All that's needed to provide a better error message is to note a line number along with the name of the element.
      my @unclosed = ( [ 'ROOT', 3 ], [ 'ERROR', 8 ], [ 'ERROR', 8 ], );

      Yes, it uses extra memory, but 1) it doesn't change the order of growth (the big-O) of the memory used, and 2) the maximum is proportional to the depth of the tree, and trees are usually quite shallow (20 levels?).
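      To make the idea concrete, here is a minimal sketch of that bookkeeping. It is not how expat or libxml2 work internally; check_nesting is a hypothetical helper written only for illustration, and its regex-based scanner deliberately ignores comments, CDATA sections and attribute values that contain '>'. The only point is that storing a line number next to each unclosed element name is enough to say where the offending element was opened.

```perl
use strict;
use warnings;

# Hypothetical helper, for illustration only: not part of expat,
# XML::Parser, XML::Simple or XML::LibXML.
# Returns an error message for the first nesting problem, or undef if none.
sub check_nesting {
    my ($xml) = @_;
    my @unclosed;    # stack of [ name, line ] pairs, e.g. [ 'ROOT', 3 ]
    my $line = 1;

    # Tokens are either a newline or a simple start/end/empty tag.
    # "<?xml ...?>" and "<!...>" do not match the tag branch.
    while ($xml =~ m{(\n)|(<(/?)([A-Za-z_][\w.:-]*)[^>]*?(/?)>)}g) {
        if (defined $1) { $line++; next; }

        my ($tag, $is_close, $name, $is_empty) = ($2, $3, $4, $5);
        my $tag_line = $line;
        $line += ($tag =~ tr/\n//);    # count newlines inside the tag itself

        next if $is_empty;             # <X/> opens and closes at once

        if (!$is_close) {
            push @unclosed, [ $name, $tag_line ];
            next;
        }

        my $open = $unclosed[-1];
        return "line $tag_line: </$name> has no matching start tag"
            if !$open;
        return sprintf 'line %d: </%s> found, but <%s> opened on line %d is still unclosed',
            $tag_line, $name, $open->[0], $open->[1]
            if $open->[0] ne $name;

        pop @unclosed;
    }

    if (@unclosed) {
        my $open = $unclosed[-1];
        return sprintf 'end of input: <%s> opened on line %d was never closed',
            $open->[0], $open->[1];
    }
    return;
}

my $doc = join "\n",
    '<?xml version="1.0"?>',
    '<ROOT>',
    '  <BODY>',
    '    <ERROR>text<ERROR>',
    '  </BODY>',
    '</ROOT>';

# prints: line 5: </BODY> found, but <ERROR> opened on line 4 is still unclosed
print check_nesting($doc), "\n";
```

      The stack never holds more entries than the current nesting depth, so the extra cost is one integer per open element.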

      As for expat being simpler, it's actually almost identical to SAX. It wouldn't surprise me if one inspired the other.