in reply to Reading a particular xml

Hmm, although I have been using these modules with success quite often, I am pis*ed off with the XML, HTML, JSON, CSV, etc. modules that are too stringent on the syntax and fail at the first syntax mistake.

I am happy when these modules produce perfect syntax-compliant output, but much less happy when they choke and fail at some mistake upon reading the input. Come on, why isn't it apparently possible to implement a DWIM principle in these formats?

Quite a few times, I have had to fall back on my own poor regex solution because these modules usually cannot cope with input that deviates even very slightly from the format.

I understand that there is no perfect solution to this problem, but there could be at least a "strict" and a "lenient" mode in these modules to cope with different situations. I may be wrong, but I do not know of any of these modules having such different modes.
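
As a point of comparison from outside Perl, some parsers do expose such a switch. Below is a minimal sketch, assuming Python and its standard csv module, whose reader accepts a `strict` flag (off by default). The sample line and the helper name `parse_line` are made up for illustration.

```python
import csv

# Hypothetical sample: a stray character follows the closing quote.
SLOPPY_LINE = '"a"b,c'


def parse_line(line, strict):
    """Parse one CSV line; return its fields, or None if the reader rejects it."""
    try:
        return next(csv.reader([line], strict=strict))
    except csv.Error:
        # In strict mode the reader raises instead of guessing.
        return None


if __name__ == "__main__":
    print(parse_line(SLOPPY_LINE, strict=False))  # lenient: reader guesses at the fields
    print(parse_line(SLOPPY_LINE, strict=True))   # strict: reader refuses the line
```

The lenient call quietly glues the stray character onto the field, which is exactly the kind of guess that may or may not be what you meant.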

Re^2: Reading a particular xml
by Your Mother (Archbishop) on Aug 14, 2015 at 22:49 UTC
    Come on, why isn't it apparently possible to implement a DWIM principle in these formats?

    This actually makes me angry. Well, not really, but it's fun to pretend. The sloppy, half-assed XML that exists because of HTML, and because of sloppy, half-assed programmers who ignored the standard, has caused many of us tons of woe. If you are being served broken JSON or XML, push back, and push back hard. There is no excuse for anyone generating garbage here, and the more lenient the tools are, the more the heap of trash will grow to block out the sun!!!

    Boo! But really. Don't accept broken XML or JSON.

      Yeah, you're probably right, Your Mother. On a deeper level, it is probably not a very good idea to ask for more lenient tools. But I still wish I had them; sometimes I just can't reject badly formatted data from the client.
Re^2: Reading a particular xml
by bitingduck (Deacon) on Aug 15, 2015 at 02:17 UTC

    I don't do much JSON, but the requirement for XML is to die on errors. No guessing, don't be polite about it, just up and die. CSV is a little fuzzier, but I tend to think it's not a bad idea to die on bad CSV. When I try to read CSV with the settings a little bit wrong, I tend to get something so horrific that I prefer that the reader just die.

    HTML is a different story: there are enormous amounts of bad HTML out there, and the convention, unfortunately, is that browsers will tolerate it. In a fair bit of screen-scraping, I haven't really run into problems with HTML parsers barfing, but I think I'm also mostly reading machine-generated HTML (though without guarantees that it's any good).
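
The contrast between the two conventions can be sketched in a few lines. This is only an illustration, assuming Python's standard library (xml.etree.ElementTree for XML, html.parser for HTML) rather than any Perl module from this thread. The sample markup and the names `strict_xml_ok`, `TagCollector` and `lenient_html_tags` are made up.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical tag soup: <b> is never closed before </p>.
BROKEN = "<p>unclosed <b>bold</p>"


def strict_xml_ok(text):
    """Return True if text parses as well-formed XML, False if the parser dies."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


class TagCollector(HTMLParser):
    """Record every start tag the tolerant HTML parser reports."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def lenient_html_tags(text):
    """Feed text to the HTML parser and return the start tags it saw."""
    parser = TagCollector()
    parser.feed(text)
    parser.close()
    return parser.tags


if __name__ == "__main__":
    print(strict_xml_ok(BROKEN))      # the XML parser rejects the mismatched tags
    print(lenient_html_tags(BROKEN))  # the HTML parser carries on regardless
```

The XML side gives a plain yes or no, while the HTML side hands back whatever it could make of the input and leaves the caller to decide whether that is good enough.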

      the requirement for XML is to die on errors. No guessing, don't be polite about it, just up and die.

      I'm no big fan of XML, but I think this is the biggest strength of XML and related tools. Either your XML is well-formed, or your tools start to complain. Tools that generate tag soup instead of XML and name it XML are just broken and need to be fixed or must stop calling their output XML.

      Alexander

      --
      Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)