Hi

I'm trying to parse XML from various XML services of the monastery and got bitten by entities in attributes:

<info site="http://perlmonks.org/" sitename="PerlMonks" ticker_id="507310" gentimeGMT="2015-02-14 17:40:04" xmlstyle="clean" xmlmaker="XML::Fling 1.001" fetch_num="109" delta_since="2015-02-14 17:29:58&amp;nbsp;GMT" min_poll_seconds="600">Rendered by the Noderep XML Ticker</info>

While entities in attributes seem to be allowed, this format, delta_since="2015-02-14 17:29:58&amp;nbsp;GMT", looks broken¹.

(Compare also gentimeGMT="2015-02-14 17:40:04", which uses a plain space.)

Could someone enlighten me whether, and why, that's necessary?

Otherwise, wouldn't it be better to change it to delta_since="2015-02-14 17:29:58 GMT"?

Or are there date-time parsers requiring these entities?

Cheers Rolf

PS: Je suis Charlie!

¹) (or am I missing something in the XML specification?)
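Until the ticker is changed, a client can work around the stray entity after parsing. The following is only a minimal sketch: parse_ticker_time is a hypothetical helper name, and it assumes the XML parser has already decoded &amp; to &, so the attribute value arrives containing a literal "&nbsp;". It uses the core Time::Piece module and treats the value as GMT.

```perl
use strict;
use warnings;
use Time::Piece;

# Hypothetical helper (not part of any PerlMonks code): $value is the
# delta_since attribute as returned by an XML parser, i.e. with &amp;
# already decoded, which leaves a literal "&nbsp;" in the string.
sub parse_ticker_time {
    my ($value) = @_;
    $value =~ s/&nbsp;|\x{A0}/ /g;   # leftover entity or a real no-break space
    $value =~ s/\s+GMT\z//;          # drop the zone suffix; the value is GMT
    return Time::Piece->strptime($value, '%Y-%m-%d %H:%M:%S');
}

my $t = parse_ticker_time('2015-02-14 17:29:58&nbsp;GMT');
print $t->ymd, ' ', $t->hms, "\n";
```

The same helper also accepts the plain-space form, so it should keep working if the ticker output is fixed later.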

Re: Date format in attributes of local XML API
by N-Wing (Deacon) on Feb 15, 2015 at 06:41 UTC

    Your example is from the noderep xml ticker, which appears to get the double-escaped space (&amp;nbsp;) by calling parseTimeInString (htmlcode). I'm guessing that the entity was inserted accidentally (see patch diff), so it is probably safe to remove the two &nbsp; completely from that htmlcode (lines 46 and 49).
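To illustrate how the double escape comes about, here is a small sketch. It is not the ticker's actual code: xml_escape_attr is a hypothetical stand-in for whatever escaping the XML generator applies to attribute values. Because the time string already contains the HTML entity, escaping its ampersand produces &amp;nbsp;.

```perl
use strict;
use warnings;

# Hypothetical attribute escaper, standing in for the XML generator.
sub xml_escape_attr {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # must come first so later entities stay intact
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# HTML-flavoured time string, as parseTimeInString appears to return it.
my $html_time = '2015-02-14 17:29:58&nbsp;GMT';

print xml_escape_attr($html_time), "\n";   # 2015-02-14 17:29:58&amp;nbsp;GMT
```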

    P.S.: In case there was a legitimate reason for that patch, I didn't create a patch of my own, since there isn't enough room in the patch reason to explain this.

    --== [N] ==--

      parseTimeInString is returning HTML. The &nbsp; is useful in some cases. The XML tickers should s/&nbsp;/ /g or parseTimeInString should get another argument specifying that HTML isn't wanted.
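The first option could look roughly like this. It is only a sketch: strip_nbsp is a hypothetical helper, and the real tickers would apply the substitution to whatever parseTimeInString returns before the value is written into the attribute.

```perl
use strict;
use warnings;

# Hypothetical helper: turn the HTML string from parseTimeInString into
# plain text that is safe to place in an XML attribute.
sub strip_nbsp {
    my ($html_time) = @_;
    (my $plain = $html_time) =~ s/&nbsp;/ /g;   # copy, then substitute
    return $plain;
}

print strip_nbsp('2015-02-14 17:29:58&nbsp;GMT'), "\n";   # 2015-02-14 17:29:58 GMT
```

The second option (an extra argument to parseTimeInString) would keep HTML callers unchanged while letting the XML tickers ask for plain text directly.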

      - tye        

        The &nbsp; is useful in some cases.

        It is used inconsistently, though. Both times &nbsp; is used ('%b %d, %Y at %H:%M&nbsp;%Z' and /%Y-%m-%d %H:%M:%S&nbsp;%Z/), the other spaces are just normal spaces. I would say just replace the &nbsp; with normal spaces, but I don't know if the non-breaking space was intentionally included to keep the time zone information with the time.

        --== [N] ==--