I am seeing strange behavior with XML::Parser (v. 2.40) on a Solaris 10 system. Over the last few days, while parsing a file, there have been a few odd cases where an attribute value is parsed twice (or perhaps split); I am not sure how best to describe it. It only occurs once in a while.

The issue occurs in the "hdl_char" routine. This is a simple sub that just assigns values to a hash, keyed by attribute name, for later processing.

Here is the output of printing each call to hdl_char as attrib=value pairs. The "..." is appended to check for blank padding. Note the first case, sessionStartDateTime, where the actual value has been split, which results in the last piece, "Z", being assigned to the hash. The second attribute, sessionEndDateTime, is parsed correctly. I have checked the source file and there are no spurious characters. Part of the source record is listed below as well.

currattr:GMTSessionStartDateTime = ...
currattr:GMTSessionStartDateTime = ...
currattr:sessionStartDateTime = 2013-09-10T17:15:00.000...
currattr:sessionStartDateTime = Z...
currattr: = ...
currattr: = ...
currattr:timeZoneOffset = -240...
currattr: = ...
currattr: = ...
currattr: = ...
currattr: = ...
currattr:GMTSessionEndDateTime = ...
currattr:GMTSessionEndDateTime = ...
currattr:sessionEndDateTime = 2013-09-10T17:30:00.000Z...
currattr: = ...
currattr: = ...
currattr:timeZoneOffset = -240...

<ns0:GMTSessionStartDateTime>
  <ns0:sessionStartDateTime>2013-09-10T17:15:00.000Z</ns0:sessionStartDateTime>
  <ns0:timeZoneOffset>-240</ns0:timeZoneOffset>
</ns0:GMTSessionStartDateTime>
<ns0:GMTSessionEndDateTime>
  <ns0:sessionEndDateTime>2013-09-10T17:30:00.000Z</ns0:sessionEndDateTime>
  <ns0:timeZoneOffset>-240</ns0:timeZoneOffset>
</ns0:GMTSessionEndDateTime>

This is an intermittent problem, but it is causing real issues for me at this point. Since this handler is called by the parser, I am assuming this is a parser issue. I am also using Expat.
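
For comparison, the XML::Parser documentation notes that a single run of character data may generate more than one call to the Char handler, which would be consistent with the split value shown above. The following is a minimal sketch, not the original code, of a handler set that appends each piece and stores the value only when the element ends. The names hdl_start, hdl_end, parse_record and %record are hypothetical, and stripping the "ns0:" prefix is an assumption made to match the key names in the output.

```perl
use strict;
use warnings;
use XML::Parser;

my %record;       # element name => complete text value (hypothetical store)
my $text = '';    # character data gathered for the element being read

# Start handler: begin collecting text for a new element.
sub hdl_start {
    my ($expat, $element) = @_;
    $text = '';
}

# Char handler: Expat may deliver one text run in several pieces,
# so append each piece instead of assigning it.
sub hdl_char {
    my ($expat, $string) = @_;
    $text .= $string;
}

# End handler: the value is complete only here, so store it now.
sub hdl_end {
    my ($expat, $element) = @_;
    (my $name = $element) =~ s/^.*://;    # assumption: drop the "ns0:" prefix
    $record{$name} = $text if $text =~ /\S/;    # skip whitespace-only text
    $text = '';
}

# Parse one XML string and return a copy of the collected values.
sub parse_record {
    my ($xml) = @_;
    %record = ();
    my $parser = XML::Parser->new(
        Handlers => {
            Start => \&hdl_start,
            Char  => \&hdl_char,
            End   => \&hdl_end,
        },
    );
    $parser->parse($xml);
    return {%record};
}
```

Calling parse_record on a string containing the record returns a hash reference keyed by element name. Repeated names such as timeZoneOffset overwrite each other in this sketch, so a real handler would need to key on the enclosing element as well.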


In reply to XML::Parser error by mwinterer
