Hi monks, I'm using XML::Twig to parse an XML document containing dates and events that is not as nicely structured as I'd like. Here's a simplified example:

<div id="calendar">
  <h3 class="current-day">Wednesday, February 1</h3>
  <div class="event">Event 1</div>
  <div class="event">Event 2</div>
  <div class="event">Event 3</div>
  <h3 class="current-day">Thursday, February 2</h3>
  <div class="event">Event 1</div>
  <div class="event">Event 2</div>
  <h3 class="current-day">Friday, February 3</h3>
  <div class="event">Event 1</div>
  <div class="event">Event 2</div>
</div>

The problem is the div-events are not contained within the h3 elements, so I can't figure out how to associate the events with each date. I can get all the h3 children and all the div-event children with a div event handler at the top level:

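my ($twig, $div) = @_;
# Guard against divs that have no id attribute at all
if (($div->att('id') // '') eq 'calendar') {
    my @dates  = $div->children('h3');   # every date heading
    my @events = $div->children('div');  # every event div
}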

But obviously that just gives me two unconnected lists. Is there some clever way I can associate these elements, perhaps in the order they appear? There doesn't seem to be a "next_child" function in XML::Twig.

Thanks for any advice.
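
For reference, here is a minimal sketch of the "in the order they appear" idea, not a tested solution for the real document. It assumes that calling children with no condition returns the child elements in document order, so each event div can be attached to the most recent h3 seen. The handler name calendar, the @days structure, and the inline sample data are made up for illustration.

```perl
use strict;
use warnings;
use XML::Twig;

# Sample data copied from the example above.
my $xml = <<'XML';
<div id="calendar">
  <h3 class="current-day">Wednesday, February 1</h3>
  <div class="event">Event 1</div>
  <div class="event">Event 2</div>
  <div class="event">Event 3</div>
  <h3 class="current-day">Thursday, February 2</h3>
  <div class="event">Event 1</div>
  <div class="event">Event 2</div>
  <h3 class="current-day">Friday, February 3</h3>
  <div class="event">Event 1</div>
  <div class="event">Event 2</div>
</div>
XML

# Hypothetical result structure: one hash per date, in document order.
my @days;

my $twig = XML::Twig->new(
    # Fire only for the calendar container, not for the event divs.
    twig_handlers => { 'div[@id="calendar"]' => \&calendar },
);
$twig->parse($xml);

sub calendar {
    my ($twig, $div) = @_;

    # Walk all children in document order, mixing h3 and div.
    for my $child ($div->children) {
        if ($child->tag eq 'h3') {
            # A new date starts a new group.
            push @days, { date => $child->text, events => [] };
        }
        elsif ($child->tag eq 'div' && @days) {
            # An event belongs to the most recent date seen.
            push @{ $days[-1]{events} }, $child->text;
        }
    }
}

for my $day (@days) {
    print "$day->{date}: ", join(', ', @{ $day->{events} }), "\n";
}
```

An alternative along the same lines would be to start from each h3 and step through next_sibling until the next h3 is reached; the single pass above just avoids restarting the walk for every date.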


In reply to XML::Twig parsing poorly structured content by slugger415
