Dear Monks,

I have a large XML document with more than 100,000 records, so I don't want to read it all into memory at once. Each record contains, among other elements, a category and a title. I need to change each record's category element depending on certain aspects of its title element.

I was thinking about using XML::SAX::ByRecord for this. The part I'm having a problem with is accessing the contents of the title element (which comes after the category in the XML) while processing the category element. How is this done within this paradigm?
My guess is that once I see the category element, I have to collect data until I come across the title element, modify the category, and then somehow "release" everything I have collected. I'm not sure how this is done in an XML::SAX filter.
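
To make the buffer-and-release idea concrete, here is a minimal sketch of it. It is written with Python's standard xml.sax module rather than XML::SAX::ByRecord, only because the SAX event model is the same and the sketch can be run without extra modules; it is not the Perl API. The element name `record`, the class `CategoryFromTitle`, and the rule in `choose_category` are assumptions made up for illustration, and the category is assumed to contain only text.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLFilterBase, XMLGenerator


def choose_category(title):
    # Hypothetical rule, for illustration only.
    return "perl" if "perl" in title.lower() else "other"


class CategoryFromTitle(XMLFilterBase):
    """Buffer one <record> at a time, then replay it with a new <category>."""

    def __init__(self, parent=None):
        super().__init__(parent)
        self._events = None  # buffered events, or None when outside a record
        self._path = []      # open element names inside the current record
        self._title = []     # text chunks seen inside <title>

    def startElement(self, name, attrs):
        if name == "record":  # assumed name of the record element
            self._events, self._path, self._title = [], [], []
        if self._events is None:
            super().startElement(name, attrs)  # outside a record: pass through
            return
        self._path.append(name)
        self._events.append(("start", name, attrs.copy()))

    def characters(self, content):
        if self._events is None:
            super().characters(content)
            return
        if self._path[-1] == "title":
            self._title.append(content)
        self._events.append(("text", content, None))

    def endElement(self, name):
        if self._events is None:
            super().endElement(name)
            return
        self._events.append(("end", name, None))
        self._path.pop()
        if name == "record":
            self._release()

    def _release(self):
        # The whole record is buffered, so the title is known by now.
        new_category = choose_category("".join(self._title))
        in_category = False
        for kind, value, attrs in self._events:
            if kind == "start":
                super().startElement(value, attrs)
                if value == "category":
                    in_category = True
                    super().characters(new_category)
            elif kind == "end":
                if value == "category":
                    in_category = False
                super().endElement(value)
            elif not in_category:  # drop the old category text
                super().characters(value)
        self._events = None


def rewrite(xml_text):
    out = io.StringIO()
    flt = CategoryFromTitle(xml.sax.make_parser())
    flt.setContentHandler(XMLGenerator(out, encoding="utf-8"))
    flt.parse(io.BytesIO(xml_text.encode("utf-8")))
    return out.getvalue()
```

Only one record is held in memory at any time, which is the point of working record by record. In an XML::SAX filter the same shape would presumably apply: store the events instead of forwarding them, and forward the stored events once the end of the record is reached.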

Best,
-Sven


In reply to Modifying Records with XML::SAX::ByRecord by Lorphos
