Something like this could certainly work, although it would be more complex, as the real document is more complex and I have about 30 wrapping rules, so I would not be able to wait for the end of the parsing to output the officers and persons. But see how long your solution is? How much work it is for each rule? And you have to write another piece of code for each different rule, or at least each different type of pattern. And my real transformation table has rules such as:

  stdtitle => 'stddes*, stddesmo?, reaf?, stdcoll?, titlemod?, revision?, title+'

With a solution like yours I would have to simulate (badly) the regexp engine, while with the code as it stands I just have to add one line to the %wrap table (and an item in the @wraparray) and... voilà! I get a good chunk of regexps for free.
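To make the "regexps for free" point concrete, here is a minimal sketch of the idea, not my actual code: each rule is turned into a regexp that is matched against the list of child element names, written out as a string of markers. The helper names (model_to_regexp, wrap_children, render) and the nested-array output are made up for this illustration; only %wrap and the stdtitle rule come from the real table.

```perl
use strict;
use warnings;

# Wrapping table: new element => content model of the siblings it wraps.
my %wrap = (
    stdtitle => 'stddes*, stddesmo?, reaf?, stdcoll?, titlemod?, revision?, title+',
);

# Hypothetical helper: turn 'a*, b?, c+' into a regexp over "<a><b><c>" markers.
sub model_to_regexp {
    my ($model) = @_;
    my $re = '';
    for my $item (split /\s*,\s*/, $model) {
        my ($name, $quant) = $item =~ /^(\w+)([*+?]?)$/
            or die "cannot parse content model item '$item'";
        $re .= "(?:<\Q$name\E>)$quant";
    }
    return qr/$re/;
}

# Hypothetical helper: wrap the first run of children matching the model.
# The wrapper is returned as an array ref [ name, children... ].
sub wrap_children {
    my ($wrapper, $model, @children) = @_;
    my $re   = model_to_regexp($model);
    my $tags = join '', map { "<$_>" } @children;

    # No match, or an empty match: nothing to wrap.
    return @children unless $tags =~ $re && $+[0] > $-[0];

    # Save the match offsets, then count markers to get child indices.
    my ($start, $end) = ($-[0], $+[0]);
    my $before  = substr $tags, 0, $start;
    my $matched = substr $tags, $start, $end - $start;
    my $first   = ($before  =~ tr/<//);   # children before the match
    my $count   = ($matched =~ tr/<//);   # children inside the match

    my @wrapped = splice @children, $first, $count;
    splice @children, $first, 0, [ $wrapper, @wrapped ];
    return @children;
}

# Hypothetical helper: show the result as a string, for inspection only.
sub render {
    my $out = '';
    for my $node (@_) {
        if (ref $node) {
            my ($name, @kids) = @$node;
            $out .= "<$name>" . render(@kids) . "</$name>";
        }
        else {
            $out .= "<$node/>";
        }
    }
    return $out;
}

my @children = qw(para stddes stddes reaf title title para);
print render(wrap_children(stdtitle => $wrap{stdtitle}, @children)), "\n";
```

Adding a new rule is then just one more entry in %wrap; the regexp engine handles the optional and repeated elements. A real version would also have to carry the element content along, not just the names, and apply the rules in a fixed order.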

So your solution qualifies as "pure XML", but fails to be a generic one, while mine is not XML (and thus dangerous), but generic, and I am still searching for my Holy Grail of a generic XML solution (which should have been the title of my first post, now that I think about it)...

Your code uses XML::Parser in a very clean way though, writing your own style and storing parser-related data (the text, people and officers fields) with the parser. Neat!


In reply to Re: Re: Ugly XML processing looking for a pure XML solution by mirod
in thread Ugly XML processing looking for a pure XML solution by mirod
