I'd agree that XML::Twig is purpose-built for reading XML in, "twigging" the data, and writing it back out again.

There is a monk here somewhere whose signature is something like, "Don't write your own XML parser." But of all the scores of languages I have seen, I have found XML to be the easiest to write a parser for. As far as your needs are concerned, the Backus-Naur Form (BNF) is the shortest I can imagine for a language. Something like this would about cover it:

Document :== Heading [Tag ...]
Heading :== "<" [!">" ...] ">"
Tag :== "<" { TagName [Assignment ...] } ">" {Value|Substructure} "</" +TagName ">"
Value :== [!"</" ...]
Substructure :== [Tag ...]
Assignment :== name "=" QuotedString

With a grammar that small, the parser (which does nothing other than express the BNF in code form) should be as trivial as it gets.
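
To give a feel for how directly that grammar maps onto code, here is a minimal sketch of a recursive-descent parser, one sub per rule. It is only an illustration under assumptions: the sub names (parse_document, parse_tag) and the ATTR key are made up for this example, the lexing is folded into \G regex matches for brevity, and it makes no attempt at comments, CDATA, entities, self-closing tags or mixed content.

```perl
use strict;
use warnings;

# Tag :== "<" TagName [Assignment ...] ">" {Value|Substructure} "</" TagName ">"
# Returns { tagname => { ... } }, or nothing if no opening tag starts here.
sub parse_tag {
    my ($src) = @_;                       # reference to the text being parsed
    $$src =~ /\G\s*<([A-Za-z_][\w.-]*)/gc or return;
    my $name = $1;
    my %body;
    # Assignment :== name "=" QuotedString
    while ($$src =~ /\G\s+([A-Za-z_][\w.-]*)\s*=\s*"([^"]*)"/gc) {
        $body{ATTR}{$1} = $2;             # ATTR is an assumed key name
    }
    $$src =~ /\G\s*>/gc or die "unterminated <$name>\n";

    # Substructure :== [Tag ...]; with no nested tags, fall back to Value
    my @kids;
    while (my $kid = parse_tag($src)) { push @kids, $kid }
    if (@kids) {
        $body{SUBTAGS} = \@kids;
    }
    else {
        $$src =~ /\G([^<]*)/gc;           # Value: everything up to the next "<"
        $body{VALUE} = $1;
    }
    # The closing tag must repeat the opening tag name
    $$src =~ /\G\s*<\/\Q$name\E\s*>/gc or die "missing </$name>\n";
    return { $name => \%body };
}

# Document :== Heading [Tag ...]
sub parse_document {
    my ($text) = @_;
    my $src = \$text;
    pos($$src) = 0;
    $$src =~ /\G\s*<[^>]*>/gc or die "missing heading\n";   # Heading
    my @tags;
    while (my $tag = parse_tag($src)) { push @tags, $tag }
    $$src =~ /\G\s*\z/gc
        or die "unparsed input at offset " . pos($$src) . "\n";
    return \@tags;
}

my $doc = parse_document(
    '<?xml version="1.0"?><order id="7"><item>pen</item><qty>2</qty></order>'
);
print $doc->[0]{order}{SUBTAGS}[0]{item}{VALUE}, "\n";   # prints "pen"
```

The /gc flags keep pos() where it was when a match fails, which is what lets each rule simply "try" the next alternative without any backtracking bookkeeping.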

Of course, you also need a lexical analyser (about half a page in Perl) and a "thrower" to walk past whitespace and carriage returns, which can simply poll the lexer rather than be written from scratch. You would also need to choose a structure that differentiates between simple value tags and tags that contain a substructure (e.g. tagname => { VALUE => scalar } versus tagname => { SUBTAGS => arrayReference }).
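
As a rough sketch of the lexer-plus-thrower idea, something like the following would do. The token names (LT, LT_SLASH, GT, EQ, STRING, SPACE, TEXT) and the sub names make_lexer and next_significant are my own inventions for illustration, not anything standard.

```perl
use strict;
use warnings;

# Returns a closure that yields one token per call as [TYPE, text],
# and nothing once the input is exhausted.
sub make_lexer {
    my ($text) = @_;
    pos($text) = 0;
    return sub {
        return if pos($text) >= length $text;
        return [ LT_SLASH => '</' ] if $text =~ /\G<\//gc;
        return [ LT       => '<'  ] if $text =~ /\G</gc;
        return [ GT       => '>'  ] if $text =~ /\G>/gc;
        return [ EQ       => '='  ] if $text =~ /\G=/gc;
        return [ STRING   => $1   ] if $text =~ /\G"([^"]*)"/gc;
        return [ SPACE    => $1   ] if $text =~ /\G(\s+)/gc;
        return [ TEXT     => $1   ] if $text =~ /\G([^<>="\s]+)/gc;
        die "unexpected character at offset " . pos($text) . "\n";
    };
}

# The "thrower": polls the lexer and discards whitespace tokens
# (spaces, tabs, carriage returns, newlines).
sub next_significant {
    my ($lexer) = @_;
    while (my $tok = $lexer->()) {
        return $tok if $tok->[0] ne 'SPACE';
    }
    return;
}

my $lexer = make_lexer(qq{<item id="7">\r\n  pen </item>});
my @types;
while (my $tok = next_significant($lexer)) {
    push @types, $tok->[0];
}
print "@types\n";
```

The parser would call next_significant wherever whitespace is irrelevant (between attributes, around tags) and the raw lexer wherever it matters (inside a value).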

The code generation is a mirror of the parser, reading through your data structure and generating the appropriate XML, and is thus equally trivial. You need to track the recursion depth of the puttag routine and prepend " " x ($tabsize * ($depth - 1)) to indent, putting each tag on its own line.
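
A minimal sketch of such a puttag routine might look like this, assuming the VALUE/SUBTAGS layout suggested above with one tag name per hash. It deliberately ignores attributes and does no entity escaping of values, both of which a real generator would need.

```perl
use strict;
use warnings;

my $tabsize = 2;    # assumed indent width

# Returns the XML for one node, one tag per line, indented by recursion
# depth. $depth starts at 1, so top-level tags get no indentation.
sub puttag {
    my ($node, $depth) = @_;
    $depth = 1 unless defined $depth;
    my ($name) = keys %$node;                  # one tag name per node
    my $body   = $node->{$name};
    my $indent = ' ' x ($tabsize * ($depth - 1));

    if (exists $body->{SUBTAGS}) {
        my $out = "$indent<$name>\n";
        $out .= puttag($_, $depth + 1) for @{ $body->{SUBTAGS} };
        return $out . "$indent</$name>\n";
    }
    return "$indent<$name>$body->{VALUE}</$name>\n";
}

my $order = {
    order => {
        SUBTAGS => [
            { item => { VALUE => 'pen' } },
            { qty  => { VALUE => 2 } },
        ],
    },
};
print puttag($order);
```

Keeping SUBTAGS as an array reference rather than a hash is what preserves tag order and allows repeated tag names at the same level.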

But the case for writing your own parser really depends on whether you have a continuing need to meet new requirements that you cannot predict in advance, and so cannot nail your colours to any particular module that might already be available. In my case there are multiple streams of XML to and from different organisations that have to be addressed by a single system.

-M

Free your mind


In reply to Re: XML gurus unite!! by Moron
in thread XML gurus unite!! by jmmistrot
