As others have said, the complexity of this xml structure doesn't really support a direct conversion to CSV. Indeed, I'd guess there are a variety of ways in which CSV output might be constructed from this xml data.

Apart from that, it appears that a lot of the xml structure is involved with typesetting/formatting of the content: tags like "OrderedList", "ItemizedList", "ListItem", "Style", "Emphasis", "Strong", etc., indicate stuff that is intended for CSS handling, rather than database construction. Some of these tags are only vaguely "structural" (in the sense of describing the logical/semantic organization of the data), while others may be purely "cosmetic". Given that mix, I don't see any obvious way to transform this xml into a coherent csv.

So you need a more informed specification of your goal: what exactly should the csv file contain (i.e. how many fields are needed, and what are their names)? And with that in mind, how can the csv rows and columns be filled in, based on the clues available in the xml data? Some manual analysis of the data will be needed in order to write the code.
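To make the "decide the fields first" point concrete, here is a minimal sketch in Python. Everything specific in it is an assumption for illustration: the sample document, the "Doc", "Section" and "Title" elements, the "id" attribute, and the four field names are all hypothetical, since only the formatting tags are known from the thread. It treats "Strong" and "Emphasis" as cosmetic and flattens them into plain text, which is just one of several reasonable choices.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the real element names and nesting will differ.
SAMPLE = """<Doc>
  <Section id="s1">
    <Title>Setup</Title>
    <ItemizedList>
      <ListItem>Install <Strong>first</Strong></ListItem>
      <ListItem>Then <Emphasis>configure</Emphasis> it</ListItem>
    </ItemizedList>
  </Section>
</Doc>"""

# The csv specification, decided up front (assumed field names).
FIELDS = ["section_id", "title", "item_no", "text"]


def rows_from_xml(xml_text):
    """Return one dict per ListItem, keyed by the names in FIELDS."""
    root = ET.fromstring(xml_text)
    rows = []
    for section in root.iter("Section"):
        title = section.findtext("Title", default="")
        for n, item in enumerate(section.iter("ListItem"), start=1):
            # itertext() drops inline markup such as Strong/Emphasis;
            # split/join collapses any whitespace left behind.
            text = " ".join("".join(item.itertext()).split())
            rows.append({
                "section_id": section.get("id", ""),
                "title": title,
                "item_no": n,
                "text": text,
            })
    return rows


if __name__ == "__main__":
    for row in rows_from_xml(SAMPLE):
        print(row)
```

The useful part is not the parsing calls but the order of work: the field list exists before any code touches the xml, and each field has one stated rule for where its value comes from.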

Actually, it's possible (likely?) that a proper solution will involve two or more relational tables rather than just one table. And while you're at it, you'll need to make sure your csv output is well formed: quote fields that contain the delimiter, a quote character or a newline, escape double quotes within fields (the usual convention is to double them), and watch out for newlines embedded within field values (maybe normalize these to spaces).
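For the well-formedness part, it is usually safer to let a csv library do the quoting than to build lines by hand. The sketch below uses Python's standard csv module; the helper names clean and to_csv are made up for this example, and collapsing newlines to spaces is only the "maybe" option mentioned above, not a requirement. In Perl, a module such as Text::CSV covers the same ground.

```python
import csv
import io


def clean(value):
    """Collapse embedded newlines and runs of whitespace to single spaces."""
    return " ".join(str(value).split())


def to_csv(rows, fields):
    """Serialize a list of dicts to csv text with a header row."""
    buf = io.StringIO(newline="")
    writer = csv.DictWriter(
        buf,
        fieldnames=fields,
        quoting=csv.QUOTE_MINIMAL,  # quote only fields that need it
        lineterminator="\n",
    )
    writer.writeheader()
    for row in rows:
        # Missing keys become empty fields rather than raising an error.
        writer.writerow({k: clean(row.get(k, "")) for k in fields})
    return buf.getvalue()


if __name__ == "__main__":
    sample = [{"id": 1, "text": 'He said "hi",\nthen left'}]
    print(to_csv(sample, ["id", "text"]), end="")
```

With that sample row, the text field comes out wrapped in double quotes with the inner quotes doubled, and the embedded newline replaced by a space. Apostrophes pass through untouched, since they have no special meaning in this csv dialect.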


In reply to Re: XML to CSV by graff
in thread XML to CSV by Blue_eyed_son
