in reply to Perl has plenty of XML parsers, but is there an XML printer?

What do you mean by "XML printer"? There is a slew of modules that generate XML, including XML::Twig, XML::Writer, XML::ValidWriter, XML::AutoWriter, XML::Spew, XML::Simple, ...
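
For instance, here is a minimal sketch of generating XML with XML::Writer, assuming the module is installed. The element and attribute names (report, error, stat, detail) are made up for illustration and are not from any real format.

```perl
use strict;
use warnings;
use XML::Writer;

# Collect the output in a scalar; pass a filehandle as OUTPUT
# instead to write straight to a file.
my $xml = '';
my $writer = XML::Writer->new(
    OUTPUT      => \$xml,
    DATA_MODE   => 1,    # put each element on its own line
    DATA_INDENT => 2,    # indent nested elements by two spaces
);

$writer->xmlDecl('UTF-8');
$writer->startTag('report');
$writer->startTag('errors');

# Attribute values are escaped by the writer, so "<" and "&" are safe here.
$writer->emptyTag('error', time => 42, detail => 'value < limit & retrying');

$writer->endTag('errors');

# dataElement writes the start tag, the text, and the end tag in one call.
$writer->dataElement('stat', '3.5', type => 'elapsed seconds');

$writer->endTag('report');
$writer->end();    # dies if the document is not properly closed

print $xml;
```

The writer tracks open elements, so a mismatched endTag is reported as an error instead of silently producing broken XML.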


DWIM is Perl's answer to Gödel
Re^2: Perl has plenty of XML parsers, but is there an XML printer?
by yaneurabeya (Novice) on Jul 19, 2007 at 23:21 UTC

    XML printer => a module that prints spec-compliant XML strings to file descriptors :).

    Anyhow, I'll look into XML::*Writer. I looked at XML::Twig, but I'm not sure it's what I'm aiming for (from what the author wrote, it targets partial parsing of XML files).

    My workplace's Perl distribution also lacks many of the more esoteric XML modules, and because I need to stay compatible with the Perl distributions at every site, I'm avoiding the less popular XML solutions.

    Could a list of (the more popular) modules that do both parsing and printing be provided?

    Thanks!

      XML::Twig is the "kitchen sink" XML manipulation module. It does pretty much everything you would want to do with XML, including both parsing and printing, and is widely used.
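
      As a rough sketch of doing both parsing and printing with XML::Twig (assuming the module is installed): parse a string, read some values, add an element, and print the result. The element names and values below are placeholders, not a real format.

```perl
use strict;
use warnings;
use XML::Twig;

# Placeholder input; any well-formed XML string would do.
my $input = '<sections><run><perf_stats>'
          . '<stat type="speed">10</stat>'
          . '</perf_stats></run></sections>';

# pretty_print only affects how the document is written back out.
my $twig = XML::Twig->new(pretty_print => 'indented');
$twig->parse($input);

# Parsing side: walk the tree and read attributes and text.
my $stats = $twig->root->first_descendant('perf_stats');
for my $stat ($stats->children('stat')) {
    print $stat->att('type'), " = ", $stat->text, "\n";
}

# Printing side: build a new element and attach it as the last child.
XML::Twig::Elt->new(stat => { type => 'test time' }, '42')
              ->paste(last_child => $stats);

# sprint returns the whole document as a string; $twig->print writes
# it to a filehandle instead.
my $output = $twig->sprint;
print $output;
```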

      You may like to post a very small sample of what you need to parse and what your output should look like.


        <?xml ver="blah"?>
        <sections>
          <build>
            <field name="build time"></field>
            <!-- This is a non-delimited section of text that will be parsed ... -->
            <fields type="section of text 1">
              <!-- A token-delimited section of text that will be parsed into this area... -->
            </fields>
          </build>
          <run>
            <errors>
              <error time="some_integer" some_attr="more helpful info about error gets put here" />
              <!-- ... etc -->
            </errors>
            <mismatches>
              <mismatch time="another_integer" some_attr="more helpful info about mismatch goes here" />
              <!-- ... etc -->
            </mismatches>
            <perf_stats>
              <stat type="performance item name">value</stat>
              <!-- stats about speed, test time, etc go here ... -->
            </perf_stats>
          </run>
        </sections>

        Extra notes (about flat logs):
        - The flat logs being parsed are 400 to 100k lines long.
        - The current system compares sections of text against a 'gold log' (known-good existing output), using either specialized subset field checking or a straight diff(1). Unfortunately, this is a bad idea with a large number of tests, because the number of logs is ( 1000 tests * (1-2 logs) * (1-3 test sets) ) => 1000 ~ 6000 logs. So if the file format changes (i.e. a new feature is introduced to the toolchain), so will all affected logs, and bringing the 'gold logs' up to date will consume a lot of unnecessary time. This is going to occur in the future.

        Thus, by moving to a more structured data store, I can get away from the flat-file formats and move to a content-based comparison system.
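
        To make the idea concrete, here is a hedged sketch of what a content-based comparison could look like, assuming XML::Twig is available. It reads every stat element into a hash keyed by its type attribute and compares the hashes, so whitespace and element order no longer matter. The helper names (stats_of, compare_stats) and the sample values are made up for illustration.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: map each <stat type="..."> to its text content.
sub stats_of {
    my ($xml) = @_;
    my $twig = XML::Twig->new;
    $twig->parse($xml);
    my %stats;
    $stats{ $_->att('type') } = $_->text for $twig->root->descendants('stat');
    return \%stats;
}

# Hypothetical helper: list the differences between two stat hashes.
sub compare_stats {
    my ($gold, $new) = @_;
    my %keys = map { $_ => 1 } keys %$gold, keys %$new;
    my @diffs;
    for my $key (sort keys %keys) {
        my ($g, $n) = ($gold->{$key}, $new->{$key});
        if    (!defined $n) { push @diffs, "missing: $key" }
        elsif (!defined $g) { push @diffs, "unexpected: $key" }
        elsif ($g ne $n)    { push @diffs, "changed: $key ($g -> $n)" }
    }
    return @diffs;
}

my $gold_xml = '<perf_stats><stat type="speed">10</stat>'
             . '<stat type="test time">42</stat></perf_stats>';

# Same content model, but different layout, order, and one extra stat.
my $new_xml = <<'XML';
<perf_stats>
  <stat type="test time">42</stat>
  <stat type="speed">12</stat>
  <stat type="memory">7</stat>
</perf_stats>
XML

my @diffs = compare_stats(stats_of($gold_xml), stats_of($new_xml));
print "$_\n" for @diffs;
# prints:
#   unexpected: memory
#   changed: speed (10 -> 12)
```

        Only real content changes are reported: the reordered "test time" entry and the extra whitespace produce no output.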