bangor has asked for the wisdom of the Perl Monks concerning the following question:

I have a set of XML documents and I need to find out their structure before processing - they don't all have the same structure and there is no DTD. An example:
<root>
  <name>Ash</name>
  <latin_name>Fraxinus Excelsior</latin_name>
  <uses>
    <use>
      <description>Furniture making</description>
    </use>
    <use>
      <description>Firewood</description>
    </use>
  </uses>
</root>
What I would like to achieve is a simple text representation something like:
root
  name
  latin_name
  uses
    use
      description
Can anyone point me to a module that could help me with this? Thanks.

Replies are listed 'Best First'.
Re: Determine the structure of an XML document
by kcott (Archbishop) on May 13, 2014 at 03:04 UTC

    G'day bangor,

    While I'm not a huge fan of XML::Simple for general XML work, it may suit your needs with minimal coding such as this:

    use Data::Dump;
    use XML::Simple;
    dd XMLin('document.xml');

    Here's an example of the output using the sample XML you provided.

    {
      latin_name => "Fraxinus Excelsior",
      name => "Ash",
      uses => {
        use => [
          { description => "Furniture making" },
          { description => "Firewood" },
        ],
      },
    }

    That seems fairly close to the "simple text representation" you're after. See the XML::Simple documentation for how to tweak that output.

    Just for completeness, I generated that output using a here-doc with the XML you posted. The code's in the spoiler for those interested.

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use Data::Dump;
    use XML::Simple;
    dd XMLin(<<EOX);
    <root>
      <name>Ash</name>
      <latin_name>Fraxinus Excelsior</latin_name>
      <uses>
        <use>
          <description>Furniture making</description>
        </use>
        <use>
          <description>Firewood</description>
        </use>
      </uses>
    </root>
    EOX

    -- Ken
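
    As one illustration of the kind of tweak the XML::Simple documentation describes (a sketch, not something Ken posted): by default XMLin discards the root element, and the KeepRoot option retains it, which brings the dump a little closer to the outline in the question. The variable names $xml and $data are placeholders chosen for this example.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dump;
use XML::Simple;

# Sample document from the question, held in a string for this sketch.
my $xml = <<'EOX';
<root>
  <name>Ash</name>
  <latin_name>Fraxinus Excelsior</latin_name>
  <uses>
    <use>
      <description>Furniture making</description>
    </use>
    <use>
      <description>Firewood</description>
    </use>
  </uses>
</root>
EOX

# KeepRoot => 1 keeps the <root> element as the single top-level hash key
# instead of discarding it, which is what XMLin does by default.
my $data = XMLin($xml, KeepRoot => 1);

dd $data;
```

    Because the result is an ordinary hash, the original order of sibling elements is not preserved; if document order matters, a DOM-based parser is probably a better fit.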

      Thanks Ken, that output structure will definitely help me with the processing stage.
Re: Determine the structure of an XML document
by Anonymous Monk on May 13, 2014 at 02:46 UTC
      Thanks Anonymous Monk, XML::LibXML::PrettyPrint is exactly what I was looking for.
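
For the bare outline asked for in the question (element names only, indented by depth, each distinct path listed once), a minimal sketch with XML::LibXML might look like the following. It is not taken from any reply in this thread: the sub names element_outline and collect_paths and the two-space indent are assumptions made for illustration, and the sketch ignores attributes and text content.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use XML::LibXML;

# Recursively record each distinct element path the first time it is seen.
# $seen is a hash ref of paths already listed; $lines collects the output.
sub collect_paths {
    my ($node, $depth, $path, $seen, $lines) = @_;
    $path .= '/' . $node->nodeName;
    push @$lines, ('  ' x $depth) . $node->nodeName
        unless $seen->{$path}++;
    # The XPath '*' selects only the child elements of the current node.
    collect_paths($_, $depth + 1, $path, $seen, $lines)
        for $node->findnodes('*');
    return;
}

# Return one indented line per distinct element path in an XML string.
sub element_outline {
    my ($xml) = @_;
    my $doc = XML::LibXML->load_xml(string => $xml);
    my (%seen, @lines);
    collect_paths($doc->documentElement, 0, '', \%seen, \@lines);
    return @lines;
}

my $sample = <<'EOX';
<root>
  <name>Ash</name>
  <latin_name>Fraxinus Excelsior</latin_name>
  <uses>
    <use><description>Furniture making</description></use>
    <use><description>Firewood</description></use>
  </uses>
</root>
EOX

print "$_\n" for element_outline($sample);
```

One known limitation of this approach: if a later repeat of an element introduces a child that earlier repeats lacked, that child is appended at the point where it is first encountered rather than directly under the first listing of its parent.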