There is a fundamental problem with what you are trying to do. XML doesn't fit cleanly into native Perl data structures. A hash is a set of unordered key-value pairs, but XML is ordered. An array is an ordered list of scalars. And an XML element _may_ have both child nodes and attributes (which are key-value pairs).
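To make the mismatch concrete, here is a minimal sketch in plain Perl. The `Mail` and `Id` tag names and the values are made up for illustration; the list of pairs stands in for the children of a single element in document order.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical children of one element, as (tag, text) pairs in document order.
# Note the repeated Mail tag.
my @children = (
    [ Mail => 'a@example.com' ],
    [ Mail => 'b@example.com' ],
    [ Id   => '6' ],
);

# Naive conversion: one hash key per tag name.
my %flat;
$flat{ $_->[0] } = $_->[1] for @children;

# The second Mail overwrote the first, and sibling order is no longer recorded.
printf "%d children in the XML, %d keys in the hash\n",
    scalar @children, scalar keys %flat;
print "Mail => $flat{Mail}\n";
```

Three children go in and two keys come out: the first `Mail` is silently lost. Forcing every value into an array avoids that particular loss, but the ordering between different tags is still gone.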

It is simply not possible to convert XML into plain hashes and arrays without losing data in the process, which is why XML::Twig doesn't do it already. Instead it uses XML::Twig::Elt elements, which are objects that carry both the data and the accessor methods needed to handle this situation correctly.

You can pass this object around to other bits of code too, and do the validation correctly. Or you can try to do some sort of partial translation, but that is inherently a BAD IDEA.

You can reduce XML to a simpler format, but there's no 'magic bullet' approach, any more than how you can convert a banana to a combined image/smell/taste document on your computer automatically (or without an awful lot of effort). At best you can represent the various elements, or extract the things you are interested in.

But XML::Twig _already_ has a mechanism to do this: you set up a twig handler, and when it fires you manipulate the element you're interested in. (Or you can walk the twig tree by hand.)
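As a rough sketch of the handler approach, the following validates each `Account` element as it is parsed. The sample document, the `id` attribute, and the two validation rules are assumptions made up for illustration; only the `ExternalMail` and `SynchroCount` element names come from your data.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample document, not the original poster's file.
my $xml = <<'XML';
<Accounts>
  <Account id="1">
    <ExternalMail>fdsa@zxczxc.com</ExternalMail>
    <SynchroCount>6</SynchroCount>
  </Account>
  <Account id="2">
    <ExternalMail>not-an-address</ExternalMail>
    <SynchroCount>six</SynchroCount>
  </Account>
</Accounts>
XML

my @errors;

# Called once for each completely parsed <Account> element.
sub validate_account {
    my ( $twig, $account ) = @_;
    my $id    = $account->att('id') // '(no id)';
    my $mail  = $account->first_child_text('ExternalMail');
    my $count = $account->first_child_text('SynchroCount');

    # Deliberately simplistic checks, purely for illustration.
    push @errors, "Account $id has an invalid ExternalMail ($mail)"
        unless $mail =~ /\A[^\@\s]+\@[^\@\s]+\z/;
    push @errors, "Account $id has a non-numeric SynchroCount ($count)"
        unless $count =~ /\A\d+\z/;

    $twig->purge;    # free the elements that have already been checked
}

XML::Twig->new( twig_handlers => { Account => \&validate_account } )
    ->parse($xml);

print "$_\n" for @errors;
```

The validation reads straight from the element via its accessors, so nothing has to be squeezed into a hash first, and `purge` keeps memory use flat on large files.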

That's why working with the elements directly is considered a better approach than trying to 'down convert' the XML. But if you really must, XML::Twig has a simplify method that will do what you are asking for (try it, and you'll see why it's nasty):

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use Data::Dumper;
    use XML::Twig;

    # Path to the XML document, taken from the command line (an assumption;
    # set $file however suits your script).
    my $file = shift @ARGV or die "Usage: $0 file.xml\n";

    sub parsing {
        my ( $twig, $accounts ) = @_;
        my $horrible_hacky_hashref = $accounts->simplify( forcearray => 1, keyattr => [] );
        print Dumper \$horrible_hacky_hashref;
    }

    # 'twig_roots' mode builds only the required sub-trees from the document
    # while ignoring everything outside that twig.
    my $twig = XML::Twig->new( twig_roots => { 'Account' => \&parsing } );
    $twig->parsefile($file);

This will generate a structure like:

    $VAR1 = \{
        'PaymentMode'  => [ 'Undefined' ],
        'ExternalMail' => [ 'fdsa@zxczxc.com' ],
        'SynchroCount' => [ '6' ],
        'Provision'    => [ '0' ],
        ### .... etc ....

Alternatively, you can walk the structure manually, using recursion. But either way, you're far better off _not_ trying to convert the XML, and just applying the validation criteria directly.
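For the manual route, a recursive walk might look something like this. It is only a sketch: the sample document and the `Limits` wrapper element are invented, and the `walk` sub is a hypothetical helper, not something XML::Twig provides.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample document; the nesting is an assumption.
my $xml = <<'XML';
<Account id="1">
  <ExternalMail>fdsa@zxczxc.com</ExternalMail>
  <Limits>
    <Provision>0</Provision>
    <SynchroCount>6</SynchroCount>
  </Limits>
</Account>
XML

my @seen;

# Visit $elt and all of its descendants in document order.
sub walk {
    my ( $elt, $path ) = @_;
    $path .= '/' . $elt->tag;

    my @kids = $elt->children('#ELT');    # real elements only, no text nodes
    if (@kids) {
        walk( $_, $path ) for @kids;
    }
    else {
        push @seen, [ $path, $elt->text ];    # leaf: record its path and text
    }
}

my $twig = XML::Twig->new;
$twig->parse($xml);
walk( $twig->root, '' );

printf "%-32s %s\n", @$_ for @seen;
```

Each leaf ends up as a path plus its text, in document order, which is usually enough to hang validation rules on without pretending the whole document is a hash.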


In reply to Re: Need an hash of the Parsed Document ! by Preceptor
in thread Need an hash of the Parsed Document ! by Mr.Mick
