Mr.Mick has asked for the wisdom of the Perl Monks concerning the following question:

I am failing terribly to return a hash of the parsed XML document using XML::Twig, so that I can use it in OTHER subs that perform several validation checks. The goal is abstraction: reusable blocks of code.

XML Block

<?xml version="1.0" encoding="utf-8"?>
<Accounts locale="en_US">
  <Account>
    <Id>abcd</Id>
    <OwnerLastName>asd</OwnerLastName>
    <OwnerFirstName>zxc</OwnerFirstName>
    <Locked>false</Locked>
    <Database>mail</Database>
    <Customer>mail</Customer>
    <CreationDate year="2011" month="8" month-name="fevrier" day-of-month="19" hour-of-day="15" minute="23" day-name="dimanche"/>
    <LastLoginDate year="2015" month="04" month-name="avril" day-of-month="22" hour-of-day="11" minute="13" day-name="macredi"/>
    <LoginsCount>10405</LoginsCount>
    <Locale>nl</Locale>
    <Country>NL</Country>
    <SubscriptionType>free</SubscriptionType>
    <ActiveSubscriptionType>free</ActiveSubscriptionType>
    <SubscriptionExpiration year="1980" month="1" month-name="janvier" day-of-month="1" hour-of-day="0" minute="0" day-name="jeudi"/>
    <SubscriptionMonthlyFee>0</SubscriptionMonthlyFee>
    <PaymentMode>Undefined</PaymentMode>
    <Provision>0</Provision>
    <InternalMail>asdf@asdf.com</InternalMail>
    <ExternalMail>fdsa@zxczxc.com</ExternalMail>
    <GroupMemberships>
      <Group>werkgroep X.Y.Z.</Group>
    </GroupMemberships>
    <SynchroCount>6</SynchroCount>
    <LastSynchroDate year="2003" month="12" month-name="decembre" day-of-month="5" hour-of-day="12" minute="48" day-name="mardi"/>
    <HasActiveSync>false</HasActiveSync>
    <Company/>
  </Account>
  <Account>
    <Id>mnbv</Id>
    <OwnerLastName>cvbb</OwnerLastName>
    <OwnerFirstName>bvcc</OwnerFirstName>
    <Locked>true</Locked>
    <Database>mail</Database>
    <Customer>mail</Customer>
    <CreationDate year="2012" month="10" month-name="octobre" day-of-month="10" hour-of-day="10" minute="18" day-name="jeudi"/>
    <LastLoginDate/>
    <LoginsCount>0</LoginsCount>
    <Locale>fr</Locale>
    <Country>BE</Country>
    <SubscriptionType>free</SubscriptionType>
    <ActiveSubscriptionType>free</ActiveSubscriptionType>
    <SubscriptionExpiration year="1970" month="1" month-name="janvier" day-of-month="1" hour-of-day="1" minute="0" day-name="jeudi"/>
    <SubscriptionMonthlyFee>0</SubscriptionMonthlyFee>
    <PaymentMode>Undefined</PaymentMode>
    <Provision>0</Provision>
    <InternalMail/>
    <ExternalMail>qweqwe@qwe.com</ExternalMail>
    <GroupMemberships/>
    <SynchroCount>0</SynchroCount>
    <LastSynchroDate year="1970" month="1" month-name="janvier" day-of-month="1" hour-of-day="1" minute="0" day-name="jeudi"/>
    <HasActiveSync>false</HasActiveSync>
    <Company/>
  </Account>
</Accounts>

Perl Block

use XML::Twig;

my $file = shift || (print "NOTE: \tYou didn't provide the name of the file to be checked.\n" and exit);

# 'twig_roots' mode builds only the required sub-trees from the document
# while ignoring everything outside that twig.
my $twig = XML::Twig->new( twig_roots => { 'Account' => \&parsing } );
$twig->parsefile($file);

sub parsing {
    my ( $twig, $accounts ) = @_;
    my %hash = @_;
    my $ref  = \%hash;    # because I was getting an "Odd number of hash elements" error
    return $ref;
    $twig->purge;
}

It gives a hash reference, which I'm unable to dereference properly (even after thousands of attempts). Again, I just need a single clean sub that does the parsing and returns a hash of all the elements ('Accounts' in this case), to be used in another sub (valid_sub) that performs the validation checks. I'm literally stuck at this point and will HIGHLY appreciate your help.
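Since the sticking point is dereferencing, here is a minimal core-Perl sketch, with no XML involved, of the shape being asked for: one sub returns a reference to a hash of hashes keyed by account Id, and another sub dereferences it. `parse_accounts`, `valid_sub` and the hard-coded data are hypothetical stand-ins for the real parsing code.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical stand-in for the parsing sub: builds a hash of hashes
# keyed by account Id and returns a *reference* to it.
sub parse_accounts {
    my %accounts = (
        abcd => { OwnerLastName => 'asd',  Locked => 'false' },
        mnbv => { OwnerLastName => 'cvbb', Locked => 'true'  },
    );
    return \%accounts;
}

# Hypothetical validation sub: receives the reference and dereferences it.
sub valid_sub {
    my ($accounts) = @_;                       # a hash reference
    my @locked;
    for my $id ( sort keys %$accounts ) {      # %$accounts is the whole hash
        my $acct = $accounts->{$id};           # one inner hash reference
        push @locked, $id if $acct->{Locked} eq 'true';
    }
    return @locked;
}

my $ref    = parse_accounts();
my @locked = valid_sub($ref);
print "Locked accounts: @locked\n";
print "Owner of abcd: $ref->{abcd}{OwnerLastName}\n";
```

One thing worth knowing: whatever a twig handler such as `parsing` returns is consumed by XML::Twig, not handed back by `parsefile` (which returns the twig itself), so the collected data has to live somewhere the caller can reach, for example a hash declared outside the handler.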

Also POSTED at StackOverflow, for the sole reason of getting multiple answers that lead to the desired result.

Re: Need an hash of the Parsed Document !
by choroba (Cardinal) on Dec 17, 2015 at 16:36 UTC
    Crossposted at StackOverflow. It's considered polite to mention crossposting, so that people who don't visit both sites don't waste their time hacking on a problem already solved at the other end of the internet.

      I've just edited and mentioned that in my post as well.

Re: Need an hash of the Parsed Document !
by poj (Abbot) on Dec 17, 2015 at 17:05 UTC

    Not sure if this is what you want

    #!perl
    use strict;
    use XML::Twig;

    my $twig = XML::Twig->new();
    my $hr   = $twig->parse(\*DATA)->simplify;
    for ( @{ $hr->{'Account'} } ) {
        print $_->{'Id'} . "\n";
        print $_->{'OwnerLastName'} . "\n";
        print $_->{'OwnerFirstName'} . "\n\n";
    }
    __DATA__
    <?xml version="1.0" encoding="utf-8"?>
    . . .
    poj

      Thank you for your attempt to help, poj. But I don't want to print them. Instead I want to store and return them as a hash that another sub can use for the validation.

        Your hash is %$hr; see the simplify method in XML::Twig:

        simplify: Return a data structure suspiciously similar to XML::Simple's. Options are identical to XMLin options; see the XML::Simple docs for more details (or use Data::Dumper or YAML to dump the data structure).
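        Building on that, a minimal sketch of a parsing sub that returns the simplified data as a hash reference keyed by account Id, plus a separate validation sub that consumes it. It assumes XML::Twig is installed from CPAN; `parse_accounts`, `valid_sub`, the "no InternalMail" rule and the trimmed sample XML are all hypothetical. `keyattr => []` switches off XML::Simple-style folding on `name`/`key`/`id`, and because a single `<Account>` would not be wrapped in an array, the list is normalised before use.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use XML::Twig;    # CPAN module, assumed to be installed

# Parse the XML and return a hash reference keyed by account Id.
sub parse_accounts {
    my ($xml) = @_;
    my $data = XML::Twig->new->parse($xml)->simplify( keyattr => [] );
    my $list = $data->{Account} || [];
    $list = [$list] if ref $list eq 'HASH';    # one <Account> is not put in an array
    my %by_id;
    $by_id{ $_->{Id} } = $_ for @$list;
    return \%by_id;
}

# Hypothetical validation sub: gets the hash reference and dereferences it.
sub valid_sub {
    my ($accounts) = @_;
    my @problems;
    for my $id ( sort keys %$accounts ) {
        my $mail = $accounts->{$id}{InternalMail};
        # an empty <InternalMail/> may come back as undef, '' or {}
        push @problems, "$id: no InternalMail"
            unless defined $mail && !ref $mail && length $mail;
    }
    return @problems;
}

my $xml = <<'XML';
<Accounts locale="en_US">
  <Account><Id>abcd</Id><InternalMail>asdf@asdf.com</InternalMail></Account>
  <Account><Id>mnbv</Id><InternalMail/></Account>
</Accounts>
XML

my $accounts = parse_accounts($xml);
print "$_\n" for valid_sub($accounts);
```

        The heredoc uses `<<'XML'` (single-quoted) so that `@asdf` in the e-mail address is not interpolated as an array.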

        poj

Re: Need an hash of the Parsed Document !
by tangent (Parson) on Dec 17, 2015 at 19:45 UTC
    If you are only after the text values of certain tags, you could do something like this:

    sub parsing {
        my ( $twig, $account ) = @_;
        my @tags = qw( Id OwnerFirstName OwnerLastName );
        my %hash = map { $_ => $account->findvalue($_) } @tags;
        validate( \%hash );
    }

    sub validate {
        my ( $hash ) = @_;
        if ( $hash->{'Id'} eq 'abcd' ) {
            print "Id $hash->{'Id'} is valid\n";
        }
        else {
            print "Id $hash->{'Id'} is not valid\n";
        }
    }
    If you need to get attributes as well, like the values in 'CreationDate', then you will have to do a bit more work:

    my @cdates = $account->findnodes('CreationDate');
    my $cdate  = $cdates[0];
    $hash{'date'} = join( '-',
        $cdate->att('day-of-month'),
        $cdate->att('month'),
        $cdate->att('year'),
    );
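    If the goal is still one parsing sub that returns everything to the caller, the same handler can store each record in a hash declared outside it (a closure) instead of calling validate directly. A minimal sketch, assuming XML::Twig from CPAN; `parse_accounts`, the `date` key and the trimmed sample XML are hypothetical, and `first_child_text`/`first_child` are used here in place of `findvalue`/`findnodes`.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use XML::Twig;    # CPAN module, assumed to be installed

# Parse the document and return a hash reference keyed by account Id.
# Only the tags in @tags plus a date built from CreationDate are kept.
sub parse_accounts {
    my ($xml) = @_;
    my %accounts;    # filled by the handler below, returned at the end
    my @tags = qw( Id OwnerFirstName OwnerLastName Locked );

    my $twig = XML::Twig->new(
        twig_roots => {
            Account => sub {
                my ( $t, $account ) = @_;
                my %rec = map { $_ => $account->first_child_text($_) } @tags;
                if ( my $cdate = $account->first_child('CreationDate') ) {
                    $rec{date} = join '-',
                        map { $cdate->att($_) // '' } qw( day-of-month month year );
                }
                $accounts{ $rec{Id} } = \%rec;   # the closure keeps %accounts alive
                $t->purge;                       # free the processed <Account>
            },
        },
    );
    $twig->parse($xml);
    return \%accounts;
}

my $xml = <<'XML';
<Accounts locale="en_US">
  <Account><Id>abcd</Id><OwnerLastName>asd</OwnerLastName><OwnerFirstName>zxc</OwnerFirstName>
    <Locked>false</Locked><CreationDate year="2011" month="8" day-of-month="19"/></Account>
  <Account><Id>mnbv</Id><OwnerLastName>cvbb</OwnerLastName><OwnerFirstName>bvcc</OwnerFirstName>
    <Locked>true</Locked><CreationDate year="2012" month="10" day-of-month="10"/></Account>
</Accounts>
XML

my $accounts = parse_accounts($xml);
print "$_ created $accounts->{$_}{date}\n" for sort keys %$accounts;
```

    The purge happens only after the values have been copied into `%rec`, so nothing needed later is lost.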
Re: Need an hash of the Parsed Document !
by Preceptor (Deacon) on Dec 18, 2015 at 16:17 UTC

    There is a fundamental problem with what you are trying to do. XML doesn't fit cleanly into native Perl data structures. A hash is an unordered set of key-value pairs, but XML is ordered. An array is an ordered list of scalars. And an XML element _may_ have both child nodes and attributes as key-value pairs.

    In general it is simply impossible to do this without losing data in the process, which is why XML::Twig doesn't already do it. Instead it uses XML::Twig::Elt elements, which are objects holding the data along with accessor methods to handle this situation correctly.

    You can pass this object around to other parts of your code too, and do the validation correctly there. Or you can attempt some sort of partial translation, but this is inherently a BAD IDEA.
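    A minimal sketch of passing the element object itself to a validation sub, so that order, attributes and empty elements are all still available. It assumes XML::Twig from CPAN; `validate_account`, its two rules and the trimmed sample XML are hypothetical.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use XML::Twig;    # CPAN module, assumed to be installed

# Hypothetical validator working directly on an XML::Twig::Elt object.
sub validate_account {
    my ($account) = @_;
    my $id = $account->first_child_text('Id');
    my @errors;
    # first_child_text returns '' for both a missing and an empty element
    push @errors, "$id: InternalMail is empty"
        if $account->first_child_text('InternalMail') eq '';
    my $cdate = $account->first_child('CreationDate');
    push @errors, "$id: CreationDate has no year"
        unless $cdate && defined $cdate->att('year');
    return @errors;
}

my @errors;
my $twig = XML::Twig->new(
    twig_roots => {
        Account => sub {
            my ( $t, $account ) = @_;
            push @errors, validate_account($account);   # validate the object itself
            $t->purge;                                   # then free it
        },
    },
);
$twig->parse(<<'XML');
<Accounts locale="en_US">
  <Account><Id>abcd</Id><InternalMail>asdf@asdf.com</InternalMail>
    <CreationDate year="2011" month="8"/></Account>
  <Account><Id>mnbv</Id><InternalMail/>
    <CreationDate year="2012" month="10"/></Account>
</Accounts>
XML

print "$_\n" for @errors;
```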

    You can reduce XML to a simpler format, but there's no 'magic bullet' approach, any more than you can automatically (or without an awful lot of effort) convert a banana into a combined image/smell/taste document on your computer. At best you can represent the various elements, or extract the things you are interested in.

    But XML::Twig _already_ has a mechanism to do this: a twig handler fires on the element, and you manipulate the thing you're interested in right there (or you walk the "twig" tree by hand).

    That's why this is considered a better approach than trying to 'down-convert' it. But if you really must, XML::Twig has a simplify method that does exactly what you want (try it, and you'll see why it's nasty):

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use Data::Dumper;
    use XML::Twig;

    my $file = shift or die "Usage: $0 file.xml\n";

    sub parsing {
        my ( $twig, $accounts ) = @_;
        my $horrible_hacky_hashref = $accounts->simplify( forcearray => 1, keyattr => [] );
        print Dumper \$horrible_hacky_hashref;
    }

    # 'twig_roots' mode builds only the required sub-trees from the document
    # while ignoring everything outside that twig.
    my $twig = XML::Twig->new( twig_roots => { 'Account' => \&parsing } );
    $twig->parsefile($file);

    This generates a structure like:

    $VAR1 = \{
        'PaymentMode'  => [ 'Undefined' ],
        'ExternalMail' => [ 'fdsa@zxczxc.com' ],
        'SynchroCount' => [ '6' ],
        'Provision'    => [ '0' ],
        ### .... etc ....

    Alternatively, you can walk the structure manually using recursion. But either way, you're far better off _not_ trying to convert the XML, and instead applying the validation criteria directly.
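    For the recursive option, a minimal sketch that walks one element and records leaf text under slash-separated paths (such as 'GroupMemberships/Group') and attributes under 'Path/@name', pushing values onto arrays so repeated elements are not overwritten. It assumes XML::Twig from CPAN; the `walk` sub, the path format and the sample XML are hypothetical, and mixed content (text alongside child elements) is deliberately ignored.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use XML::Twig;    # CPAN module, assumed to be installed

# Recursively flatten an element into { path => [ values ] }.
sub walk {
    my ( $elt, $path, $out ) = @_;
    my $prefix = $path eq '' ? '' : "$path/";
    my $atts   = $elt->atts || {};
    for my $name ( sort keys %$atts ) {
        push @{ $out->{"$prefix\@$name"} }, $atts->{$name};
    }

    my @kids = grep { $_->is_elt } $elt->children;   # element children only
    if (@kids) {
        for my $kid (@kids) {
            walk( $kid, $prefix . $kid->tag, $out );
        }
    }
    # leaf element: keep its text, but skip empty attribute-only elements
    elsif ( $path ne '' && ( $elt->text ne '' || !%$atts ) ) {
        push @{ $out->{$path} }, $elt->text;
    }
    return $out;
}

my $xml = <<'XML';
<Account>
  <Id>abcd</Id>
  <CreationDate year="2011" month="8"/>
  <GroupMemberships><Group>A</Group><Group>B</Group></GroupMemberships>
  <Company/>
</Account>
XML

my $twig = XML::Twig->new;
$twig->parse($xml);
my $flat = walk( $twig->root, '', {} );
print "$_ = @{ $flat->{$_} }\n" for sort keys %$flat;
```

    Validation rules can then look up paths directly, though as noted above this is still a lossy down-conversion compared with working on the elements themselves.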