Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks

Once again I have a problem with XML::Simple. I have a fairly simple data structure:

    <?xml version="1.0"?>
    <DB>
      <termEntry>
        <ShortInfo>Ciao</ShortInfo>
        <langSet xml:lang="Italian">
          <termGrp>
            <term>Ricerca scientifica</term>
          </termGrp>
        </langSet>
        <langSet xml:lang="English">
          <termGrp>
            <term>research</term>
          </termGrp>
        </langSet>
      </termEntry>
      <termEntry>
        <ShortInfo></ShortInfo>
        <langSet xml:lang="Italian">
          <termGrp>
            <term>università</term>
          </termGrp>
        </langSet>
        <langSet xml:lang="English">
          <termGrp>
            <term>university</term>
          </termGrp>
        </langSet>
      </termEntry>
    </DB>

For each <termEntry> I need to read the values in <term> and the corresponding attribute stored in <langSet> for further processing, e.g.:

Italian: Ricerca scientifica English: research

I keep failing... This is what I have so far. Any help would be appreciated.

    use strict;
    use warnings;
    use XML::Simple;
    use Data::Dumper;

    my $xml  = XML::Simple->new(SuppressEmpty => q());
    my $data = $xml->XMLin('myxml.xml');

    for my $entry ( @{ $data->{termEntry} } ) {
        print Dumper( $entry );
        print my $Term = $entry->{langSet}->{termGrp}->{term};
    }

I have no idea how to read the attribute, and I even have problems with the basic variable $Term, which seems to me straightforward to read with the code above. OK, I am very new to XML parsing and this is just a one-time task for me, but I have already lost a full day...

Re: Parsing XML::Simple
by kevbot (Vicar) on May 02, 2017 at 03:44 UTC

    Hello Anonymous Monk,

    The documentation for XML::Simple states that it should not be used in new code (see the STATUS OF THIS MODULE section). The author recommends alternatives such as XML::LibXML or XML::Twig. Perl XML::LibXML by Example looks like a good resource for getting started with XML::LibXML. Using information that I found there, I put together this example:

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use v5.10;
    use XML::LibXML;

    my $file = 'myxml.xml';
    my $dom  = XML::LibXML->load_xml(location => $file);

    # Each <langSet> carries its language in the xml:lang attribute.
    foreach my $lang_set ($dom->findnodes('/DB/termEntry/langSet')) {
        my $lang = $lang_set->getAttribute('xml:lang');
        foreach my $term_grp ($lang_set->findnodes('./termGrp')) {
            # Double quotes, so that $lang is interpolated.
            say "$lang : " . $term_grp->findvalue('./term');
        }
    }
    exit;
    I hope you find this helpful.
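The question asked for one line per <termEntry> (e.g. "Italian: Ricerca scientifica English: research"). A minimal sketch of that variation, assuming the element names from the sample XML; the XML is embedded as a heredoc instead of being read from myxml.xml, the accented character is written as the character reference &#224; to keep the source ASCII, and entry_lines is a hypothetical helper name:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use XML::LibXML;

# Sample data from the question, inlined so the sketch runs on its own.
my $xml = <<'XML';
<?xml version="1.0"?>
<DB>
  <termEntry>
    <ShortInfo>Ciao</ShortInfo>
    <langSet xml:lang="Italian"><termGrp><term>Ricerca scientifica</term></termGrp></langSet>
    <langSet xml:lang="English"><termGrp><term>research</term></termGrp></langSet>
  </termEntry>
  <termEntry>
    <ShortInfo></ShortInfo>
    <langSet xml:lang="Italian"><termGrp><term>universit&#224;</term></termGrp></langSet>
    <langSet xml:lang="English"><termGrp><term>university</term></termGrp></langSet>
  </termEntry>
</DB>
XML

# Hypothetical helper: returns one "Lang: term Lang: term" string per <termEntry>.
sub entry_lines {
    my ($dom) = @_;
    my @lines;
    for my $entry ($dom->findnodes('/DB/termEntry')) {
        my @pairs;
        # Search relative to the current <termEntry> only.
        for my $lang_set ($entry->findnodes('./langSet')) {
            my $lang = $lang_set->getAttribute('xml:lang');
            my $term = $lang_set->findvalue('./termGrp/term');
            push @pairs, "$lang: $term";
        }
        push @lines, join ' ', @pairs;
    }
    return @lines;
}

my $dom   = XML::LibXML->load_xml(string => $xml);
my @lines = entry_lines($dom);

binmode STDOUT, ':encoding(UTF-8)';    # the second entry contains a non-ASCII character
print "$_\n" for @lines;
```

Looping over termEntry first and then searching relative to it ('./langSet') keeps the languages of one entry together, which the flat '/DB/termEntry/langSet' query does not.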

      Thank you!

      After trying for hours without success to adapt the other proposals to a slightly different XML input, I started with XML::LibXML and this example... and I can say it was much, much easier to adapt! Nice module, thank you!

Re: Parsing XML::Simple
by Discipulus (Canon) on May 02, 2017 at 08:22 UTC
    ...or using XML::Twig.

    I had to add encoding="ISO-8859-1" to the first line of the XML data.

    I always get confused by XML terminology and traversal, but this quick hack works:

    use strict;
    use warnings;
    use XML::Twig;

    my $twig = XML::Twig->new(
        twig_handlers => {
            # Called for every <langSet> inside a <termEntry>; $_[1] is the element.
            'termEntry/langSet' => sub {
                print $_[1]->att('xml:lang'), ": ",
                      $_[1]->first_child('termGrp')->text, "\n";
            },
        },
    );
    $twig->parse( \*DATA );

    __DATA__
    <?xml version="1.0" encoding="ISO-8859-1"?>
    <DB>
      <termEntry>
        <ShortInfo>Ciao</ShortInfo>
        <langSet xml:lang="Italian">
          <termGrp> <term>Ricerca scientifica</term> </termGrp>
        </langSet>
        <langSet xml:lang="English">
          <termGrp> <term>research</term> </termGrp>
        </langSet>
      </termEntry>
      <termEntry>
        <ShortInfo></ShortInfo>
        <langSet xml:lang="Italian">
          <termGrp> <term>università</term> </termGrp>
        </langSet>
        <langSet xml:lang="English">
          <termGrp> <term>university</term> </termGrp>
        </langSet>
      </termEntry>
    </DB>
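If the languages of one <termEntry> are needed together, a handler on termEntry itself can collect them and then purge the already-processed part of the tree, which keeps memory use low on large files. This is a sketch only, assuming the element names from the question; the XML is inlined as an ASCII heredoc (with &#224; for the accented character) and the @lines array is just an illustrative name:

```perl
use strict;
use warnings;
use XML::Twig;

my $xml = <<'XML';
<?xml version="1.0"?>
<DB>
  <termEntry>
    <ShortInfo>Ciao</ShortInfo>
    <langSet xml:lang="Italian"><termGrp><term>Ricerca scientifica</term></termGrp></langSet>
    <langSet xml:lang="English"><termGrp><term>research</term></termGrp></langSet>
  </termEntry>
  <termEntry>
    <ShortInfo></ShortInfo>
    <langSet xml:lang="Italian"><termGrp><term>universit&#224;</term></termGrp></langSet>
    <langSet xml:lang="English"><termGrp><term>university</term></termGrp></langSet>
  </termEntry>
</DB>
XML

my @lines;    # one "Lang: term Lang: term" string per <termEntry>

my $twig = XML::Twig->new(
    twig_handlers => {
        # Called once each <termEntry> has been completely parsed.
        termEntry => sub {
            my ($t, $entry) = @_;
            my @pairs = map {
                $_->att('xml:lang') . ': '
                  . $_->first_child('termGrp')->first_child_text('term')
            } $entry->children('langSet');
            push @lines, join ' ', @pairs;
            $t->purge;    # free the parts of the tree that have been handled
        },
    },
);
$twig->parse($xml);

binmode STDOUT, ':encoding(UTF-8)';
print "$_\n" for @lines;
```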

    L*

    There are no rules, there are no thumbs..
    Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
Re: Parsing XML::Simple
by Anonymous Monk on May 01, 2017 at 21:04 UTC
    langSet is an array reference (one element per <langSet>), so loop over it:

    for my $entry ( @{ $data->{termEntry} } ) {
        for ( @{ $entry->{langSet} } ) {
            printf "term: %s\n", $_->{termGrp}{term};
        }
    }

      Ah, I hadn't thought of that. I was playing with the option "forcearray => 1" to change the data structure, but your solution works perfectly. Thanks.
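For completeness, the XML::Simple route can also print the language: the xml:lang attribute simply ends up as a hash key named 'xml:lang' next to termGrp. A minimal sketch, assuming the question's element names; ForceArray is limited to termEntry and langSet so that a file with a single entry or language keeps the same shape, KeyAttr => [] disables key folding, and the XML is inlined as an ASCII heredoc (&#224; for the accented character) rather than read from myxml.xml:

```perl
use strict;
use warnings;
use XML::Simple;

my $xml = <<'XML';
<?xml version="1.0"?>
<DB>
  <termEntry>
    <ShortInfo>Ciao</ShortInfo>
    <langSet xml:lang="Italian"><termGrp><term>Ricerca scientifica</term></termGrp></langSet>
    <langSet xml:lang="English"><termGrp><term>research</term></termGrp></langSet>
  </termEntry>
  <termEntry>
    <ShortInfo></ShortInfo>
    <langSet xml:lang="Italian"><termGrp><term>universit&#224;</term></termGrp></langSet>
    <langSet xml:lang="English"><termGrp><term>university</term></termGrp></langSet>
  </termEntry>
</DB>
XML

# Always build arrays for <termEntry> and <langSet>, and never fold
# elements into hashes keyed on name/key/id attributes.
my $data = XMLin(
    $xml,
    ForceArray    => [qw(termEntry langSet)],
    KeyAttr       => [],
    SuppressEmpty => q(),
);

my @lines;
for my $entry ( @{ $data->{termEntry} } ) {
    # The attribute is stored under the literal key 'xml:lang'.
    push @lines, join ' ',
        map { "$_->{'xml:lang'}: $_->{termGrp}{term}" } @{ $entry->{langSet} };
}

binmode STDOUT, ':encoding(UTF-8)';
print "$_\n" for @lines;
```

Printing Dumper($entry) with these options is a quick way to confirm the shape before writing the loop.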