DanielSpaniel has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I have an XML file of product data from a third party which I need to process. I can process most of the file without any problem, but I am having trouble extracting attribute values from one tag.

The XML looks similar to this pseudo-extract:

    <products>
      <product>
        <description>Product description.</description>
        <image>some url</image>
        <categories>
          <category catname="Books" id="xyz" name="Books"/>
        </categories>
        <prodname>Product name</prodname>
      </product>
      <product>
        ...
      </product>
    </products>

The relevant snippet of my Perl code looks like this:

    my $file = 'products.xml';
    my $parser = new XML::Parser(Style => 'Stream');
    eval { $parser->parsefile($file); };

    sub StartTag {
        my ($expat, $tag, %attrs) = @_;
        if (%attrs) {
            print "Attributes:\n";
            while (my ($key, $value) = each(%attrs)) {
                print "\t$key => $value\n";
            }
        };
        if ($expat->within_element('products')) {
            push(@tagstack, lc($tag));
            if ($tag eq 'category') {
                # do whatever with the attributes ...
            }
        }
    }
    1;

However, despite playing around with this for ages, I cannot seem to retrieve the category attributes.

The code which checks if %attrs is empty or not never produces anything, and I don't understand where I'm going wrong.

Every product definitely has attributes in the "<categories><category catname="xyz" id="xxx" name="qqq"/></categories>" tags.

Any assistance would be much appreciated!

Re: XML::Parser - Obtaining Attributes
by toolic (Bishop) on Jan 25, 2014 at 00:16 UTC
    However, despite playing around with this for ages, I cannot seem to
    I once felt the same way about XML::Parser, which is why I switched to XML::Twig:

        use warnings;
        use strict;
        use XML::Twig qw();
        use Data::Dumper qw(Dumper);
        $Data::Dumper::Sortkeys = 1;

        my $xml = <<XML;
        <products>
          <product>
            <description>Product description.</description>
            <image>some url</image>
            <categories>
              <category catname="Books" id="xyz" name="Books"/>
            </categories>
            <prodname>Product name</prodname>
          </product>
        </products>
        XML

        my $twig = XML::Twig->new(
            twig_handlers => { category => \&category },
        );
        $twig->parse($xml);

        sub category {
            my ($t, $cat) = @_;
            print Dumper($cat->atts());
        }

        __END__

        $VAR1 = {
                  'catname' => 'Books',
                  'id' => 'xyz',
                  'name' => 'Books'
                };

      Thanks for the suggestion. Appreciated.

      I'd prefer to make it work with XML::Parser if I can, but I'll certainly try your suggestion if I can't make any progress.

      Just curious, never having used twig, does one need to define a twig_handler for every tag one wishes to process? ... i.e. I need a bunch of information from most, but not all, elements, as well as the category attributes.

      Thx!
Re: XML::Parser - Obtaining Attributes
by runrig (Abbot) on Jan 25, 2014 at 00:20 UTC
    I, too, would use something other than XML::Parser (e.g. XML::Rules), but to answer your question, you need to tell the parser what your handlers are:

        my $parser = new XML::Parser(
            Handlers => {
                Start => \&StartTag,
            },
        );

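
    For reference, a minimal sketch of a Start handler registered this way, with no Style set. It assumes a cut-down inline document; the names start_tag and %seen are made up for illustration. A handler registered through Handlers receives the Expat object, the element name, and then the attributes as a flat name/value list, so they can be read from @_:

```perl
use strict;
use warnings;
use XML::Parser;

my %seen;    # attributes of the last <category> element encountered

# A Start handler registered through Handlers receives the Expat object,
# the element name, and then the attributes as name/value pairs.
sub start_tag {
    my ($expat, $tag, %attrs) = @_;
    %seen = %attrs if $tag eq 'category';
}

my $parser = XML::Parser->new(Handlers => { Start => \&start_tag });
$parser->parse(
    '<categories><category catname="Books" id="xyz" name="Books"/></categories>'
);

print "$_ => $seen{$_}\n" for sort keys %seen;
```
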
      Thanks for the comment.

      Actually it defaults to the sub StartTag anyway, so no worries there, and the script has no problems processing the rest of the file; it really is just the attributes bit that I have an issue with.

      If I can't meet with any success then I guess maybe I'll have to use another module, but I'd really rather find out why this particular bit won't work.

        Actually it defaults to the sub StartTag anyway,

        I see that now (when you call with Style => 'Stream'). But if you read the documentation more closely, you'll see that with the Stream style the sub is not called with the attributes in @_: it receives only the Expat object and the element name. The attributes are in %_.
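
        A minimal sketch of reading the attributes from %_ under the Stream style, assuming a cut-down copy of the sample document. The @categories array is a made-up name for illustration, and the empty Text and EndTag subs are there only because the Stream style prints text and end tags to STDOUT when those subs are not defined:

```perl
use strict;
use warnings;
use XML::Parser;

my $xml = <<'XML';
<products>
  <product>
    <categories>
      <category catname="Books" id="xyz" name="Books"/>
    </categories>
    <prodname>Product name</prodname>
  </product>
</products>
XML

my @categories;    # one hash of attributes per <category> element

# The Stream style looks these subs up by name in the package given by Pkg.
sub StartTag {
    my ($expat, $tag) = @_;    # only the Expat object and the element name
    return unless $tag eq 'category';
    my %attrs = %_;            # copy now: %_ is overwritten at the next start tag
    push @categories, \%attrs;
}

# No-ops, so that text and end tags are not echoed to STDOUT.
sub Text   { }
sub EndTag { }

my $parser = XML::Parser->new(Style => 'Stream', Pkg => 'main');
$parser->parse($xml);

for my $cat (@categories) {
    print "$_ => $cat->{$_}\n" for sort keys %$cat;
}
```

        Copying %_ into a lexical hash inside StartTag keeps the values around after the handler returns.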

Re: XML::Parser - Obtaining Attributes ( ddumper \@_ )
by Anonymous Monk on Jan 25, 2014 at 00:55 UTC

    The code which checks if %attrs is empty or not never produces anything, and I don't understand where I'm going wrong.

    You forgot to Data::Dump::dd(\@_)umper to see what you have ...

    I assume whatever you have, the documentation explains where to get the rest ( %_ )

    mwahahaha

    :) Anyway, XML::Parser is very low level, those who aren't smart enough to figure it out on their own need to suffer :)

    :) The Twig and the Rules are where you should go :)