
XML processing

by shreya (Novice)
on Aug 05, 2005 at 22:18 UTC ( #481380=perlquestion )

shreya has asked for the wisdom of the Perl Monks concerning the following question:

Fellow Monks,

After a lot of documentation reading and testing, and almost two days spent trying to understand the XML::Parser module, I finally have to ask for some help here.

I receive some data in an XML file and am required to extract part of the data:

<rootNode> <a> ...text + other XML </a> <a> ...text + other XML </a> </rootNode>

Now I need to extract all information between the <a> and </a> tags, and I need to do this using XML::Parser.
Any suggestions on how I could do this?
Below is my non-working code:
use XML::Parser;

my $XmlFile = "foo.xml";
die "Can't find file \"$XmlFile\"" unless -f $XmlFile;

my $pFoundFlag = 0;
my $NewFile;

my $parser = new XML::Parser(Style => 'Debug');
$parser->setHandlers(
    Start => \&startElement(),
    Char  => \&char_handler(),
    End   => \&end_Handler,
);
$parser->parsefile($XmlFile);

sub startElement {
    my ($parserInst, $element, %attr) = @_;
    if ($element eq "a") {
        $pFoundFlag = 1;
    }
    if ($pFoundFlag) {
        if (not $NewFile) {
            $NewFile = $element;
        }
        else {
            $NewFile .= $element;
        }
    }
}

sub char_handler {
    my ($parserInst, $data);
    if ($pFoundFlag) {
        $NewFile .= $data;
    }
}

sub end_Handler {
    my ($parserInst, $element);
    if ($pFoundFlag) {
        $NewFile .= $element;
    }
    if ($element eq "a") {
        $pFoundFlag = 0;
    }
}

$OpFile = "foo_outpur.xml";
open (OP, ">$OpFile") or die ("can't open output file!");
print OP $NewFile;
close(OP);

Replies are listed 'Best First'.
Re: XML processing
by borisz (Canon) on Aug 05, 2005 at 22:55 UTC
    Do yourself a favor and use another module to filter your XML. I suggest XML::Twig or XML::SAX. The tool xml_grep from XML::Twig does exactly what you want:
    xml_grep a your_xmlfile.xml
Re: XML processing
by Tanktalus (Canon) on Aug 06, 2005 at 02:44 UTC

    Seconding borisz's advice, here's some (untested) XML::Twig code to do this, in case you want to continue doing stuff with the results in perl:

    use XML::Twig;
    # ...
    my $twig = XML::Twig->new();
    $twig->parsefile($XmlFile);
    my @a_elements = $twig->get_xpath("//a");
    foreach my $el (@a_elements) {
        my $text = $el->text();
        # do stuff with $text.
    }
    It's actually pretty easy, once you know which APIs you want ;-)

    Update: Changed loop variable from $a to $el when reminded by graff. Tks.

      But it would be better not to use "$a" as the name of the iterator variable in the "for" loop, in case "doing stuff" includes using "sort".

        something like $ElementA .= $element->sprint;?

        Thanks Mirod. This worked. I am still trying to get used to this response method. I didn't realise that someone had replied to my old posts. I wish there was a way to be notified by email when someone replies to your posts. Later
Re: XML processing
by mrborisguy (Hermit) on Aug 06, 2005 at 03:26 UTC
    sub char_handler { my ($parserInst, $data);
    sub end_Handler { my ($parserInst, $element);
    You may want to include the important = @_ here as well. Without it, $data and $element are never assigned, so the handlers have nothing to append.
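
    For anyone who wants to stay with XML::Parser, here is a minimal sketch of how the handlers could look once the = @_ is in place and the handlers are passed as plain code references. It collects the markup between each outermost <a> and </a>, using the parser's recognized_string method to get the text as it appeared in the document. The sub name extract_a_contents and the depth counter are my own choices for illustration, not something from the original post.

```perl
use strict;
use warnings;
use XML::Parser;

# Sketch: return one string per outermost <a> element, holding the
# markup found between <a> and </a> (the <a> tags themselves excluded).
sub extract_a_contents {
    my ($xml) = @_;
    my @chunks;       # collected contents, one entry per <a>
    my $depth = 0;    # 0 = outside <a>, 1 = directly inside, 2+ = nested deeper

    my $parser = XML::Parser->new(
        Handlers => {
            Start => sub {
                my ($expat, $element) = @_;
                if ($depth) {
                    # Already inside <a>: keep the child tag as written.
                    $chunks[-1] .= $expat->recognized_string;
                    $depth++;
                }
                elsif ($element eq 'a') {
                    push @chunks, '';
                    $depth = 1;
                }
            },
            Char => sub {
                my ($expat, $data) = @_;
                $chunks[-1] .= $expat->recognized_string if $depth;
            },
            End => sub {
                my ($expat, $element) = @_;
                return unless $depth;
                $depth--;
                # Keep closing tags of children, but not the outer </a>.
                $chunks[-1] .= $expat->recognized_string if $depth;
            },
        },
    );

    $parser->parse($xml);    # use parsefile($path) for a file on disk
    return @chunks;
}
```

    Calling extract_a_contents on the XML text returns a list that can be joined and written to the output file. Comments and processing instructions inside <a> are not captured by this sketch, since only the Start, Char and End handlers are set.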


Re: XML processing
by shreya (Novice) on Aug 08, 2005 at 13:50 UTC
    Thanks borisz, Tanktalus, graff and mrborisguy for your comments.

    I did a little bit of reading on XML::Twig before posting. However, I really wanted to use XML::Parser instead of XML::Twig.

    I don't have rights to install modules on our company server. I guess for now I am going to run XML::Twig from my own directory, since I need to get a demo done by EOD.
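
    Running a module from a private directory usually comes down to telling perl where to look. A minimal sketch, assuming Twig.pm has been unpacked under a hypothetical $HOME/perl-lib/XML/ directory (the path is only an example):

```perl
use strict;
use warnings;

# Hypothetical private module directory; adjust to wherever
# XML/Twig.pm actually lives under your own account.
use lib "$ENV{HOME}/perl-lib";

# A later "use XML::Twig;" would now search that directory
# before the system-wide module paths.
```

    The same effect is available without editing the script, through the -I command-line switch or the PERL5LIB environment variable.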

    Thanks again guys.
Re: XML processing
by shreya (Novice) on Aug 10, 2005 at 22:59 UTC
    I started using XML::Twig. However, I am facing a problem.

    my $twig = XML::Twig->new(twig_handlers => { 'a' => \&Test });

    # Parse the file
    $twig->parsefile($XmlFile);

    # Handler subroutine
    sub Test {
        my ($parser, $element) = @_;

        # This prints element <a> along with its contents to some output file
        $element->print(\*OP);

        # However, what I really want done here is to have element <a>, along
        # with its subelements and text, copied to another variable instead of
        # being printed on screen
    }
    Problem: What I really want done here is to have element <a>, along with its subelements and text, copied to another variable instead of being printed on screen.
    Something like
    $ElementA .= $element->print;
    Please advise. Thanks,
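
    The sprint suggestion made elsewhere in this thread can be sketched as follows: sprint returns the element as a string instead of printing it, so it can be appended to a variable. The sub name collect_a_elements and the @collected array are hypothetical names used only for this illustration.

```perl
use strict;
use warnings;
use XML::Twig;

# Sketch: return the serialized markup of every <a> element,
# one string per element, instead of printing it.
sub collect_a_elements {
    my ($xml) = @_;
    my @collected;

    my $twig = XML::Twig->new(
        twig_handlers => {
            a => sub {
                my ($t, $element) = @_;
                # sprint gives back the element (tags, subelements
                # and text) as a string rather than printing it.
                push @collected, $element->sprint;
            },
        },
    );

    $twig->parse($xml);    # use parsefile($path) for a file on disk
    return @collected;
}
```

    Inside the original Test handler, the equivalent one-line change would be $ElementA .= $element->sprint; with $ElementA declared outside the sub.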

Node Type: perlquestion [id://481380]
Approved by kwaping