parsing xml

pinnacle has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to remove the "INFO" tag and its content from the file and print the rest of the file, but I can't get the result I expect. I am not sure what is going wrong, please help!


  <COMPLETE>
    <T>test</T>
    <L>light</L>
    <INFO>information</INFO>
  </COMPLETE>

  <COMPLETE>
    <T>test</T>
    <L>light</L>
    <INFO>informa</INFO>
  </COMPLETE>

Above xml is in file 'test.xml'

 
    open(OUT,"/home/test.xml");
    while(<OUT>){
        $line = $_;
        if($line =~ m#<INFO>(.+?)</INFO>#ig) {
            next;
        }
        print "$line\n";
    }

When I run the above code I only get:



<COMPLETE>
    <T>test</T>
    <L>light</L>
    <INFO>information</INFO>
  </COMPLETE>


Replies are listed 'Best First'.
Re: parsing xml by toolic (Bishop) on Apr 06, 2011 at 18:39 UTC
If you're open to another way of parsing XML, XML::Twig is a good choice. I added a top-level element to your XML snippet:

    use warnings;
    use strict;
    use XML::Twig;

    my $xmlstr = <<EOF;
    <top>
      <COMPLETE>
        <T>test</T>
        <L>light</L>
        <INFO>information</INFO>
      </COMPLETE>
      <COMPLETE>
        <T>test</T>
        <L>light</L>
        <INFO>informa</INFO>
      </COMPLETE>
    </top>
    EOF

    my $twig = XML::Twig->new(
        twig_handlers => {INFO => sub {$_->delete()}},
        pretty_print  => 'indented'
    );
    $twig->parse($xmlstr);
    $twig->print();

    __END__

    <top>
      <COMPLETE>
        <T>test</T>
        <L>light</L>
      </COMPLETE>
      <COMPLETE>
        <T>test</T>
        <L>light</L>
      </COMPLETE>
    </top>
Re^2: parsing xml by mirod (Canon) on Apr 07, 2011 at 07:37 UTC
To filter out parts of the XML, I usually use a combination of `twig_roots` on the bits I want to skip, and `twig_print_outside_roots` to output the rest of the input. The only problem with this is that if the XML is indented the way the example is, it leaves empty lines where the discarded part was. I'll have to figure something out to deal with this.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use XML::Twig;

    my $xmlstr = <<EOF;
    <top>
      <COMPLETE>
        <T>test</T>
        <L>light</L>
        <INFO>information</INFO>
      </COMPLETE>
      <COMPLETE>
        <T>test</T>
        <L>light</L>
        <INFO>informa</INFO>
      </COMPLETE>
    </top>
    EOF

    my $twig = XML::Twig->new(
        twig_roots               => {INFO => 1},
        twig_print_outside_roots => 1,
    );
    $twig->parse($xmlstr);

    __END__

    <top>
      <COMPLETE>
        <T>test</T>
        <L>light</L>

      </COMPLETE>
      <COMPLETE>
        <T>test</T>
        <L>light</L>

      </COMPLETE>
    </top>
Re: parsing xml by wind (Priest) on Apr 06, 2011 at 18:35 UTC
Take off the 'g' modifier. Otherwise, your code is fine:

    # open my $fh, '/home/test.xml' or die $!;
    my $fh = \*DATA;

    while (<$fh>) {
        next if m{<INFO>(.+?)</INFO>}i;
        print;
    }

    __DATA__
    <COMPLETE>
      <T>test</T>
      <L>light</L>
      <INFO>information</INFO>
    </COMPLETE>
    <COMPLETE>
      <T>test</T>
      <L>light</L>
      <INFO>informa</INFO>
    </COMPLETE>

It would be better if you used an XML parser like XML::Twig, though.
Re: parsing xml by Jenda (Abbot) on Apr 07, 2011 at 13:24 UTC
And then ... two years from now ... someone puts two lines of text into the <INFO> ...

Jenda
Enoch was right! Enjoy the last years of Rome.
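
The failure described here can be reproduced with core Perl alone. The following is a minimal sketch, not anything posted in the thread: the sample string (with one `<INFO>` element spanning two lines) and the sub names `strip_by_line` and `strip_slurped` are hypothetical and exist only for illustration. It contrasts the line-by-line filter with a substitution over the whole document that uses the `/s` modifier.

```perl
use strict;
use warnings;

# Hypothetical sample: the second <INFO> element spans two lines.
my $xml = <<'EOF';
<COMPLETE>
  <T>test</T>
  <INFO>information</INFO>
</COMPLETE>
<COMPLETE>
  <T>test</T>
  <INFO>first line
second line</INFO>
</COMPLETE>
EOF

# Line-by-line filter, as in the thread: it only drops lines that
# contain a complete <INFO>...</INFO> pair.
sub strip_by_line {
    my ($text) = @_;
    return join '', grep { !m{<INFO>.+?</INFO>}i } split /^/m, $text;
}

# Whole-document substitution: /s lets "." match newlines, so an
# element that spans several lines is removed as well, together with
# its leading indentation and trailing newline.
sub strip_slurped {
    my ($text) = @_;
    $text =~ s{^[ \t]*<INFO>.*?</INFO>[ \t]*\n}{}gims;
    return $text;
}

print "--- line by line ---\n", strip_by_line($xml);
print "--- slurped ---\n",      strip_slurped($xml);
```

The slurped variant is still a regex and not a parser, so it will trip over nested elements, CDATA sections, or attributes on the tag. It is shown only to make the multi-line failure visible.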
Re: parsing xml by locked_user sundialsvc4 (Abbot) on Apr 07, 2011 at 15:30 UTC
What I would suggest is ... “if it is XML, then treat it stem-to-stern as XML.” Parse it using a tool like XML::Twig, and use XPath expressions to (effortlessly...) locate all of the `<INFO>` tags. Remove the nodes, then transform back into text for printing. Although this might sound like “extra work,” IMHO it really isn’t, because it thoroughly solves the problem, both in the short-run and in the future. And it does so by pushing the hard work onto the backs of CPAN modules.
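
The parse, locate, remove, print sequence described above could look roughly like this with XML::Twig. This is a sketch under assumptions: it wraps the snippet in a `<top>` element (as other replies in this thread do) so the input is well-formed, and it uses `findnodes` with the path `//INFO`, which falls within the XPath subset XML::Twig supports. The variable names are illustrative only.

```perl
use strict;
use warnings;
use XML::Twig;

# Sample input from the question, wrapped in a <top> element so that
# the document has a single root.
my $xml = <<'EOF';
<top>
  <COMPLETE>
    <T>test</T>
    <L>light</L>
    <INFO>information</INFO>
  </COMPLETE>
  <COMPLETE>
    <T>test</T>
    <L>light</L>
    <INFO>informa</INFO>
  </COMPLETE>
</top>
EOF

# Parse the whole document into memory.
my $twig = XML::Twig->new( pretty_print => 'indented' );
$twig->parse($xml);

# Locate every INFO element with an XPath-style expression and
# remove each one from the tree.
$_->delete for $twig->findnodes('//INFO');

# Turn the tree back into text.
my $result = $twig->sprint;
print $result;
```

Unlike the handler-based approach, this loads the whole document before removing anything, which is simpler to read but uses more memory on large files.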
Re: parsing xml by perl_addict (Initiate) on Apr 07, 2011 at 05:46 UTC
I just tried the following code and it works for me:

    open FH, "test.xml";
    while (<FH>) {
        next if ($_ =~ /\<INFO/ig);
        print $_;
    }