toddgow has asked for the wisdom of the Perl Monks concerning the following question:

I need some help and I am not finding the answer easily for some reason. I have an XML file:

<name>
<![CDATA[GT Amadeus]]>
</name>

There are multiple <name> and <CDATA> fields in the file. I need the next <CDATA> after the <name>.

This code pulls the <name> ...

#!/usr/bin/perl
use strict;
my $line = "";
open(XML,"<$file") || dir "Couldn't find $file\n";
foreach $line (<XML>){
    chomp($line);
    $line =~ s/^\s+//g;
    if ($line =~ /name/{
        print "$line\n";
    }
}

I need to find name then pull the next line...what is the best method to do this?

thanks in advance,

Todd

20090201 Janitored by Corion: Added formatting, code tags, as per Writeup Formatting Tips

Re: Find next line
by toolic (Bishop) on Jan 31, 2009 at 01:36 UTC
    Welcome to the Monastery.

    Please edit your node to add "code" tags around your code segments because it is difficult to read. See Writeup Formatting Tips. And it is good form to post the actual code that you are using, i.e., code that compiles (you meant die instead of "dir", etc.).

    Since parsing XML is tricky business, it is a good idea to use one of the CPAN modules to do it for you. There is a learning curve, but it's well worth the effort. I recommend XML::Twig.

    Having said that, one way to get the next line after a matching line is to use a while loop instead of a foreach loop, as follows (untested):

    while (<>) {
        if (/name/) {
            print;
            my $next_line = <>;
            # do something with $next_line ...
        }
    }
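A slightly fuller sketch of the same idea, runnable without a file on disk. It assumes the CDATA really does sit on the line directly after the opening tag; the sample data, the in-memory filehandle, and the sub name lines_after_name are illustrative choices, not part of the original post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample input standing in for the real file (hypothetical data).
my $xml = <<'END_XML';
<name>
<![CDATA[GT Amadeus]]>
</name>
<description>
<![CDATA[]]>
</description>
<name>
<![CDATA[Home Page]]>
</name>
END_XML

# Return the line that follows each line containing an opening <name> tag.
sub lines_after_name {
    my ($text) = @_;
    open my $fh, '<', \$text or die "cannot open in-memory handle: $!";
    my @found;
    while (my $line = <$fh>) {
        next unless $line =~ /<name>/;   # does not match </name>
        my $next_line = <$fh>;           # consume the following line
        last unless defined $next_line;  # the match was on the last line
        chomp $next_line;
        $next_line =~ s/^\s+//;
        push @found, $next_line;
    }
    close $fh;
    return @found;
}

print "$_\n" for lines_after_name($xml);
```

Matching on /<name>/ rather than /name/ keeps the closing tag from triggering a second read. The result still includes the CDATA wrapper, which is one more reason to prefer a real parser.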

    Some more miscellaneous observations:

    • It is good that you use strict;. You might as well toss in use warnings; while you're at it.
    • It is good practice to use lexical filehandles, the 3-argument form of open and print the error message ($!):
      open my $fh, '<', $file or die "can not open $file: $!";
      while (<$fh>) { ... }
    • Since you are anchoring the regex in the substitution (^), there is no reason for the g modifier. This would suffice: s/^\s+//
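Putting those observations together, a minimal sketch might look like the following. The sub name name_lines and the temporary sample file are assumptions made so the example runs on its own; the original script would take its file name from elsewhere.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Read a file and return the lines that mention "name", left-trimmed.
sub name_lines {
    my ($file) = @_;
    # Lexical filehandle, 3-argument open, and $! in the error message.
    open my $fh, '<', $file or die "cannot open $file: $!";
    my @lines;
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/^\s+//;   # anchored at ^, so /g would add nothing
        push @lines, $line if $line =~ /name/;
    }
    close $fh;
    return @lines;
}

# Throwaway sample file (hypothetical content) so the sketch runs as-is.
my ($out, $path) = tempfile(UNLINK => 1);
print {$out} "   <name>\n   <![CDATA[GT Amadeus]]>\n   </name>\n";
close $out or die "cannot close $path: $!";

print "$_\n" for name_lines($path);
```

Reading with while instead of foreach also avoids slurping the whole file into a list before the loop starts.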
      Toolic, thanks so much for taking the time to look at this and for the recommendations. Here is a snippet from my XML file:
      <script type="ApplicationPerspective" version="5.3.13.179" recorder="8.6.59.276" sav="25" guid="296A95D0-E8B6-4989-AA21-126796A3AD3F" xmlns="http://www.keynote.com/namespaces/tstp/script">
        <name>
        <![CDATA[GT Amadeus]]>
        </name>
        .......
        <actions>
          <action FrameErrorFatal="1" MetaErrorFatal="1">
            <name>
            <![CDATA[Home Page]]>
            </name>
            <description>
            <![CDATA[]]>
            </description>
      I have started looking at XML::Twig. I need to be able to pull the CDATA in between the <name> tags. Here is my sample XML::Twig code right now:
      #!/usr/bin/perl
      use XML::Twig;
      my $file = $ARGV[0];
      my $twig = new XML::Twig(TwigRoots => {'script/name' => 1});
      $twig->parsefile($file);
      $twig->print;
        Something like this?
        use strict;
        use warnings;
        use XML::Twig;

        my $xmlStr = <<XML;
        <script type="ApplicationPerspective" version="5.3.13.179" recorder="8.6.59.276" sav="25" guid="296A95D0-E8B6-4989-AA21-126796A3AD3F" xmlns="http://www.keynote.com/namespaces/tstp/script">
        <name>
        <![CDATA[GT Amadeus]]>
        </name>
        <actions>
        <action FrameErrorFatal="1" MetaErrorFatal="1">
        <name>
        <![CDATA[Home Page]]>
        </name>
        <description>
        <![CDATA[]]>
        </description>
        </action>
        </actions>
        </script>
        XML

        my $twig = new XML::Twig( twig_handlers => { name => \&name } );
        $twig->parse($xmlStr);

        sub name {
            my ($twig, $name) = @_;
            my $stuff = $name->text();
            print "$stuff\n";
        }

        __END__
        GT Amadeus
        Home Page
Re: Find next line
by CountZero (Bishop) on Jan 31, 2009 at 08:26 UTC
    Have you considered using XSLT on your XML file to get what you want?

    Perhaps if you post a (smallish but significant) extract of your XML file and the result you expect, we can give you more guidance.

    In general, XML files have no concept of "next line". Whitespace, linefeeds, ... should be irrelevant for XML (a few exceptions notwithstanding). It is all tags and only tags that really count. Expecting some data to be "on the next line" is not guaranteed to work. It is the format not the form that you have to work with.
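To illustrate the point, here is a small sketch showing how a "next line" approach depends on layout. Both sample strings are hypothetical and carry the same XML content; the sub name next_line_after_name is made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Two layouts of the same XML content (hypothetical samples).
my $multi_line  = "<name>\n<![CDATA[GT Amadeus]]>\n</name>\n";
my $single_line = "<name> <![CDATA[GT Amadeus]]> </name>\n";

# Naive approach: return the line following the first line with <name>.
sub next_line_after_name {
    my ($text) = @_;
    my @lines = split /\n/, $text;
    for my $i (0 .. $#lines) {
        next unless $lines[$i] =~ /<name>/;
        return $lines[$i + 1];   # undef when no line follows
    }
    return;
}

for my $doc ($multi_line, $single_line) {
    my $next = next_line_after_name($doc);
    print defined $next ? "found: $next\n" : "nothing on the next line\n";
}
```

The first layout yields the CDATA line, the second yields nothing, even though an XML parser would treat the two documents as carrying the same name content apart from whitespace.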

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James