texuser74 has asked for the wisdom of the Perl Monks concerning the following question:

Dear Perl Monks

I have an input file with the following contents:

<volume>4</volume>
<issue>12</issue>
<year>2003</year>
I want to print these values to a new file, but my script below prints the same value for all the variables: it always prints the value of the last one, i.e. year.
open (IN, "<xxx.in");
while(<IN>) {
    s/<volume>(.*)<\/volume>//; $vol=$1;
    s/<issue>(.*)<\/issue>//; $iss=$1;
    s/<year>(.*)<\/year>//; $yr=$1;
}
close(IN);
print "Volume: $vol\n";
print "Issue: $iss\n";
print "Year: $yr\n";
Present output:
Volume: 2003
Issue: 2003
Year: 2003
Required output:
Volume: 4
Issue: 12
Year: 2003
Please help me in correcting this.

Re: problem with variables
by muntfish (Chaplain) on Sep 15, 2004 at 09:45 UTC

    You don't need to do s/// - and you're not checking that the regex has actually matched, before copying $1 into your own variables. In other words, all three of your variables get set to $1 each time round the while loop.

    Try replacing:

    s/<volume>(.*)<\/volume>//; $vol=$1;
    s/<issue>(.*)<\/issue>//; $iss=$1;
    s/<year>(.*)<\/year>//; $yr=$1;

    with:

    $vol = $1 if /<volume>(.*)<\/volume>/;
    $iss = $1 if /<issue>(.*)<\/issue>/;
    $yr = $1 if /<year>(.*)<\/year>/;

    (Note: untested)


    s^^unp(;75N=&9I<V@`ack(u,^;s|\(.+\`|"$`$'\"$&\"\)"|ee;/m.+h/&&print$&
      Just to elaborate a little further, $1 holds the value of the first capturing parens in the last successful pattern match. It's important to remember that $1 is not reset after an unsuccessful pattern match - in that case it keeps its previous value.

      To get the results texuser74 was getting originally, it seems there was an empty line in xxx.in after the line with the year. When your while loop processed that line, it failed to match <volume>...</volume> but (since you had no if statement), your $vol variable was set to $1 anyway (which was then 2003, as that's the last successful match, from the previous line). It then did the same for the other two variables.
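
      The behaviour described above can be reproduced without a file. This is only a minimal sketch: the helper names unchecked_volume and checked_volume are made up for illustration, and the two strings stand in for the year line and the empty line that presumably followed it.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the original script: it copies $1
# without checking whether the second match succeeded.
sub unchecked_volume {
    my ($year_line, $blank_line) = @_;
    $year_line  =~ /<year>(.*)<\/year>/;       # succeeds, sets $1
    $blank_line =~ /<volume>(.*)<\/volume>/;   # fails, $1 keeps its old value
    my $vol = $1;                              # stale value from the year match
    return $vol;
}

# Hypothetical helper using the guarded form: $1 is only trusted
# when the match that should have set it actually succeeded.
sub checked_volume {
    my ($year_line, $blank_line) = @_;
    $year_line =~ /<year>(.*)<\/year>/;
    my $vol;
    $vol = $1 if $blank_line =~ /<volume>(.*)<\/volume>/;
    return $vol;                               # stays undef on a failed match
}

my $stale = unchecked_volume("<year>2003</year>\n", "\n");
my $safe  = checked_volume("<year>2003</year>\n", "\n");

print "Unchecked: $stale\n";
print "Checked: ", (defined $safe ? $safe : "(not set)"), "\n";
```

      The unchecked version reports 2003 as the volume, while the guarded version leaves the variable unset because no <volume> tag was seen.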
      hi muntfish,

      Thanks a lot, it works now.

Re: problem with variables
by zejames (Hermit) on Sep 15, 2004 at 10:03 UTC
    You seem to be using pseudo-XML for your config file, so why not use real XML and a module that will make your life easier:

    use XML::Simple;
    undef $/;
    $data = <DATA>;
    $ref = XMLin($data);
    print "Volume: " . $ref->{'volume'} . "\n";
    print "Issue: " . $ref->{'issue'} . "\n";
    print "Year: " . $ref->{'year'} . "\n";
    __DATA__
    <data>
    <volume>4</volume>
    <issue>12</issue>
    <year>2003</year>
    </data>

    Note that I added a <data> tag around your variables to make the file well-formed XML, which requires a single root element.

    HTH


    --
    zejames
      Or the XML::Twig approach:
      use XML::Twig;

      # Slurp the whole __DATA__ section rather than just its first line
      my $xml = do { local $/; <DATA> };

      # Each handler is called with the twig and the matched element
      my $twig = XML::Twig->new(
          TwigHandlers => {
              'volume' => sub { print "Volume: " . $_[1]->text . "\n" },
              'issue'  => sub { print "Issue: " . $_[1]->text . "\n" },
              'year'   => sub { print "Year: " . $_[1]->text . "\n" },
          }
      );
      $twig->parse($xml);
      __DATA__
      <data>
      <volume>4</volume>
      <issue>12</issue>
      <year>2003</year>
      </data>
      (Update: not tested, sorry)
Re: problem with variables
by gothic_mallard (Pilgrim) on Sep 15, 2004 at 13:17 UTC

    Two more ideas you could try...

    1) Using a hash to store the data:
    #!/usr/bin/perl
    use strict;

    my %data;    # tag name => value
    open(IN, "xxx.in") or die "Cannot open xxx.in: $!";
    while(<IN>){
        # Only store a value when the line actually matched
        $data{$1} = $2 if /<(\w+)>(.*)<\/\1>/;
    }
    close(IN);
    print "Volume: $data{volume}\n";
    print "Issue: $data{issue}\n";
    print "Year: $data{year}\n";
    2) Similar to the above but skipping the hash to make things a bit tighter:
    #!/usr/bin/perl
    use strict;

    open(IN, "xxx.in") or die "Cannot open xxx.in: $!";
    while(<IN>) {
        # Skip lines that do not match, so stale $1/$2 are never printed
        print ucfirst($1).": $2\n" if /<(\w+)>(.*)<\/\1>/;
    }
    close(IN);

    The \1 is a back reference to whatever got matched in the first set of <>'s. You don't strictly need it as /<(\w+)>(.*)</ works just as well but it gives it all a nice sense of symmetry :D
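
    To see where the two patterns actually differ, here is a small sketch. The helper names parse_strict and parse_loose and the mismatched sample line are made up for illustration. On well-formed lines both patterns capture the same thing; the back reference only matters when the closing tag does not match the opening one.

```perl
use strict;
use warnings;

# Hypothetical helper: the pattern with the \1 back reference.
# In list context a match returns its captures, or an empty list on failure.
sub parse_strict {
    my ($line) = @_;
    my @captures = $line =~ /<(\w+)>(.*)<\/\1>/;
    return @captures;
}

# Hypothetical helper: the shorter pattern without the back reference.
sub parse_loose {
    my ($line) = @_;
    my @captures = $line =~ /<(\w+)>(.*)</;
    return @captures;
}

for my $line ("<issue>12</issue>", "<issue>12</volume>") {
    my @strict = parse_strict($line);
    my @loose  = parse_loose($line);
    print "$line\n";
    print "  with \\1:    ", (@strict ? "@strict" : "no match"), "\n";
    print "  without \\1: ", (@loose  ? "@loose"  : "no match"), "\n";
}
```

    So the back reference buys a little validation as well as symmetry: a line whose closing tag is wrong is rejected instead of being silently accepted.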

    The ucfirst is just there to make the tags neat but naturally you could just write the tags like that in the first place if you desired.

    Both solutions work dynamically so you can change your tag names or add / remove tags without needing to modify your code. They should just as easily handle:

    <volume>4</volume>
    <issue>12</issue>
    <year>2003</year>
    <pages>200</pages>
    <author>A N Other</author>
    ... which will produce:
    Volume: 4
    Issue: 12
    Year: 2003
    Pages: 200
    Author: A N Other