Reading column data from a file

pattobw has asked for the wisdom of the Perl Monks concerning the following question:

Here is my perl file:

#!usr/bin/perl
#
#
$verno = 1.0;
$max_iter = 4;
$max_y_locations = $max_iter;
$y_iter_max = 0;
$x_iter_max = 0;  # Note:0 counts as one iteration
$z_iter_max = 0;
#
$no_energy_bins = 97;                                # 98 bins - [0:97]
@spec_cps =  [$no_energy_bins],[$max_y_locations];
@sigma_cps = [$no_energy_bins],[$max_y_locations];
$spec_cps [$no_energy_bins] [$max_y_locations] = 0.0;
$sigma_cps[$no_energy_bins] [$max_y_locations] = 0.0;
#
.
.
.
.
#
do {
    $y_iter_no = $y_iter_no + 1;
    $new_y = $y_old + $del_y * ($y_iter_no - 1);
    $x_iter_no = 0;
     open (OUT, "<$out_file") or die "Cannot open $out_file for reading: $!";
       while (<OUT>) {
         next unless $_;
         if ($_ && /1tally\s\s(\d+)\s+nps\s=/i) {
          for (my $ii = 0; $ii < 5; $ii++) {
           }
         }
         for (my $ij = 0; $ij < 98; $ij++) {
         if ($_ && /\s\s\s(\d+).(\d+)(\D+)(\d+)\s\s\s(\d+).(\d+)(\D+)(\d+)\s(\d+).(\d+)/i) {
          $energy_value = $1.$dot.$2.$3.$4;
          $detector_value = $5.$dot.$6.$7.$8;
          $std_deviation = $9.$dot.$10;
          $kebin=$ij;
          print "kebin=$kebin\n";
          print "energy=$energy_value\n";
          print "detvalue=$detector_value\n";
          print "sigma=$std_deviation\n";
          $energy_bin[$kebin] = $energy_value;
          $spec_cps[$kebin][$y_iter_no] = $detector_value;
          $sigma_cps[$kebin][$y_iter_no] = $std_deviation;
          }
         }
   }
#
     close (OUT);
#
     print "y=$new_y\n";
     print "z=$new_z\n";
     print "x=$new_x\n";
     print "kount=$kount\n";
     $kount = $kount + 1;
#
#    printf PLOUT ("%.3f  %.5e %.5f \n",$new_y,$detector_value,$std_deviation,);
#
} until $z_iter_no > $z_iter_max;
#
} until $x_iter_no > $x_iter_max;
#
} until $y_iter_no > $y_iter_max;
#
    for ( my $i = 0; $i < 97; $i++) {
          for ( my $j = 0; $j < $max_y_locations; $j++ ) {
               printf PLOUT ("%.3f %.5e %.5e \n",
#              printf PLOUT (" \n",
                      $energy_bin[$i],$spec_cps[$i],[$j],$sigma_cps[$i],[$j] ) ;
            }
    }
#
     close (PLOUT);

Here is the data file I am trying to read:

.
.
.
1tally  18        nps =     3785697
           tally type 8    pulse height distribution.                   units   number
           tally for  photons

 cell  3
      energy
    0.0000E+00   0.00000E+00 0.0000
    1.0000E-05   2.64152E-07 1.0000
    5.0000E-02   0.00000E+00 0.0000
    6.0000E-02   0.00000E+00 0.0000
    7.0000E-02   0.00000E+00 0.0000
    8.0000E-02   0.00000E+00 0.0000
    9.0000E-02   0.00000E+00 0.0000
    1.0000E-01   0.00000E+00 0.0000
.
.
.
    9.7000E-01   0.00000E+00 0.0000
    9.8000E-01   0.00000E+00 0.0000
    9.9000E-01   0.00000E+00 0.0000
    1.0000E+00   0.00000E+00 0.0000
      total      5.28304E-07 0.7071

So I am trying to read the three columns after energy and before total.
I did this by locating the 1tally line and skipping 5 lines, or so I thought.
What the %&&!^^@@* am I doing wrong?
I know I am making this a lot harder than necessary.

The print statements are just for debugging.
The portion of code of interest is between the open(OUT) and close(OUT) statements.
Thanks in advance for any tips,
Bruce

Edit: g0n - readmore tags


Re: Reading column data from a file by duff (Parson) on Feb 09, 2006 at 21:48 UTC
How about this (untested):

    my $energy;
    while (<FILE>) {
        last if /^\s+total/;
        unless ($energy) { $energy = /^\s+energy/; next }
        my @vals = split;
        # ... other stuff here
    }

duff
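A runnable version of duff's flag-based loop, as a minimal sketch under some assumptions: the input is a trimmed copy of the tally block from the question, read through an in-memory filehandle (`open` on a scalar reference) instead of a real file, and the lexical handle `$fh`, the flag `$seen_energy` (duff's `$energy`, renamed) and the array `@rows` are hypothetical names chosen for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Trimmed copy of the tally block shown in the question.
my $data = <<'END';
1tally  18        nps =     3785697
           tally type 8    pulse height distribution.                   units   number
           tally for  photons

 cell  3
      energy
    0.0000E+00   0.00000E+00 0.0000
    1.0000E-05   2.64152E-07 1.0000
    5.0000E-02   0.00000E+00 0.0000
      total      5.28304E-07 0.7071
END

# Read the string as if it were a file (in-memory filehandle).
open my $fh, '<', \$data or die "Cannot open in-memory data: $!";

my $seen_energy = 0;    # becomes true once the "energy" header has gone by
my @rows;               # each element: [energy, detector value, sigma]
while (<$fh>) {
    last if /^\s+total/;                # footer: stop reading
    unless ($seen_energy) {
        $seen_energy = /^\s+energy/;    # still in the header: skip this line
        next;
    }
    my @vals = split;                   # split $_ on whitespace
    push @rows, [@vals];
}
close $fh;

printf "%s %s %s\n", @$_ for @rows;
```

Because `split` with no arguments splits `$_` on runs of whitespace, the exact column spacing in the file does not matter here.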
Re: Reading column data from a file by Scott7477 (Chaplain) on Feb 10, 2006 at 00:27 UTC
Why not eliminate this line of code: `if ($_ && /1tally\s\s(\d+)\s+nps\s=/i) {` and the corresponding ending curly brace below and just have the for loop cycle through the first five lines doing nothing with those lines?
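A counting loop on its own never advances the filehandle, so "cycling through" the five header lines only works if each iteration actually reads a line. A minimal sketch of that, assuming the trimmed tally block from the question read through an in-memory filehandle; `$in` and `@data_lines` are hypothetical names.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Trimmed copy of the tally block shown in the question.
my $data = <<'END';
1tally  18        nps =     3785697
           tally type 8    pulse height distribution.                   units   number
           tally for  photons

 cell  3
      energy
    0.0000E+00   0.00000E+00 0.0000
    1.0000E-05   2.64152E-07 1.0000
      total      5.28304E-07 0.7071
END

# Read the string as if it were a file (in-memory filehandle).
open my $in, '<', \$data or die "Cannot open in-memory data: $!";

my @data_lines;
while (<$in>) {
    if (/1tally\s+\d+\s+nps\s=/) {
        # A counting loop by itself does not advance the filehandle,
        # so each of the five header lines is read here and discarded.
        for (1 .. 5) {
            defined(my $junk = <$in>) or last;    # stop early at end of file
        }
        next;
    }
    last if /^\s+total/;                           # footer reached
    push @data_lines, $_;
}
close $in;

print @data_lines;
```

The five discarded lines are the "tally type", "tally for", blank, "cell" and "energy" lines, so the approach depends on that header layout never changing.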
Re: Reading column data from a file by Util (Priest) on Feb 10, 2006 at 16:13 UTC
I think you intended to burn off the lines as Scott7477 said; you are just missing the code to read the lines. Try this:

    if ( /1tally\s\s(\d+)\s+nps\s=/i ) {
        foreach ( 1 .. 5 ) {
            my $junk = <OUT>;
        }
        next;
    }

Similarly, inside the `for (my $ij = 0; $ij < 98; $ij++) {...}` loop, you do not read any lines into `$_`, so you reprocess the same line 98 times. Add the line `$_ = <OUT>;` inside the loop, or use this next technique to keep from having to know the number of data lines ahead of time.

Another technique: When I want to skip the first part of a file, and the last line to be skipped can be matched by a regex, my preferred idiom is to use a skipping while-loop followed by a processing while-loop. You can also exit the processing while-loop if the file has a footer you need to skip. For example:

    while (<OUT>) {
        last if /^\s+energy\n$/;
    }
    while (<OUT>) {
        last if /^\s+total\s+\d/;
        /\s\s\s(\d+)\.(\d+)(\D+)(\d+)\s\s\s(\d+)\.(\d+)(\D+)(\d+)\s(\d+)\.(\d+)/
            or warn "This line did not match the pattern: '$_'" and next;
        $energy_value = $1.$dot.$2.$3.$4;
        #...
    }

Other problems in your code:

Your first line reads `#!usr/bin/perl`. There should be a slash between the exclamation mark and `usr`.

You don't have to use the filehandle `OUT` just because your filename is in a variable named `$out_file`. Change it to something like `IN`, or you will confuse your future maintenance programmers.

All your tests to see if `$_` is true, such as `next unless $_;`, will always return true due to the trailing newline. Try this instead: `next unless /\S/;`

In the pattern `/\s\s\s(\d+).(\d+)(\D+)(\d+)\s\s\s(\d+).(\d+)(\D+)(\d+)\s(\d+).(\d+)/i`, you do not need the "ignore-case" modifier, but you do need to back-whack (backslash-escape) your periods, and you probably should anchor the pattern at the beginning and the end.
Corrected:

    /^\s\s\s\s(\d+)\.(\d+)(\D+)(\d+)\s\s\s(\d+)\.(\d+)(\D+)(\d+)\s(\d+)\.(\d+)\s$/

That pattern, and the next three lines, could be simplified to this:

    my ($energy_value, $detector_value, $std_deviation) =
        /^\s\s\s\s(\d+\.\d+E[-+]\d+)\s\s\s(\d+\.\d+E[-+]\d+)\s(\d+\.\d+)\s$/;

You have no way to distinguish between a line you should skip and a line you should have processed but failed to because of some problem with your pattern; such lines will be silently omitted, and incorrect output will appear to be correct with no indication of the problem. See "Another technique" above for a method that offers tighter control.

You are not using `strict` or `warnings`. You will come to regret this!
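The skip-loop/process-loop idiom above, combined with the simplified list-assignment pattern, can be sketched as one `strict`/`warnings`-clean program. Assumptions: the input is the trimmed tally block from the question read through an in-memory filehandle; `$y_iter_no` is fixed at 1 purely for illustration (hypothetical); and the pattern is a relaxed variant that uses `\s+` instead of exact space counts, a swapped-in choice rather than Util's exact pattern. It also stores values with two-dimensional subscripts: an element is written `$spec_cps[$i][$j]`, whereas `$spec_cps[$i],[$j]` in the question's final printf passes `$spec_cps[$i]` and a new anonymous array as two separate arguments.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Trimmed copy of the tally block shown in the question.
my $data = <<'END';
1tally  18        nps =     3785697
           tally type 8    pulse height distribution.                   units   number
           tally for  photons

 cell  3
      energy
    0.0000E+00   0.00000E+00 0.0000
    1.0000E-05   2.64152E-07 1.0000
    5.0000E-02   0.00000E+00 0.0000
      total      5.28304E-07 0.7071
END

open my $in, '<', \$data or die "Cannot open in-memory data: $!";

# Loop 1: skip everything up to and including the "energy" header.
while (<$in>) {
    last if /^\s+energy\s*$/;
}

# Loop 2: process data rows until the "total" footer.
my (@energy_bin, @spec_cps, @sigma_cps);
my $y_iter_no = 1;    # hypothetical: a single y position for this demo
my $kebin     = 0;    # energy-bin index, one per data row
while (<$in>) {
    last if /^\s+total\s+\d/;
    next unless /\S/;    # a blank line has nothing to parse
    my ($energy, $detector, $sigma) =
        /^\s+(\d+\.\d+E[-+]\d+)\s+(\d+\.\d+E[-+]\d+)\s+(\d+\.\d+)\s*$/
        or do { warn "This line did not match the pattern: '$_'"; next };
    $energy_bin[$kebin]            = $energy;
    $spec_cps[$kebin][$y_iter_no]  = $detector;
    $sigma_cps[$kebin][$y_iter_no] = $sigma;
    $kebin++;
}
close $in;

for my $i (0 .. $#energy_bin) {
    printf "%.3f %.5e %.5e\n",
        $energy_bin[$i], $spec_cps[$i][$y_iter_no], $sigma_cps[$i][$y_iter_no];
}
```

The list assignment returns the number of captured values (three on a match, zero otherwise), so a row that fails the pattern produces a warning instead of being silently dropped.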