
UPDATED First, let me apologize: my computer is running slowly because of the program, I've been at it forever, and I'm a little bit frustrated. I got happier this morning when I thought I'd made an improvement in the speed, but now I'm not so sure.

Description of problem
The resulting file 'averages.txt' is incorrect: the averages are wrong (they are too low). I went back to the 'slow' code (which was tested and did work), but it seems it's not working anymore. I would be VERY grateful if anyone could help me with a solution that works and won't take five weeks. See below for a description of the code.

The above was the problem. HOWEVER, now that I have that under control (the array @avg_frmt is correct under debugging tests) I have an odd problem. Nothing is printing to the averages.txt file. It is being created, but it is blank.

Description of Code
The directories have 12 files each (that are of interest to us, i.e., .txt). These files contain 30-60 thousand lines of data, in the following format:
"(time)\t(hrt)\t(skt)\t(emg)\t(0 or 5)"
(They are measured values that I'm trying to analyze)

Here's an example of a chunk:

0    61.2245    83.129    0.000128174    0    
0.000333333    61.2245    83.1305    0.000128174    0    
0.000666667    61.2245    83.132    0.000109863    0    
0.001    61.2245    83.129    0.000115967    5    
0.00133333    61.2245    83.132    0.000115967    5    
0.00166667    61.2245    83.1305    0.00012207    5    
0.002    61.2245    83.132    0.000115967    5    
0.00233333    61.2245    83.132    0.00012207    5    
0.00266667    61.2245    83.132    0.000115967    5    
0.003    61.2245    83.132    0.00012207    5    
0.00333333    61.2245    83.132    0.00012207    5    
0.00366667    61.2245    83.1335    0.000134277    5    
0.004    61.2245    83.132    0.000140381    5    
0.00433333    61.2245    83.1305    0.00012207    5    
0.00466667    61.2245    83.132    0.000134277    5    
0.005    61.2245    83.132    0.000115967    5    
0.00533333    61.2245    83.1335    0.000128174    5    
0.00566667    61.2245    83.1335    0.00012207    5    
0.006    61.2245    83.132    0.000134277    5    
0.00633333    61.2245    83.1351    0.000134277    5

The last column records the push of a button: 0 for a push, 5 for no push. It separates the data into two conditions (the first average we want and the second). The zeros at the beginning mark the start, so we treat those as if they were 5s. However, there is a group of zeros in the middle that we are interested in. Take everything from the first line of data up to the first zero of that middle group and average the desired values. THEN, from the last zero of the middle group, average until the end.
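To make the rule above concrete, here is a minimal sketch of it as a single pass over a filehandle, so the file never has to be held in memory. The sub name split_averages and the $min_time parameter are made up for illustration; $min_time is an assumed cutoff for telling the starting zeros from the middle group (the script below uses a time of .5 for this).

```perl
use strict;
use warnings;

## Hypothetical helper, not part of the original script. Reads tab-delimited
## lines (time, hrt, skt, emg, flag) from $fh and returns two array refs of
## averages [hrt, skt, emg]: rows before the first mid-file zero, and rows
## after the last one. $min_time is an assumed cutoff separating the leading
## zeros from the middle group.
sub split_averages {
  my ($fh, $min_time) = @_;
  my @sum1 = (0, 0, 0);
  my @sum2 = (0, 0, 0);
  my ($n1, $n2, $seen_zero) = (0, 0, 0);

  while (my $line = <$fh>) {
    my ($time, $hrt, $skt, $emg, $flag) = split /\t/, $line;
    next unless defined $flag && $flag =~ /^([05])\s*$/;  ## skip malformed rows

    if ($1 == 0 && $time > $min_time) {
      ## A zero from the middle group: discard anything collected after an
      ## earlier zero so the second part starts after the last one.
      $seen_zero = 1;
      @sum2 = (0, 0, 0);
      $n2 = 0;
      next;
    }

    if ($seen_zero) {
      $sum2[0] += $hrt; $sum2[1] += $skt; $sum2[2] += $emg;
      $n2++;
    }
    else {
      ## Leading zeros fall through to here and count like a 5.
      $sum1[0] += $hrt; $sum1[1] += $skt; $sum1[2] += $emg;
      $n1++;
    }
  }

  ## Divide by the row count; an empty part yields an empty list.
  my @avg1 = $n1 ? (map { $_ / $n1 } @sum1) : ();
  my @avg2 = $n2 ? (map { $_ / $n2 } @sum2) : ();
  return (\@avg1, \@avg2);
}
```

It would be called with an already opened input file, e.g. my ($rest, $cloud) = split_averages($fh, 0.5); each returned reference holds the HRT, SKT and EMG averages in that order.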

Files with the name RAREEVENT in them we ignore.

I'd be glad to clarify anything.

#!/usr/bin/perl

use strict;
use warnings;

my $base_dir = 'G:\Test Data';
my @included_dirs = ('Ts1', 'Ts10', 'Ts12', 'Ts13', 'Ts14', 'Ts15', 'Ts16', 'Ts17', 'Ts18', 'Ts19', 'Ts2', 'Ts20', 'Ts21',
                     'Ts22', 'Ts23', 'Ts24', 'Ts25', 'Ts26', 'Ts27', 'Ts3', 'Ts4', 'Ts5', 'Ts6', 'Ts7', 'Ts8', 'Ts9');
my @files;

for my $dir(@included_dirs) {
  opendir(DIR, "$base_dir\\$dir") or die "$dir failed to open: $!";
  @files = grep { /\.txt$/ } readdir(DIR);
  closedir(DIR);
  print "$dir\n";
  for my $file(@files) {
    next if $file =~ /RAREEVENT/;
    print "\t$file\n";
    my $arg1 = $file;
    my $arg2 = "$base_dir\\$dir";
    process_file($arg1,$arg2);
  }
}


sub process_file {
  my $i_file;  ## Name of input file
  my $dir_path;  ## Directory path for $i_file
  my $full_name;  ## '$dir_path\$i_file'
  my $avg_file = ">>averages.txt";  ## Name of file where file averages are written to (append mode)
  my $ln_num = 0;  ## Line number in current file, used in @zeroat array to mark zeros
  my $i = 0;  ## Reference counter
  my $sum1 = 0;  ## HRT sum
  my $sum2 = 0;  ## SKT sum
  my $sum3 = 0;  ## EMG sum
  my $avg11 = 0;  ## HRT avg 1
  my $avg12 = 0;  ## HRT avg 2
  my $avg21 = 0;  ## SKT avg 1
  my $avg22 = 0;  ## SKT avg 2
  my $avg31 = 0;  ## EMG avg 1
  my $avg32 = 0;  ## EMG avg 2
  my @files;  ## Array to hold desired filenames for current folder
  my @avg_frmt;  ## Array to hold formatting for $avg_file document (i.e., the formatted output)
  my @lines;  ## Array to hold lines of current file
  my @lines1;  ## Holds first part to be averaged
  my @lines2;  ## Holds second part to be averaged
  my @zeroat;  ## Tells where zeros are at in array @lines (holds the line number of the zeros; an index to @lines)

  $i_file = shift;  ## Get file name from @_
  $dir_path = shift;  ## Get directory path from @_
  $full_name = "$dir_path\\$i_file";

  open(my $in, '<', $full_name) or die "$i_file failed to open: $!";
  @lines = <$in>;  ## Give file input to @lines
  close $in;

  ## Retrieve desired rows
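  for my $curline(@lines) {
    ## Only look at $1 when the match succeeded; after a failed match it
    ## still holds the value from an earlier line
    if ($curline =~ /.*?\t.*?\t.*?\t.*?\t([05])/) {  ## parse line
      $zeroat[$i++] = $ln_num if $1 == 0;
    }
    $ln_num++;
  }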

  ## Take Average
  ## Get all points between the starting and ending points, and separate into different arrays
  LOOP: for my $i(@zeroat) {  ## $i is an index in @lines to where a zero is at
    next LOOP unless $lines[$i] =~ /(.*?)\t.*?\t.*?\t.*?\t([05])/;  ## parse line
    if ($1 > .5 && $2 == 0) {
      @lines1 = @lines[0..$i-1];  ## @lines1 equals the first $i elements of @lines (indices 0..$i-1)
      @lines2 = @lines[$i+1..$#lines];  ## @lines2 equals everything past element $i of @lines
      last LOOP;
    } ## the zero is in the middle: split for averaging
  }

  ## Reset sums
  $sum1 = 0;
  $sum2 = 0;
  $sum3 = 0;

  for my $i(@lines1) {  ## go through first part and average
    $i =~ /.*?\t(.*?)\t(.*?)\t(.*?)\t[05]/;  ## parse line
    $sum1 += $1;
    $sum2 += $2;
    $sum3 += $3;
  }

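  ## Get first average
  ## ($#lines1 is the last index, i.e. count - 1; divide by the element count instead)
  $avg11 = @lines1 ? $sum1/scalar(@lines1) : 0;
  $avg21 = @lines1 ? $sum2/scalar(@lines1) : 0;
  $avg31 = @lines1 ? $sum3/scalar(@lines1) : 0;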

  ## Reset sums
  $sum1 = 0;
  $sum2 = 0;
  $sum3 = 0;

  for my $i(@lines2) {  ## go through second part and average
    $i =~ /.*?\t(.*?)\t(.*?)\t(.*?)\t[05]/;  ## parse line
    $sum1 += $1;
    $sum2 += $2;
    $sum3 += $3;
  }

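  ## Get second average
  ## ($#lines2 is the last index, i.e. count - 1; divide by the element count instead)
  $avg12 = @lines2 ? $sum1/scalar(@lines2) : 0;
  $avg22 = @lines2 ? $sum2/scalar(@lines2) : 0;
  $avg32 = @lines2 ? $sum3/scalar(@lines2) : 0;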

  ## Put averages into tab delimited columns with desired format: File name followed by tab followed
  ## by averages; first line is resting condition; second line is cloud condition.
  $avg_frmt[0] = "$i_file\t$avg11\t$avg21\t$avg31\n";  ## HRT, SKT, EMG is the
  $avg_frmt[1] = "$i_file\t$avg12\t$avg22\t$avg32\n";  ## order for the averages

  ## Open and print averages to $avg_file
  open(OUT,$avg_file) or die "$avg_file failed to be created: $!";
  print OUT @avg_frmt;
  close(OUT) or die "$avg_file failed to close: $!";  ## flush the buffered output to disk
}
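
On the blank averages.txt: one possible cause (a guess, not something I have confirmed) is that the output is still sitting in Perl's buffer when I look at the file, since the output handle is never explicitly closed. A minimal sketch of an append that closes the handle and checks the result is below; the sub name append_lines is made up for illustration.

```perl
use strict;
use warnings;

## Hypothetical helper (not part of the original script): append lines to a
## file using a lexical filehandle, a three-argument open, and a checked close.
sub append_lines {
  my ($path, @out_lines) = @_;
  open(my $out, '>>', $path) or die "$path failed to open: $!";
  print {$out} @out_lines;
  ## close flushes the buffer; a failure here means the data may not be on disk
  close($out) or die "$path failed to close: $!";
  return scalar @out_lines;  ## number of lines written
}
```

With this, the last step of process_file could become append_lines('averages.txt', @avg_frmt), with the mode kept out of the file name.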

Added closing code tag - dvergin 2002-06-24

In reply to Re: A very odd happening (at least. . . to me) by dimmesdale
in thread Processing large files many times over by dimmesdale
