in reply to A very odd happening (at least. . . to me)
in thread Processing large files many times over
Description of problem
The resulting file 'averages.txt' is incorrect: the averages are wrong (they are too low). I went back to the 'slow' code (which was tested, and did work), but it's not working anymore, it seems. I would be VERY grateful if anyone could help me with a solution that works and won't take five weeks. See below for a description of the code.
The above was the problem. HOWEVER, now that I have that under control (the array @avg_frmt is correct under debugging tests), I have an odd problem: nothing is printing to the averages.txt file. It is being created, but it is blank.
Description of Code
Each directory has 12 files that are of interest to us (i.e., the .txt files). These files contain 30-60 thousand lines of data, in the following format:
"(time)\t(hrt)\t(skt)\t(emg)\t(0 or 5)"
(They are measured values that I'm trying to analyze)
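For anyone skimming, one record of that format breaks apart like this. This is only an illustrative sketch (the variable names are mine, not from the script below), assuming the five fields really are separated by single tabs:

    # Illustrative only: pull one tab-separated record apart.
    my $line = "0.001\t61.2245\t83.129\t0.000115967\t5\n";   # hypothetical sample line
    chomp $line;
    my ($time, $hrt, $skt, $emg, $flag) = split /\t/, $line;
    # $flag is 0 when the button is pushed, 5 when it is not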
Here's an example of a chunk:

    0            61.2245  83.129   0.000128174  0
    0.000333333  61.2245  83.1305  0.000128174  0
    0.000666667  61.2245  83.132   0.000109863  0
    0.001        61.2245  83.129   0.000115967  5
    0.00133333   61.2245  83.132   0.000115967  5
    0.00166667   61.2245  83.1305  0.00012207   5
    0.002        61.2245  83.132   0.000115967  5
    0.00233333   61.2245  83.132   0.00012207   5
    0.00266667   61.2245  83.132   0.000115967  5
    0.003        61.2245  83.132   0.00012207   5
    0.00333333   61.2245  83.132   0.00012207   5
    0.00366667   61.2245  83.1335  0.000134277  5
    0.004        61.2245  83.132   0.000140381  5
    0.00433333   61.2245  83.1305  0.00012207   5
    0.00466667   61.2245  83.132   0.000134277  5
    0.005        61.2245  83.132   0.000115967  5
    0.00533333   61.2245  83.1335  0.000128174  5
    0.00566667   61.2245  83.1335  0.00012207   5
    0.006        61.2245  83.132   0.000134277  5
    0.00633333   61.2245  83.1351  0.000134277  5

The 0 at the end of a line represents the push of a button (a 5 means no push). It separates the data into two conditions (the first average we want and the second). The zeros at the beginning represent the start, so we treat those just as if they were 5s. However, there is a group of zeros in the middle of the file that we ARE interested in. We take the data from the first line up to the first 0 of that middle group and average the desired values; THEN, from the last zero of that middle group, we average until the end.
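To make that concrete, here is roughly how I picture the logic. This is only a sketch of the description above, NOT the script I am actually running (that is further down); @records, col_averages, and the 0.5-second cutoff for deciding which zeros are "in the middle" are my own illustrative choices:

    # Sketch of the splitting described above. Assumes the chomped lines of one
    # file are in @records, and that a "middle" zero is any 0-flagged line with
    # time > 0.5 (the same cutoff the full script below uses).
    my @records = <>;
    chomp @records;

    my @zero_idx = grep {
        my ($time, $flag) = (split /\t/, $records[$_])[0, 4];
        $time > 0.5 && $flag == 0;
    } 0 .. $#records;

    die "no middle zeros found\n" unless @zero_idx;

    my @first  = @records[0 .. $zero_idx[0] - 1];            # up to the first middle zero
    my @second = @records[$zero_idx[-1] + 1 .. $#records];   # after the last middle zero

    # Average columns 2-4 (hrt, skt, emg) over a group of records.
    sub col_averages {
        my @rows = @_;
        return (0, 0, 0) unless @rows;                       # avoid dividing by zero
        my @sum = (0, 0, 0);
        for my $row (@rows) {
            my (undef, $hrt, $skt, $emg) = split /\t/, $row;
            $sum[0] += $hrt;
            $sum[1] += $skt;
            $sum[2] += $emg;
        }
        return map { $_ / @rows } @sum;                      # divide by the row count
    }

    my ($hrt1, $skt1, $emg1) = col_averages(@first);
    my ($hrt2, $skt2, $emg2) = col_averages(@second);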
Files that have RAREEVENT in their name are ignored.
I'd be glad to clarify anything.
#!/usr/bin/perl
use strict;
use warnings;

my $base_dir = 'G:\Test Data';
my @included_dirs = ('Ts1', 'Ts10', 'Ts12', 'Ts13', 'Ts14', 'Ts15', 'Ts16',
                     'Ts17', 'Ts18', 'Ts19', 'Ts2', 'Ts20', 'Ts21', 'Ts22',
                     'Ts23', 'Ts24', 'Ts25', 'Ts26', 'Ts27', 'Ts3', 'Ts4',
                     'Ts5', 'Ts6', 'Ts7', 'Ts8', 'Ts9');
my @files;

for my $dir (@included_dirs) {
    opendir(DIR, "$base_dir\\$dir") or die "$dir failed to open: $!";
    @files = grep { /\.txt$/ } readdir(DIR);
    closedir(DIR);
    print "$dir\n";

    for my $file (@files) {
        next if $file =~ /RAREEVENT/;
        print "\t$file\n";
        my $arg1 = $file;
        my $arg2 = "$base_dir\\$dir";
        process_file($arg1, $arg2);
    }
}

sub process_file {
    my $i_file;                       ## Name of input file
    my $dir_path;                     ## Directory path for $i_file
    my $full_name;                    ## '$dir_path\$i_file'
    my $avg_file = ">>averages.txt";  ## Name of file where file averages are written to (append mode)
    my $ln_num = 0;                   ## Line number in current file, used in @zeroat array to mark zeros
    my $i = 0;                        ## Reference counter
    my $sum1 = 0;                     ## HRT sum
    my $sum2 = 0;                     ## SKT sum
    my $sum3 = 0;                     ## EMG sum
    my $avg11 = 0;                    ## HRT avg 1
    my $avg12 = 0;                    ## HRT avg 2
    my $avg21 = 0;                    ## SKT avg 1
    my $avg22 = 0;                    ## SKT avg 2
    my $avg31 = 0;                    ## EMG avg 1
    my $avg32 = 0;                    ## EMG avg 2
    my @files;                        ## Array to hold desired filenames for current folder
    my @avg_frmt;                     ## Array to hold formatting for $avg_file document (i.e., the formatted output)
    my @lines;                        ## Array to hold lines of current file
    my @lines1;                       ## Holds first part to be averaged
    my @lines2;                       ## Holds second part to be averaged
    my @zeroat;                       ## Tells where zeros are at in array @lines (holds the line number of the zeros; an index to @lines)

    $i_file    = shift;               ## Get file name from @_
    $dir_path  = shift;               ## Get directory path from @_
    $full_name = "$dir_path\\$i_file";

    open(IN, $full_name) or die "$i_file failed to open: $!";
    @lines = <IN>;                    ## Give file input to @lines
    close IN;

    ## Retrieve desired rows
    for my $curline (@lines) {
        $curline =~ /.*?\t.*?\t.*?\t.*?\t([05])/;   ## parse line
        $zeroat[$i++] = $ln_num if $1 == 0;
        $ln_num++;
    }

    ## Take Average
    ## Get all points between the starting and ending points, and separate into different arrays
    LOOP: for my $i (@zeroat) {       ## $i is an index in @lines to where a zero is at
        $lines[$i] =~ /(.*?)\t.*?\t.*?\t.*?\t([05])/;   ## parse line
        if ($1 > .5 && $2 == 0) {
            @lines1 = @lines[0 .. $i-1];        ## @lines1 equals the first $i-1 elements of @lines
            @lines2 = @lines[$i+1 .. $#lines];  ## @lines2 equals everything past the $i+1 element of @lines
            last LOOP;
        }                             ## the zero is in the middle: split for averaging
    }

    ## Reset sums
    $sum1 = 0;
    $sum2 = 0;
    $sum3 = 0;
    for my $i (@lines1) {             ## go through first part and average
        $i =~ /.*?\t(.*?)\t(.*?)\t(.*?)\t[05]/;   ## parse line
        $sum1 += $1;
        $sum2 += $2;
        $sum3 += $3;
    }
    ## Get first average
    $avg11 = $sum1/$#lines1;
    $avg21 = $sum2/$#lines1;
    $avg31 = $sum3/$#lines1;

    ## Reset sums
    $sum1 = 0;
    $sum2 = 0;
    $sum3 = 0;
    for my $i (@lines2) {             ## go through second part and average
        $i =~ /.*?\t(.*?)\t(.*?)\t(.*?)\t[05]/;   ## parse line
        $sum1 += $1;
        $sum2 += $2;
        $sum3 += $3;
    }
    ## Get second average
    $avg12 = $sum1/$#lines2;
    $avg22 = $sum2/$#lines2;
    $avg32 = $sum3/$#lines2;

    ## Put averages into tab delimited columns with desired format: File name followed by tab followed
    ## by averages; first line is resting condition; second line is cloud condition.
    $avg_frmt[0] = "$i_file\t$avg11\t$avg21\t$avg31\n";   ## HRT, SKT, EMG is the
    $avg_frmt[1] = "$i_file\t$avg12\t$avg22\t$avg32\n";   ## order for the averages

    ## Open and print averages to $avg_file
    open(OUT, $avg_file) or die "$avg_file failed to be created: $!";
    print OUT @avg_frmt;
}
Added closing code tag - dvergin 2002-06-24
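For completeness, here is the append-and-close pattern I understand to be the usual way to write the output file: a three-argument open, a lexical filehandle, and an explicit close so the buffer is flushed. Whether the missing close is really why my averages.txt comes out blank is only a guess on my part, and the sub and variable names here are purely illustrative:

    # Illustrative append helper -- not the code above, just the pattern I
    # understand to be conventional. Names are hypothetical.
    sub append_averages {
        my ($avg_path, @rows) = @_;
        open my $out, '>>', $avg_path
            or die "Can't append to $avg_path: $!";
        print {$out} @rows
            or die "print to $avg_path failed: $!";
        close $out
            or die "close of $avg_path failed: $!";
    }

    # Example use (hypothetical values):
    # append_averages('averages.txt', "Ts1_file.txt\t61.2\t83.1\t0.00012\n");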
Re: Re: A very odd happening (at least. . . to me)
by educated_foo (Vicar) on Jun 24, 2002 at 17:17 UTC