UPDATED: First, let me apologize. My computer is running slowly because of this program, I've been at it forever, and I'm a little frustrated. I was happier this morning when I thought I had improved the speed, but now I'm not so sure.

Description of problem
The resulting file 'averages.txt' is incorrect: the averages are wrong (they are too low). I went back to the 'slow' code (which was tested and did work), but it doesn't seem to work anymore either. I would be VERY grateful if anyone could help me with a solution that works and won't take five weeks. See below for a description of the code.

That was the original problem. HOWEVER, now that I have it under control (the array @avg_frmt is correct in debugging tests), I have an odd new problem: nothing is printed to the averages.txt file. The file is created, but it is blank.
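On the blank file, one possible cause (not confirmed): the script opens averages.txt with a bareword handle (OUT) and never closes it, so the printed rows can sit in Perl's output buffer until the handle is closed or the program exits normally. If the run is interrupted before it finishes, the file may stay empty. Also, a relative name like averages.txt is resolved against the current working directory, which may not be the folder you are checking. Below is a minimal sketch of an append helper that uses a lexical handle and closes it explicitly after every write; the name `append_averages` is hypothetical.

```perl
use strict;
use warnings;

# Append result rows to an output file, then close it so the rows are
# flushed to disk right away. Hypothetical helper, not from the original post.
sub append_averages {
    my ($out_path, @rows) = @_;
    # Three-argument open with a lexical handle; '>>' creates the file if needed.
    open(my $out, '>>', $out_path) or die "Cannot open $out_path for append: $!";
    print {$out} @rows or die "Cannot write to $out_path: $!";
    # close() flushes Perl's buffer; checking it also catches late write errors.
    close($out) or die "Cannot close $out_path: $!";
    return;
}
```

Inside process_file, the last two lines would become a single call such as `append_averages('averages.txt', @avg_frmt);`.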

Description of Code
Each directory has 12 files that are of interest to us (the .txt files). Each file contains 30,000 to 60,000 lines of data in the following format:
"(time)\t(hrt)\t(skt)\t(emg)\t(0 or 5)"
(These are measured values that I'm trying to analyze.)
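Because each record is exactly five tab-separated fields, `split /\t/` can replace the chain of `.*?\t` regexes. It also avoids a Perl pitfall: `$1`, `$2`, ... are only updated by a successful match, so if a line does not match (a blank or malformed line), `$1` still holds the value from the previous line. A minimal sketch, assuming every data line has five fields; the name `parse_record` is hypothetical.

```perl
use strict;
use warnings;

# Split one tab-delimited record into time, hrt, skt, emg and the button flag.
# Returns nothing (undef in scalar context) if the line lacks five fields.
sub parse_record {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;          # strip LF or CRLF, whichever the file uses
    my @f = split /\t/, $line;
    return unless @f == 5;
    my ($time, $hrt, $skt, $emg, $flag) = @f;
    return { time => $time, hrt => $hrt, skt => $skt, emg => $emg, flag => $flag };
}

my $rec = parse_record("0.001\t61.2245\t83.129\t0.000115967\t5\n");
print "hrt=$rec->{hrt} flag=$rec->{flag}\n";   # hrt=61.2245 flag=5
```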

Here's an example of a chunk:

```
0 61.2245 83.129 0.000128174 0
0.000333333 61.2245 83.1305 0.000128174 0
0.000666667 61.2245 83.132 0.000109863 0
0.001 61.2245 83.129 0.000115967 5
0.00133333 61.2245 83.132 0.000115967 5
0.00166667 61.2245 83.1305 0.00012207 5
0.002 61.2245 83.132 0.000115967 5
0.00233333 61.2245 83.132 0.00012207 5
0.00266667 61.2245 83.132 0.000115967 5
0.003 61.2245 83.132 0.00012207 5
0.00333333 61.2245 83.132 0.00012207 5
0.00366667 61.2245 83.1335 0.000134277 5
0.004 61.2245 83.132 0.000140381 5
0.00433333 61.2245 83.1305 0.00012207 5
0.00466667 61.2245 83.132 0.000134277 5
0.005 61.2245 83.132 0.000115967 5
0.00533333 61.2245 83.1335 0.000128174 5
0.00566667 61.2245 83.1335 0.00012207 5
0.006 61.2245 83.132 0.000134277 5
0.00633333 61.2245 83.1351 0.000134277 5
```
The last field records a button push: 0 means the button was pushed, 5 means it was not. The push separates the data into two conditions (the first average we want and the second). The zeros at the very beginning only mark the start, so we treat them as if they were 5s. However, there is a group of zeros in the middle that we are interested in. Starting from the first line of data, we average the desired values up to the first 0 of that middle group. THEN, starting after the last 0 of the middle group, we average until the end of the file.
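The rule above can be applied in a single pass over each file, without reading all 30,000 to 60,000 lines into an array and scanning them several times. Below is a minimal sketch using a small state machine. It rests on assumptions not taken from the original code: the leading zeros are the run before the first 5, there is exactly one middle run of zeros, and zeros after that run are counted in the second segment. It also swaps the original `$1 > .5` time test for this "first 5 seen" rule. The name `segment_averages` is hypothetical.

```perl
use strict;
use warnings;

# One pass over a data file handle. Returns two array refs [hrt, skt, emg]:
#   segment 1: first line up to (not including) the first zero of the middle run
#   segment 2: the line after the last zero of the middle run, to end of file
# An empty segment yields undef. Assumes five tab-separated fields per line.
sub segment_averages {
    my ($fh) = @_;
    my $state = 'leading';              # leading -> first -> middle -> second
    my @sum = ([0, 0, 0], [0, 0, 0]);   # [hrt, skt, emg] sums per segment
    my @n   = (0, 0);                   # line counts per segment

    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;
        my (undef, $hrt, $skt, $emg, $flag) = split /\t/, $line;
        next unless defined $flag;      # skip blank or short lines

        if    ($state eq 'leading' && $flag == 5) { $state = 'first'  }
        elsif ($state eq 'first'   && $flag == 0) { $state = 'middle' }
        elsif ($state eq 'middle'  && $flag != 0) { $state = 'second' }

        next if $state eq 'middle';     # the middle zeros are not averaged
        my $seg = $state eq 'second' ? 1 : 0;   # leading zeros count as segment 1
        $sum[$seg][0] += $hrt;
        $sum[$seg][1] += $skt;
        $sum[$seg][2] += $emg;
        $n[$seg]++;
    }

    # Divide each sum by the number of lines in its segment.
    my @avg;
    for my $s (0, 1) {
        $avg[$s] = $n[$s] ? [ map { $_ / $n[$s] } @{ $sum[$s] } ] : undef;
    }
    return @avg;
}
```

Inside process_file this could replace the slurp, the @zeroat scan and both averaging loops: open the file with `open(my $in, '<', $full_name)`, call `my ($first, $second) = segment_averages($in);`, then format `@$first` and `@$second` into the two output rows.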

We ignore files with RAREEVENT in their names.
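Selecting the files can be done once per directory: keep names ending in .txt, drop any containing RAREEVENT, and build full paths with the core File::Spec module instead of joining with `"\\"` by hand. A minimal sketch; the name `data_files_in` is hypothetical, and the lowercase `.txt` match mirrors the original `grep`.

```perl
use strict;
use warnings;
use File::Spec;

# Full paths of the .txt files in one directory, skipping RAREEVENT files.
# Sorted so the order of the output rows is predictable from run to run.
sub data_files_in {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "$dir failed to open: $!";
    my @names = grep { /\.txt\z/ && !/RAREEVENT/ } readdir($dh);
    closedir($dh);
    return map { File::Spec->catfile($dir, $_) } sort @names;
}
```

For example, `data_files_in(File::Spec->catfile('G:\Test Data', 'Ts1'))` would return the paths to process for the Ts1 folder.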

I'd be glad to clarify anything.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $base_dir = 'G:\Test Data';
my @included_dirs = ('Ts1', 'Ts10', 'Ts12', 'Ts13', 'Ts14', 'Ts15', 'Ts16', 'Ts17',
                     'Ts18', 'Ts19', 'Ts2', 'Ts20', 'Ts21', 'Ts22', 'Ts23', 'Ts24',
                     'Ts25', 'Ts26', 'Ts27', 'Ts3', 'Ts4', 'Ts5', 'Ts6', 'Ts7',
                     'Ts8', 'Ts9');
my @files;

for my $dir (@included_dirs) {
    opendir(DIR, "$base_dir\\$dir") or die "$dir failed to open: $!";
    @files = grep { /\.txt$/ } readdir(DIR);
    closedir(DIR);
    print "$dir\n";
    for my $file (@files) {
        next if $file =~ /RAREEVENT/;
        print "\t$file\n";
        my $arg1 = $file;
        my $arg2 = "$base_dir\\$dir";
        process_file($arg1, $arg2);
    }
}

sub process_file {
    my $i_file;                        ## Name of input file
    my $dir_path;                      ## Directory path for $i_file
    my $full_name;                     ## '$dir_path\$i_file'
    my $avg_file = ">>averages.txt";   ## Name of file where file averages are written to (append mode)
    my $ln_num = 0;                    ## Line number in current file, used in @zeroat array to mark zeros
    my $i = 0;                         ## Reference counter
    my $sum1 = 0;                      ## HRT sum
    my $sum2 = 0;                      ## SKT sum
    my $sum3 = 0;                      ## EMG sum
    my $avg11 = 0;                     ## HRT avg 1
    my $avg12 = 0;                     ## HRT avg 2
    my $avg21 = 0;                     ## SKT avg 1
    my $avg22 = 0;                     ## SKT avg 2
    my $avg31 = 0;                     ## EMG avg 1
    my $avg32 = 0;                     ## EMG avg 2
    my @files;                         ## Array to hold desired filenames for current folder
    my @avg_frmt;                      ## Array to hold formatting for $avg_file document (i.e., the formatted output)
    my @lines;                         ## Array to hold lines of current file
    my @lines1;                        ## Holds first part to be averaged
    my @lines2;                        ## Holds second part to be averaged
    my @zeroat;                        ## Tells where zeros are at in array @lines (holds the line number of the zeros; an index to @lines)

    $i_file = shift;                   ## Get file name from @_
    $dir_path = shift;                 ## Get directory path from @_
    $full_name = "$dir_path\\$i_file";

    open(IN, $full_name) or die "$i_file failed to open: $!";
    @lines = <IN>;                     ## Give file input to @lines
    close IN;

    ## Retrieve desired rows
    for my $curline (@lines) {
        $curline =~ /.*?\t.*?\t.*?\t.*?\t([05])/;   ## parse line
        $zeroat[$i++] = $ln_num if $1 == 0;
        $ln_num++;
    }

    ## Take Average
    ## Get all points between the starting and ending points, and separate into different arrays
    LOOP: for my $i (@zeroat) {        ## $i is an index in @lines to where a zero is at
        $lines[$i] =~ /(.*?)\t.*?\t.*?\t.*?\t([05])/;   ## parse line
        if ($1 > .5 && $2 == 0) {
            @lines1 = @lines[0..$i-1];          ## @lines1 equals the first $i-1 elements of @lines
            @lines2 = @lines[$i+1..$#lines];    ## @lines2 equals everything past the $i+1 element of @lines
            last LOOP;
        }                              ## the zero is in the middle: split for averaging
    }

    ## Reset sums
    $sum1 = 0; $sum2 = 0; $sum3 = 0;
    for my $i (@lines1) {              ## go through first part and average
        $i =~ /.*?\t(.*?)\t(.*?)\t(.*?)\t[05]/;   ## parse line
        $sum1 += $1; $sum2 += $2; $sum3 += $3;
    }
    ## Get first average
    $avg11 = $sum1/$#lines1;
    $avg21 = $sum2/$#lines1;
    $avg31 = $sum3/$#lines1;

    ## Reset sums
    $sum1 = 0; $sum2 = 0; $sum3 = 0;
    for my $i (@lines2) {              ## go through second part and average
        $i =~ /.*?\t(.*?)\t(.*?)\t(.*?)\t[05]/;   ## parse line
        $sum1 += $1; $sum2 += $2; $sum3 += $3;
    }
    ## Get second average
    $avg12 = $sum1/$#lines2;
    $avg22 = $sum2/$#lines2;
    $avg32 = $sum3/$#lines2;

    ## Put averages into tab delimited columns with desired format: File name followed by tab followed
    ## by averages; first line is resting condition; second line is cloud condition.
    $avg_frmt[0] = "$i_file\t$avg11\t$avg21\t$avg31\n";   ## HRT, SKT, EMG is the
    $avg_frmt[1] = "$i_file\t$avg12\t$avg22\t$avg32\n";   ## order for the averages

    ## Open and print averages to $avg_file
    open(OUT, $avg_file) or die "$avg_file failed to be created: $!";
    print OUT @avg_frmt;
}
```
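One concrete problem in the averaging step: `$sum1/$#lines1` divides by `$#lines1`, which is the index of the last element (one less than the element count). That makes each average very slightly too high rather than too low, so it is probably not the whole story, but it also breaks edge cases: a one-line segment gives `$#lines1 == 0` and Perl dies with "Illegal division by zero", and an empty segment gives -1. Dividing by the count fixes it. A minimal sketch; the helper name `mean` is hypothetical.

```perl
use strict;
use warnings;

# $#array is the index of the LAST element (count - 1);
# an array used in numeric context gives the number of elements.
sub mean {
    my @values = @_;
    return undef unless @values;   # empty list: no mean, avoid dividing by zero
    my $sum = 0;
    $sum += $_ for @values;
    return $sum / @values;         # divide by the element count
}

my @hrt = (60, 62, 64);
my $sum = 0;
$sum += $_ for @hrt;
printf "count=%d, last index=%d\n", scalar(@hrt), $#hrt;                # count=3, last index=2
printf "sum/count=%g, sum/last index=%g\n", mean(@hrt), $sum / $#hrt;  # 62 vs 93
```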

Added closing code tag - dvergin 2002-06-24


In reply to Re: A very odd happening (at least. . . to me) by dimmesdale
in thread Processing large files many times over by dimmesdale
