in reply to Re^4: Sliding window perl program
in thread Sliding window perl program
Even though it is now a bit more complex, I would stick to the idea of reading each of the two files only once, because that is usually much more efficient. So, after reading file 1 once and closing it, we read file 2 line by line and store the information collected in a nested data structure (probably a hash of arrays or a hash of hashes). Once file 2 has been read completely, we output the content of the data structure.
I am only showing the second while loop of my previous code below, as there is no need to change the first loop on file 1.
I *think* this should work more or less the way you want, but I cannot test the code on my tablet at the moment, so there may be a typo, an error, or an off-by-one mistake somewhere. The basic idea should be there, and a bit of testing should be enough to get it straight.

```perl
my $margin = 500;
my $step   = 100;
my %result;

open my $SC, "<", $file2 or die "Error: could not open $file2 $!";
while (my $line2 = <$SC>) {
    chomp $line2;                      # drop the trailing newline from $val
    my ($id, $val) = split /\t/, $line2;
    my $val_file1 = $hash{$id};
    next unless defined $val_file1;    # ID not seen in file 1, skip it
    my ($low, $high) = ($val_file1 - $margin, $val_file1 + $margin);
    next unless $val > $low and $val < $high;    # value not within range, just discard it
    my $delta = int(($val - $low) / $step);      # delta: slot number
    $result{$id}{$delta}++;
}
close $SC;

# now %result has, for each $id, a frequency distribution by steps of 100
# (slots 0 to 9); we just need to extract the data from it.
for my $id (keys %result) {
    my $low = $hash{$id} - $margin;
    for my $slot (0..9) {
        my $range     = sprintf "%d-%d", $low + $slot * $step, $low + ($slot + 1) * $step;
        my $frequency = $result{$id}{$slot} // 0;
        print "ID $id: $range : $frequency \n";
    }
}
```
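To make the slot arithmetic concrete, here is a small self-contained sketch. The helper `slot_for` and the sample numbers are made up for illustration; it only restates the range test and the `int(($val - $low)/$step)` computation from the loop above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the posted code): returns the slot number
# of $val around a reference value $center, or undef when $val falls outside
# the open interval ($center - $margin, $center + $margin).
sub slot_for {
    my ($center, $val, $margin, $step) = @_;
    my ($low, $high) = ($center - $margin, $center + $margin);
    return unless $val > $low and $val < $high;
    return int(($val - $low) / $step);
}

# Reference value 1000, margin 500, step 100: the window runs from 500 to 1500.
my $slot = slot_for(1000, 1234, 500, 100);
printf "slot %d covers %d-%d\n", $slot, 500 + $slot * 100, 500 + ($slot + 1) * 100;
# prints "slot 7 covers 1200-1300"
```

Note that a value exactly equal to the lower bound (500 here) is discarded by the strict `>` test, which is one place where the off-by-one question may need a decision.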
If your "sliding window" is different from what I have done, it should only take minor changes to the parameters ($margin, $step) and perhaps a bit more work in the final printing of the results, provided the %result hash holds sufficiently detailed information.
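One thing to watch when changing the parameters is that the output loop hard-codes slots `0..9`, which only matches a margin of 500 with a step of 100. A minimal sketch of deriving the slots from the parameters instead, assuming the window stays symmetric and `2 * $margin` is an exact multiple of `$step`; `slot_ranges` is a hypothetical helper name, not something from the earlier code.

```perl
use strict;
use warnings;

# Hypothetical helper: list the [start, end] bounds of every slot in the
# window around $center. Assumes 2 * $margin is an exact multiple of $step.
sub slot_ranges {
    my ($center, $margin, $step) = @_;
    my $low   = $center - $margin;
    my $slots = int(2 * $margin / $step);    # 10 slots for margin 500, step 100
    return map { [ $low + $_ * $step, $low + ($_ + 1) * $step ] } 0 .. $slots - 1;
}

# Example with a narrower window and a finer step than in the post:
# 8 slots, from 800-850 up to 1150-1200.
for my $range (slot_ranges(1000, 200, 50)) {
    printf "%d-%d\n", @$range;
}
```

The index of each pair in the returned list is the same slot number that is used as the inner key of %result, so the printing loop could iterate over this list instead of `0..9`.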