Alright, now I understand what you want. I think I can come up with a simple solution a bit later (no time right now), but there still seems to be an inconsistency in your description of the requirement.
In paragraph 2, your ranges are 150-250, 200-300, 250-350..., i.e. with an increment of 50 and an overlap between successive ranges.
In the expected outcome you show, your ranges are 150-250, 250-350, 350-450..., i.e. an increment of 100 and no overlap.
Can you please clarify?
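To make the difference concrete, here is a small sketch that lists the ranges under each reading. The sub name `ranges` and its parameters are made up for illustration; the start (150), the width (100) and the two increments (50 and 100) are taken from your description.

```perl
use strict;
use warnings;

# Hypothetical helper: list $count ranges of width $width,
# starting at $start and advancing by $increment each time.
sub ranges {
    my ($start, $width, $increment, $count) = @_;
    return map {
        my $low = $start + $_ * $increment;
        sprintf "%d-%d", $low, $low + $width;
    } 0 .. $count - 1;
}

# Paragraph 2 reading: increment 50, successive ranges overlap.
print "increment  50: ", join(", ", ranges(150, 100,  50, 3)), "\n";
# Expected-outcome reading: increment 100, no overlap.
print "increment 100: ", join(", ", ranges(150, 100, 100, 3)), "\n";
```

The first line lists 150-250, 200-300, 250-350 and the second 150-250, 250-350, 350-450, which are the two series I am asking you to choose between.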
Consider this pseudo-solution, to be adapted to your exact requirement, which is not fully clear as of this writing.
Even though it is now a bit more complex, I stick to the idea of reading each of the two files only once, because it is usually much more efficient. So, after having read file 1 once and closed it, we need to read file 2 line by line and store the collected information in a nested data structure (probably a hash of arrays or a hash of hashes). Once file 2 has been read completely, we output the content of the data structure.
I am only showing below the second while loop of my previous code, as there is no need to change the first loop on file 1 (the one that fills %hash).
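    my $margin = 500;   # half-width of the window around each file 1 value
    my $step   = 100;   # width of one slot
    my %result;
    open my $SC, "<", $file2 or die "Error: could not open $file2: $!";
    while (my $line2 = <$SC>)
    {
        chomp $line2;
        my ($id, $val) = split /\t/, $line2;
        next unless defined $val and exists $hash{$id};  # malformed line, or ID not seen in file 1
        my $val_file1 = $hash{$id};
        my ($low, $high) = ($val_file1 - $margin, $val_file1 + $margin);
        # keep $low itself so that every slot is a half-open interval [start, end)
        next unless $val >= $low and $val < $high;  # value not within range, just discard it
        my $delta = int(($val - $low) / $step);     # delta: slot number (0 to 9)
        $result{$id}{$delta}++;
    }
    close $SC;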
    # Now %result has, for each ID, a frequency distribution by steps of 100
    # (slots 0 to 9); we just need to extract the data from it.
    for my $id (sort keys %result) {
        my $low = $hash{$id} - $margin;
        for my $slot (0..9) {    # 2 * $margin / $step = 10 slots
            my $range = sprintf "%d-%d", $low + $slot * $step, $low + ($slot + 1) * $step;
            my $frequency = $result{$id}{$slot} // 0;   # slots never seen count as 0
            print "ID $id: $range : $frequency\n";
        }
    }
I *think* it should work more or less the way you want, but I cannot test this code on my tablet right now, so there may be a typo here or there, or possibly an off-by-one mistake somewhere. The basic idea should be there, though, and it should be easy to get it straight with a bit of testing.
If your "sliding windows" are different from what I have done, it should only take minor changes to the values of the parameters ($margin, $step) and perhaps a bit more work in the final printing of the results, provided the %result hash holds sufficiently detailed information.
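For instance, if the windows turn out to overlap (width 100, increment 50, as in your paragraph 2), one value can fall into two windows, so a single `int(...)` slot computation is no longer enough. Here is a rough sketch of one way to handle that; the sub name `window_slots` and the `$width` and `$n_windows` parameters are my own additions for illustration, not part of the code above, and I am assuming half-open windows.

```perl
use strict;
use warnings;

# Hypothetical helper: return the index of every window that contains $val.
# Window k is assumed to cover [$low + k*$step, $low + k*$step + $width).
sub window_slots {
    my ($val, $low, $step, $width, $n_windows) = @_;
    return grep {
        my $start = $low + $_ * $step;
        $val >= $start and $val < $start + $width;
    } 0 .. $n_windows - 1;
}

# Windows 150-250, 200-300, 250-350: the value 260 is in the last two.
my @slots = window_slots(260, 150, 50, 100, 3);
print "260 falls into windows: @slots\n";   # prints "260 falls into windows: 1 2"
```

In the while loop, the single increment would then presumably become something like `$result{$id}{$_}++ for window_slots($val, $low, $step, $width, $n_windows);`, and the printing loop would use `$low + $slot * $step + $width` as the upper bound of each range.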