
Consider this a pseudo-solution, to be adapted to your exact requirements, which I do not fully know as of this writing.

Even though the task is now a bit more complex, I am sticking to the idea of reading each of the two files only once, because that is usually much more efficient. So, after reading file 1 once and closing it, we read file 2 line by line and store the collected information in a nested data structure (probably a hash of arrays or a hash of hashes). Once file 2 has been read completely, we output the content of the data structure.

I am only displaying below the second while loop of my previous code, as there is no need to change the first loop on file 1.


my $margin = 500;   # half-width of the range around the file 1 value
my $step   = 100;   # width of one slot
open my $SC, "<", $file2 or die "Error: could not open $file2 $!";
my %result;
while (my $line2 = <$SC>)
{
    chomp $line2;   # drop the newline so $val is a clean number
    my ($id, $val) = split /\t/, $line2;
    next unless defined $val and exists $hash{$id}; # malformed line or ID not in file 1
    my $val_file1 = $hash{$id};
    my ($low, $high) = ($val_file1 - $margin, $val_file1 + $margin);
    next unless $val >= $low and $val < $high; # value not within range, just discard it
    my $delta = int(($val - $low) / $step); # delta: slot number
    $result{$id}{$delta}++;
}
close $SC;

# now %result has, for each $id, a frequency distribution by steps of
# 100 (slots 0 to 9), we just need to extract the data from it.
for my $id (sort keys %result) {
    my $low = $hash{$id} - $margin;
    for my $slot (0..9) {
        my $range = sprintf "%d-%d", $low + $slot * $step, $low + ($slot + 1) * $step;
        my $frequency = $result{$id}{$slot} // 0;
        print "ID $id: $range : $frequency\n";
    }
}

I *think* it should work more or less the way you want, but I cannot test the code on my tablet at the moment. There may be a typo or an error here or there, or possibly an off-by-one mistake somewhere, but the basic idea should be there, and it should be easy to get it right with a bit of testing.
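One way to do that testing without the real files is to run the slot arithmetic on a handful of made-up values, paying attention to the edges of the range. This is only a minimal sketch: the helper name slot_for, the ID gene1 and the sample values are assumptions for illustration, not taken from your data, and the lower bound is treated as inclusive and the upper bound as exclusive.

```perl
use strict;
use warnings;

my ($margin, $step) = (500, 100);

# Hypothetical helper: returns the slot number (0..9) of $val relative to
# the reference value $ref, or undef when $val is outside [low, high).
sub slot_for {
    my ($ref, $val) = @_;
    my $low  = $ref - $margin;
    my $high = $ref + $margin;
    return unless $val >= $low and $val < $high;
    return int( ($val - $low) / $step );
}

my %hash  = (gene1 => 1000);    # made-up stand-in for the file 1 hash
my @file2 = (                   # made-up stand-in for the file 2 lines
    [ gene1 => 499 ],  [ gene1 => 500 ],  [ gene1 => 599 ],
    [ gene1 => 600 ],  [ gene1 => 1499 ], [ gene1 => 1500 ],
);

my %result;
for my $rec (@file2) {
    my ($id, $val) = @$rec;
    my $slot = slot_for($hash{$id}, $val);
    next unless defined $slot;  # outside the range, discard
    $result{$id}{$slot}++;
}

for my $slot (sort { $a <=> $b } keys %{ $result{gene1} }) {
    print "slot $slot: $result{gene1}{$slot}\n";
}
```

With these values, 500 and 599 land in slot 0, 600 in slot 1 and 1499 in slot 9, while 499 and 1500 are discarded. If your ranges should include the upper bound instead, that comparison is the place to change.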

If your "sliding windows" are different from what I have done, it should only take minor changes to the values of the parameters ($margin, $step) and perhaps a bit more work in the final printing of the results, provided the %result hash holds sufficiently detailed information.
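For instance, if "sliding window" means overlapping windows (a window wider than the step it advances by), a value can fall into several windows at once. The following is only a rough sketch under that assumption; the $window parameter, the helper windows_for and the sample values are hypothetical and not part of the code above.

```perl
use strict;
use warnings;

# Assumed parameters: windows of width $window advancing by $step,
# so consecutive windows overlap whenever $step < $window.
my ($margin, $window, $step) = (500, 200, 100);

# Hypothetical helper: returns the start offsets (relative to $low) of
# every window [start, start + $window) that contains $val.
sub windows_for {
    my ($low, $val) = @_;
    my $span = 2 * $margin;     # total width covered around the reference
    my $pos  = $val - $low;     # position of $val inside that span
    my @starts;
    for (my $start = 0; $start + $window <= $span; $start += $step) {
        push @starts, $start if $pos >= $start and $pos < $start + $window;
    }
    return @starts;
}

my $ref = 1000;                 # made-up reference value from file 1
my %result;
for my $val (650, 720, 1499) {  # made-up values from file 2
    $result{gene1}{$_}++ for windows_for($ref - $margin, $val);
}

for my $start (sort { $a <=> $b } keys %{ $result{gene1} }) {
    printf "window %d-%d: %d\n",
        $ref - $margin + $start,
        $ref - $margin + $start + $window,
        $result{gene1}{$start};
}
```

The %result hash keeps the same ID-then-bucket shape; only the rule that maps a value to its bucket or buckets changes.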

In reply to Re^5: Sliding window perl program by Laurent_R
in thread Sliding window perl program by genome
