in reply to Hash_of_Hash_Would do it?

I thought I would provide another approach that doesn't use a hash. It processes the input line by line.

Chris

#!/usr/bin/perl
use strict;
use warnings;

chomp(my @previous = split /\t/, <DATA>);

my $from = 1;
my $to = $previous[2]-9 > 0 ? $previous[2]-9 : 0;

# print '0 values' up to 8 less of margin
for my $pos ($from..$to) {
    print join("\t", $previous[1], $pos, '','','0'), "\n";
}

$from = $previous[2]-8 > 0 ? $previous[2]-8 : 1;
$to = $previous[2]-1;

# print margin of '1 values' up to first record
for my $pos ($from .. $to) {
    print join("\t", $previous[1], $pos, '','','1'), "\n";
}

# print first record
print join("\t", @previous[1..4], '1'), "\n";

my $pos_count = $previous[2];

while (<DATA>) {
    chomp;
    my @current= split /\t/;

    if ($current[1] eq $previous[1]) {
        if (++$pos_count != $current[2]) {
            for my $pos ($pos_count .. $current[2]-1) {
                print join("\t", $current[1], $pos, '','', '1'), "\n";
            }
        }
        print join("\t", @current[1..4], 1), "\n";
    }
    else {
        # print '1 values' for margin of 8 past last pos for previous record
        for my $pos ($previous[2] + 1 .. $previous[2] + 8) {
            print join("\t", $previous[1], $pos, '','','1'), "\n";
        }

        $from = 1;
        $to = $current[2]-9 > 0 ? $current[2]-9 : 0;

        # print '0 values' up to 8 less of margin for current record
        for my $pos ($from .. $to) {
            print join("\t", $current[1], $pos, '','','0'), "\n";
        }

        $from = $current[2]-8 > 0 ? $current[2]-8 : 1;
        $to = $current[2]-1;

        # print '1 values' up to current record
        for my $pos ($from .. $to) {
            print join("\t", $current[1], $pos, '','','1'), "\n";
        }
        print join("\t", @current[1..4], 1), "\n";
    }
    $pos_count = $current[2];
    @previous = @current;
}

# after last record printed, print out margin of 8 more
for my $pos ($previous[2] + 1 .. $previous[2] + 8) {
    print join("\t", $previous[1], $pos, '','','1'), "\n";
}

__DATA__
CLS_S3_Contig100_st	CLS_S3_Contig100	53	10	0.3717
CLS_S3_Contig100_at	CLS_S3_Contig100	55	11	0.4321
CLS_S3_Contig100_st	CLS_S3_Contig100	57	10	0.3223
CLS_S3_Contig100_at	CLS_S3_Contig100	59	11	0.4055
CLS_S3_Contig100_st	CLS_S3_Contig100	61	11	0.4511
CLS_S3_Contig100_at	CLS_S3_Contig100	63	11	0.474
CLS_S3_Contig10031_st	CLS_S3_Contig10031	53	12	0.5548
CLS_S3_Contig10031_st	CLS_S3_Contig10031	57	10	0.4871
CLS_S3_Contig10031_st	CLS_S3_Contig10031	61	12	0.547
CLSS3627.b1_F19.ab1	CLS_S3_Contig10031	62	11	0.5129
CLSS3627.b1_F19.ab1	CLS_S3_Contig10031	64	11	0.5789
It printed the following output.

CLS_S3_Contig100	1			0
CLS_S3_Contig100	2			0
CLS_S3_Contig100	3			0
CLS_S3_Contig100	4			0
CLS_S3_Contig100	5			0
CLS_S3_Contig100	6			0
CLS_S3_Contig100	7			0
CLS_S3_Contig100	8			0
CLS_S3_Contig100	9			0
CLS_S3_Contig100	10			0
CLS_S3_Contig100	11			0
CLS_S3_Contig100	12			0
CLS_S3_Contig100	13			0
CLS_S3_Contig100	14			0
CLS_S3_Contig100	15			0
CLS_S3_Contig100	16			0
CLS_S3_Contig100	17			0
CLS_S3_Contig100	18			0
CLS_S3_Contig100	19			0
CLS_S3_Contig100	20			0
CLS_S3_Contig100	21			0
CLS_S3_Contig100	22			0
CLS_S3_Contig100	23			0
CLS_S3_Contig100	24			0
CLS_S3_Contig100	25			0
CLS_S3_Contig100	26			0
CLS_S3_Contig100	27			0
CLS_S3_Contig100	28			0
CLS_S3_Contig100	29			0
CLS_S3_Contig100	30			0
CLS_S3_Contig100	31			0
CLS_S3_Contig100	32			0
CLS_S3_Contig100	33			0
CLS_S3_Contig100	34			0
CLS_S3_Contig100	35			0
CLS_S3_Contig100	36			0
CLS_S3_Contig100	37			0
CLS_S3_Contig100	38			0
CLS_S3_Contig100	39			0
CLS_S3_Contig100	40			0
CLS_S3_Contig100	41			0
CLS_S3_Contig100	42			0
CLS_S3_Contig100	43			0
CLS_S3_Contig100	44			0
CLS_S3_Contig100	45			1
CLS_S3_Contig100	46			1
CLS_S3_Contig100	47			1
CLS_S3_Contig100	48			1
CLS_S3_Contig100	49			1
CLS_S3_Contig100	50			1
CLS_S3_Contig100	51			1
CLS_S3_Contig100	52			1
CLS_S3_Contig100	53	10	0.3717	1
CLS_S3_Contig100	54			1
CLS_S3_Contig100	55	11	0.4321	1
CLS_S3_Contig100	56			1
CLS_S3_Contig100	57	10	0.3223	1
CLS_S3_Contig100	58			1
CLS_S3_Contig100	59	11	0.4055	1
CLS_S3_Contig100	60			1
CLS_S3_Contig100	61	11	0.4511	1
CLS_S3_Contig100	62			1
CLS_S3_Contig100	63	11	0.474	1
CLS_S3_Contig100	64			1
CLS_S3_Contig100	65			1
CLS_S3_Contig100	66			1
CLS_S3_Contig100	67			1
CLS_S3_Contig100	68			1
CLS_S3_Contig100	69			1
CLS_S3_Contig100	70			1
CLS_S3_Contig100	71			1
CLS_S3_Contig10031	1			0
CLS_S3_Contig10031	2			0
CLS_S3_Contig10031	3			0
CLS_S3_Contig10031	4			0
CLS_S3_Contig10031	5			0
CLS_S3_Contig10031	6			0
CLS_S3_Contig10031	7			0
CLS_S3_Contig10031	8			0
CLS_S3_Contig10031	9			0
CLS_S3_Contig10031	10			0
CLS_S3_Contig10031	11			0
CLS_S3_Contig10031	12			0
CLS_S3_Contig10031	13			0
CLS_S3_Contig10031	14			0
CLS_S3_Contig10031	15			0
CLS_S3_Contig10031	16			0
CLS_S3_Contig10031	17			0
CLS_S3_Contig10031	18			0
CLS_S3_Contig10031	19			0
CLS_S3_Contig10031	20			0
CLS_S3_Contig10031	21			0
CLS_S3_Contig10031	22			0
CLS_S3_Contig10031	23			0
CLS_S3_Contig10031	24			0
CLS_S3_Contig10031	25			0
CLS_S3_Contig10031	26			0
CLS_S3_Contig10031	27			0
CLS_S3_Contig10031	28			0
CLS_S3_Contig10031	29			0
CLS_S3_Contig10031	30			0
CLS_S3_Contig10031	31			0
CLS_S3_Contig10031	32			0
CLS_S3_Contig10031	33			0
CLS_S3_Contig10031	34			0
CLS_S3_Contig10031	35			0
CLS_S3_Contig10031	36			0
CLS_S3_Contig10031	37			0
CLS_S3_Contig10031	38			0
CLS_S3_Contig10031	39			0
CLS_S3_Contig10031	40			0
CLS_S3_Contig10031	41			0
CLS_S3_Contig10031	42			0
CLS_S3_Contig10031	43			0
CLS_S3_Contig10031	44			0
CLS_S3_Contig10031	45			1
CLS_S3_Contig10031	46			1
CLS_S3_Contig10031	47			1
CLS_S3_Contig10031	48			1
CLS_S3_Contig10031	49			1
CLS_S3_Contig10031	50			1
CLS_S3_Contig10031	51			1
CLS_S3_Contig10031	52			1
CLS_S3_Contig10031	53	12	0.5548	1
CLS_S3_Contig10031	54			1
CLS_S3_Contig10031	55			1
CLS_S3_Contig10031	56			1
CLS_S3_Contig10031	57	10	0.4871	1
CLS_S3_Contig10031	58			1
CLS_S3_Contig10031	59			1
CLS_S3_Contig10031	60			1
CLS_S3_Contig10031	61	12	0.547	1
CLS_S3_Contig10031	62	11	0.5129	1
CLS_S3_Contig10031	63			1
CLS_S3_Contig10031	64	11	0.5789	1
CLS_S3_Contig10031	65			1
CLS_S3_Contig10031	66			1
CLS_S3_Contig10031	67			1
CLS_S3_Contig10031	68			1
CLS_S3_Contig10031	69			1
CLS_S3_Contig10031	70			1
CLS_S3_Contig10031	71			1
CLS_S3_Contig10031	72			1
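
The leading rows of each contig in that output come from two ranges: a run of '0' rows, then a margin of '1' rows just before the first record. As a rough sketch of that arithmetic only, the helper below computes both ranges. The name leading_ranges and the $margin parameter are my own and are not part of the script above, which hard-codes 8.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): for the first recorded
# position of a contig, return the inclusive range of positions flagged 0
# and the inclusive range flagged 1 that precede it.
sub leading_ranges {
    my ($first_pos, $margin) = @_;
    # The '1' margin starts $margin positions before the record, but never below 1.
    my $ones_from = $first_pos - $margin > 0 ? $first_pos - $margin : 1;
    # Everything before the margin is flagged 0; an end of 0 means "no such rows".
    my $zeros_to = $ones_from - 1;
    return ([1, $zeros_to], [$ones_from, $first_pos - 1]);
}

my ($zeros, $ones) = leading_ranges(53, 8);
print "zeros: $zeros->[0]..$zeros->[1]\n";    # zeros: 1..44
print "ones:  $ones->[0]..$ones->[1]\n";      # ones:  45..52
```

For a first position of 53 this gives the 1..44 and 45..52 runs seen at the top of each contig in the output.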

Re^2: Hash_of_Hash_Would do it?
by sesemin (Beadle) on Sep 16, 2008 at 01:03 UTC
    Hi Chris,

    Thanks for your time and solution. I ran your code and it works great as long as the gap in PIP is not large. In other words, if PIP jumps from 240 to 280, it fills the new column with "1" for the whole gap, whereas we want it to fill only up to 248 from one margin and from 272 from the other margin.

    for my $pos ($previous[2] + 1 .. $previous[2] + 8) {
        print join("\t", $previous[1], $pos, '','','1'), "\n";
    }
    Is this the place to control the loop, so that it does not fill more than 8 positions?

    Thank you again.

    Pedro

Re^2: Hash_of_Hash_Would do it?
by Cristoforo (Curate) on Sep 16, 2008 at 23:18 UTC
    You would replace
    if (++$pos_count != $current[2]) {
        for my $pos ($pos_count .. $current[2]-1) {
            print join("\t", $current[1], $pos, '','', '1'), "\n";
        }
    }

    with

    if (++$pos_count != $current[2]) {
        fill_interval($pos_count, @current);
    }

    where fill_interval() is defined as

    sub fill_interval {
        my ($pos_count, @current) = @_;
        my $margin = 8;

        if ($current[2] - $pos_count <= 2*$margin) {
            for my $pos ($pos_count .. $current[2]-1) {
                print join("\t", $current[1], $pos, '','', '1'), "\n";
            }
        }
        else {
            my @bool;
            my ($start, $end) = ($pos_count, $current[2]-1);
            for my $i (0..$margin-1) {
                @bool[ $start + $i, $end - $i ] = (1,1);
            }
            for my $pos ($pos_count .. $current[2]-1) {
                print join("\t", $current[1], $pos, '','', $bool[$pos] || 0), "\n";
            }
        }
    }

    Update: Changed literal values to $margin.
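
As a quick check of the margin logic against the 240 to 280 jump mentioned earlier in the thread, here is a minimal sketch that returns the positions flagged '1' instead of printing rows. The name flagged_positions and its three-argument interface are assumptions for illustration and not part of fill_interval() itself. $start and $end are the first and last missing positions, so 241 and 279 for records at 240 and 280.

```perl
use strict;
use warnings;

# Hypothetical companion to fill_interval(): list the positions in a gap
# that would be flagged 1. Positions in the gap that are not returned
# would be flagged 0.
sub flagged_positions {
    my ($start, $end, $margin) = @_;
    # Short gap (at most two margins wide): every missing position is flagged.
    return ($start .. $end) if $end - $start + 1 <= 2 * $margin;
    # Long gap: only $margin positions at each end are flagged.
    return ($start .. $start + $margin - 1, $end - $margin + 1 .. $end);
}

my @ones = flagged_positions(241, 279, 8);
print "@ones\n";
# 241 242 243 244 245 246 247 248 272 273 274 275 276 277 278 279
```

Positions 249 through 271 are left out of the list, which corresponds to the '0' rows that fill_interval() prints for the middle of a long gap.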

      Thank you very much, Chris. It is perfect now. A+++