Re: Counting problem!

use strict;
use warnings;

my %data;
while (<DATA>) {
    my ($key, $from, $end) = split;
    if (!$data{$key}) {
        $data{$key} = [[$from, $end, 1]];
    }
    else {
        my @new;
        my $count = 1;
        foreach my $segment (@{$data{$key}}) { 
            if ($from < $$segment[1] && $end > $$segment[0]) { 
                #
                # Must overlap, so merge
                #
                $from = $$segment[0] if $$segment[0] < $from;
                $end = $$segment[1] if $$segment[1] > $end; 
                $count += $$segment[2];
            }
            else {
                #
                # No overlap. Keep.
                #
                push @new, $segment;  
            }
        }
        push @new, [$from, $end, $count];  

        $data{$key} = [sort {$$a[0] <=> $$b[0]} @new];
    }
}

foreach my $key (sort keys %data) {
    my @segments = @{$data{$key}};
    for (my $i = 0; $i < @segments; $i++) {
        printf "ID_%d    %s   %d   %d   %d\n", $i + 1, $key, @{$segments[$i]};
    }
}
    
__DATA__
chr1    101  105   X     X     -
chr1    102  108   X     X     -
chr1    106  111   X     X     -
chr1    112  113   X     X     -
chr1    113  115   X     X     -
chr2    114  118   X     X     -
chr2    119  121   X     X     -
chr2    120  123   X     X    -
chr3    125  130   X     X    -
chr3    131  132   X     X   -

As output, I get:

ID_1    chr1   101   111   3
ID_2    chr1   112   113   1
ID_3    chr1   113   115   1
ID_1    chr2   114   118   1
ID_2    chr2   119   123   2
ID_1    chr3   125   130   1
ID_2    chr3   131   132   1

Note this has two ranges from chr3; not 1. I don't see anything spanning the gap from 130 to 131 in your input.
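To see why some neighbouring ranges stay separate, here is a minimal sketch of the overlap test from the loop above, pulled out into a helper. The name overlaps is hypothetical and only for illustration; the comparison itself is the one used in the script. Because both comparisons are strict, ranges that merely touch (112..113 and 113..115) are not merged.

```perl
use strict;
use warnings;

# overlaps() is a hypothetical helper name; the condition mirrors
# the one used in the merge loop of the script above.
sub overlaps {
    my ($from, $end, $seg_from, $seg_end) = @_;
    return ($from < $seg_end && $end > $seg_from) ? 1 : 0;
}

print overlaps(102, 108, 101, 105), "\n";   # 1: 102..108 overlaps 101..105
print overlaps(113, 115, 112, 113), "\n";   # 0: the ranges only touch at 113
print overlaps(131, 132, 125, 130), "\n";   # 0: nothing spans 130..131
```

If touching ranges should be merged as well, the comparisons would presumably need to become <= and >=, but that is a change in behaviour, not something the original script does.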

Re^2: Counting problem! by g-alone (Initiate) on Mar 22, 2012 at 18:48 UTC
Thank you so much for your help. I have another question: what should I do if I want output with more detail? In some parts of my data the same coordinates appear several times, which causes a problem with the script. They look like this:

__DATA__
chr1    101  105   X     X     -
chr1    101  105   X     X     -
chr1    101  105   X     X     -
chr1    101  105   X     X     -
chr1    102  108   X     X     -
chr1    106  111   X     X     -
chr1    106  111   X     X     -
chr1    112  113   X     X     -
chr1    113  115   X     X     -
chr2    114  118   X     X     -
chr2    114  118   X     X     -
chr2    114  118   X     X     -
chr2    114  118   X     X     -
chr2    119  121   X     X     -
chr2    120  123   X     X     -
chr3    125  130   X     X     -
chr3    125  130   X     X     -
chr3    131  132   X     X     -

I want to get output like this:

ID_1    chr1   101   105   4
ID_1    chr1   102   106   1
ID_1    chr1   106   111   2
ID_2    chr1   112   113   1
ID_3    chr1   113   115   1
ID_1    chr2   114   118   1
ID_2    chr2   119   123   1
ID_2    chr2   119   123   1
ID_1    chr3   125   130   2
ID_2    chr3   131   132   1

That is, it should count how many times identical coordinates occur and show each range individually, instead of giving a total count after merging them.
Re^3: Counting problem! by JavaFan (Canon) on Mar 22, 2012 at 19:35 UTC
I tried to make sense of your output, but failed. The fourth column of the second row is probably a typo and should read 108. But I've no idea why the first three output lines all have the same value in the first column.

The counting itself, however, is trivial. Make a hash key from the primary key of your data, and just, uhm, count. You know, add one each time you see the same thing:

$counts{$key,$start,$end}++;
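A minimal sketch of that counting idea, assuming the first three whitespace-separated fields (chromosome, start, end) form the primary key. The helper name count_ranges and the @order array used to keep first-seen order are illustrative additions, not part of the original suggestion. The comma-separated hash subscript is Perl's multi-dimensional key emulation, which joins the values with $;.

```perl
use strict;
use warnings;

# count_ranges() is a hypothetical helper: it counts identical
# (chromosome, start, end) records and returns them in first-seen order.
sub count_ranges {
    my @lines = @_;
    my (%counts, @order);
    for my $line (@lines) {
        my ($key, $start, $end) = split ' ', $line;
        next unless defined $end;    # skip blank or short lines
        # Post-increment returns the old count, so this pushes only
        # the first time a given key/start/end combination is seen.
        push @order, [$key, $start, $end]
            unless $counts{$key, $start, $end}++;
    }
    return map { [@$_, $counts{$_->[0], $_->[1], $_->[2]}] } @order;
}

my @input = (
    "chr1    101  105   X     X     -",
    "chr1    101  105   X     X     -",
    "chr1    102  108   X     X     -",
    "chr1    106  111   X     X     -",
    "chr1    106  111   X     X     -",
);

printf "%s   %d   %d   %d\n", @$_ for count_ranges(@input);
# chr1   101   105   2
# chr1   102   108   1
# chr1   106   111   2
```

This only counts exact duplicates. How those counts should be combined with the merged ranges and the ID column is left open, since the desired output above is ambiguous on that point.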