in reply to Help the counting!

my %data;
while (<DATA>) {
    my ($key, $from, $end) = split;
    if (!$data {$key}) {
        $data {$key} = [[$from, $end, 1]];
    }
    else {
        my @new;
        my $count = 1;
        foreach my $segment (@{$data{$key}}) {
            if ($from < $$segment[1] && $end > $$segment[0]) {
                #
                # Must overlap, so merge
                #
                $from = $$segment[0] if $$segment[0] < $from;
                $end  = $$segment[1] if $$segment[1] > $end;
                $count += $$segment[2];
            }
            else {
                #
                # No overlap. Keep.
                #
                push @new, $segment;
            }
        }
        push @new, [$from, $end, $count];
        $data{$key} = [sort {$$a[0] <=> $$b[0]} @new];
    }
}

foreach my $key (sort keys %data) {
    my @segments = @{$data{$key}};
    for (my $i = 0; $i < @segments; $i++) {
        printf "ID_%d %s %d %d %d\n", $i + 1, $key, @{$segments[$i]};
    }
}

__DATA__
chr1 101 105 X X -
chr1 102 108 X X -
chr1 106 111 X X -
chr1 112 113 X X -
chr1 113 115 X X -
chr2 114 118 X X -
chr2 119 121 X X -
chr2 120 123 X X -
chr3 125 130 X X -
chr3 131 132 X X -

As output, I get:

ID_1 chr1 101 111 3
ID_2 chr1 112 113 1
ID_3 chr1 113 115 1
ID_1 chr2 114 118 1
ID_2 chr2 119 123 2
ID_1 chr3 125 130 1
ID_2 chr3 131 132 1

Note that this has two ranges for chr3, not one. I don't see anything spanning the gap from 130 to 131 in your input.

Replies are listed 'Best First'.
Re^2: Counting problem!
by g-alone (Initiate) on Mar 22, 2012 at 18:48 UTC
    Thank you so much for your help. I have another question: what should I do if I want output with more detail? Some parts of my data contain identical coordinates, which cause problems with the script. They look like this:

    __DATA__
    chr1 101 105 X X -
    chr1 101 105 X X -
    chr1 101 105 X X -
    chr1 101 105 X X -
    chr1 102 108 X X -
    chr1 106 111 X X -
    chr1 106 111 X X -
    chr1 112 113 X X -
    chr1 113 115 X X -
    chr2 114 118 X X -
    chr2 114 118 X X -
    chr2 114 118 X X -
    chr2 114 118 X X -
    chr2 119 121 X X -
    chr2 120 123 X X -
    chr3 125 130 X X -
    chr3 125 130 X X -
    chr3 131 132 X X -

    Then I want to get output like this:

    ID_1 chr1 101 105 4
    ID_1 chr1 102 106 1
    ID_1 chr1 106 111 2
    ID_2 chr1 112 113 1
    ID_3 chr1 113 115 1
    ID_1 chr2 114 118 1
    ID_2 chr2 119 123 1
    ID_2 chr2 119 123 1
    ID_1 chr3 125 130 2
    ID_2 chr3 131 132 1

    That is, it also counts the number of identical coordinates and shows them individually, instead of merging them into one total count.
      I tried to make sense of your output, but failed. Probably the fourth column of the second row is a typo and should read 108. But I have no idea why the first three output lines all have the same value in the first column.

      However, to count, that's trivial. Make a hash key from the primary key of your data, and just, uhm, count. You know, add one each time you see the same thing:

      $counts{$key,$start,$end}++;
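
For illustration, here is a minimal sketch of that counting idea as a complete script. The helper name count_records and the three sample lines are assumptions made for this example, not part of the original code, and it only counts identical records; it does not assign the ID_n labels.

```perl
use strict;
use warnings;

# Count how often each (key, start, end) triple occurs.
# count_records() is a hypothetical helper name used only in this sketch.
sub count_records {
    my @lines = @_;
    my %counts;
    for my $line (@lines) {
        my ($key, $start, $end) = split ' ', $line;
        next unless defined $end;          # skip blank or short lines
        $counts{$key, $start, $end}++;     # fields are joined with $;
    }
    return \%counts;
}

# Sample input (assumed): a subset of the lines from the question.
my $counts = count_records(
    "chr1 101 105 X X -",
    "chr1 101 105 X X -",
    "chr1 102 108 X X -",
);

# The three fields were joined with $; (the subscript separator,
# "\034" by default), so split on it to recover them.
my $sep = $;;
for my $k (sort keys %$counts) {
    my ($key, $start, $end) = split /\Q$sep\E/, $k;
    print "$key $start $end $counts->{$k}\n";
}
```

With the sample lines above, this prints "chr1 101 105 2" and then "chr1 102 108 1". The plain string sort is enough for this small sample; for real data you would probably want to sort numerically on the start coordinate within each key.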