More of a conceptual problem than a Perl problem per se.
I am parsing the output of a program (non-Perl and not editable) that compares two DNA sequences. The results are given as sets of 2-D co-ordinates.
e.g.
    [ ['NM_144963','2713','4091'], ['NM_144963','1949','2712'] ];
where field 1 is the name.
A simple sort allows me to sort the above and calculate the coverage:
    # sort by start co-ordinate, then by end co-ordinate
    my @sorted = sort { ($a->[1] <=> $b->[1]) || ($a->[2] <=> $b->[2]) } @array;
    my $coverage = 0;
    for (@sorted) {
        # co-ordinates are inclusive, hence the +1
        $coverage += $_->[2] - $_->[1] + 1;
    }
    printf "cover = %.1f%% \n", $coverage / $length * 100;
All well and good for the above case. But parts of my data are overlapping, e.g.
    [ ['NM_176827','618','710'], ['NM_176827','621','710'], ['NM_176827','622','692'], ['NM_176827','629','710'] ]
Here my code breaks down, and I cannot for the life of me think of a data structure that will allow me to deal adequately with both cases (i.e. a collection of distinct hits, a collection of overlapping hits) or, worse, a combination of the two.
When they overlap I need to pick the longest hit (i.e. the first array in the overlapping example above).
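For what it is worth, the longest-hit rule might be sketched roughly as follows. This is only a sketch under assumptions: hits are sorted by start, any hit that starts at or before the furthest end seen so far is treated as part of the same overlapping cluster, and only the longest hit of each cluster is kept. The names `longest_per_cluster` and `covered_bases` are hypothetical helpers made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: keep the longest hit from each cluster of overlapping hits.
# Each hit is [name, start, end] with inclusive co-ordinates.
sub longest_per_cluster {
    my @hits = sort { $a->[1] <=> $b->[1] || $a->[2] <=> $b->[2] } @_;
    my @kept;
    my $cluster_end;    # furthest end seen in the current cluster
    for my $hit (@hits) {
        my (undef, $start, $end) = @$hit;
        if (@kept && $start <= $cluster_end) {
            # Overlaps the current cluster: keep whichever hit is longer.
            my $best = $kept[-1];
            $kept[-1] = $hit if ($end - $start) > ($best->[2] - $best->[1]);
            $cluster_end = $end if $end > $cluster_end;
        }
        else {
            # No overlap: this hit starts a new cluster.
            push @kept, $hit;
            $cluster_end = $end;
        }
    }
    return @kept;
}

# Hypothetical helper: number of bases covered by a list of non-overlapping hits.
sub covered_bases {
    my $covered = 0;
    $covered += $_->[2] - $_->[1] + 1 for @_;
    return $covered;
}

my @distinct = ( ['NM_144963','2713','4091'], ['NM_144963','1949','2712'] );
my @overlap  = ( ['NM_176827','618','710'], ['NM_176827','621','710'],
                 ['NM_176827','622','692'], ['NM_176827','629','710'] );

for my $set (\@distinct, \@overlap) {
    my @kept = longest_per_cluster(@$set);
    printf "%s %d-%d\n", @$_ for @kept;
    printf "covered bases = %d\n", covered_bases(@kept);
}
```

Because the kept hits never overlap each other, the same loop should cope with distinct hits, overlapping hits, or a mixture. Note that keeping only the longest hit can undercount when hits overlap in a chain; if the total covered region matters more than the single longest hit, merging overlapping ranges would be the alternative.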
Am I just having a brain meltdown today? Can someone point me in the direction of an obvious solution? Thanks a lot.
A.A.
In reply to Comparing 2-D co-ordinates by aging acolyte