in reply to find the coverage of sequence in a particular range
One option is to use a module that implements interval sets. CPAN has many; I can't tell which is to be recommended. Here's an example using Set::IntSpan.
#! /usr/bin/perl use strict; use warnings; use Set::IntSpan; my $seq_len = 150; my (%seqs, %ranges); for (<>) { m/^(\d+)\s+(\d+)\.\.(\d+)/ or next; $seqs{$1} //= [ $1 +0, $1 + $seq_len-1 ]; $ranges{$2,$3} //= [ $2, $3 ]; } my $S = Set::IntSpan->new([ values %seqs ]); printf "Total seq coverage: %d\n", $S->size; for (@ranges{ sort keys %ranges }) { my $window = [ $_->[0], $_->[1] + $seq_len-1 ]; my $T = $S->intersect([ $window ]); printf "range [%d..%d] window [%d..%d] %s covers %d\n", @$_, @$window, $T->run_list, $T->size; } __END__ Total seq coverage: 1251 range [1..1524] window [0..1673] 46-195,403-552,800-981,1008-1157,14 +07-1556 covers 782 range [2052..3260] window [2051..3409] 2360-2509,2967-3135,3170-3319 + covers 469
To me it looks like there's a bug in that module (ver 1.19): documentation promises the operands are not affected, yet intersect method appears to modify $window. Hm.
|
---|
Replies are listed 'Best First'. | |
---|---|
Re^2: find the coverage of sequence in a particular range
by Anonymous Monk on May 23, 2016 at 06:10 UTC | |
by hippo (Archbishop) on May 23, 2016 at 08:03 UTC |