Hi,
This is working code that tries to remove strings containing poly-ATCG, i.e. it removes a
string when the proportion of any single base (A, T, C or G) reaches or exceeds a
threshold. So with the array below it returns only "ATCGAT".
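To make the criterion concrete, here is a small sketch that prints the largest single-base fraction for each string in the example array. The helper name `max_base_fraction` is hypothetical and not part of the original code. A string is dropped when that value is not strictly below the 0.75 limit, which matches the `<` comparisons in the code below.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: the largest fraction of the string taken up by any one of A, T, C, G.
sub max_base_fraction {
    my ($seq) = @_;
    my $len = length $seq;
    return 0 unless $len;    # avoid dividing by zero on an empty string
    my $max = 0;
    # tr/X// with an empty replacement list counts X without modifying $seq
    for my $count ( ( $seq =~ tr/A// ), ( $seq =~ tr/T// ),
                    ( $seq =~ tr/C// ), ( $seq =~ tr/G// ) ) {
        $max = $count if $count > $max;
    }
    return $max / $len;
}

my $lim = 0.75;
for my $seq (qw(AAAAAT ATCGAT TTTTTG GCCCCC GTGGGG)) {
    my $frac = max_base_fraction($seq);
    printf "%s %.3f %s\n", $seq, $frac, $frac < $lim ? 'keep' : 'remove';
}
```

For the example array this prints 0.833 (5 of 6) and "remove" for every string except ATCGAT, whose most frequent bases (A and T, 2 of 6 each) give 0.333 and "keep".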
Although my code below gives the correct result, I feel it's
clumsy and slow. Typically it needs to handle arrays of thousands
to tens of thousands of strings. I wonder how the venerable monks here would make it more efficient and compact.
#!/usr/bin/perl -w
use strict;
use Data::Dumper;

my @set = qw (AAAAAT ATCGAT TTTTTG GCCCCC GTGGGG);
my $lim = 0.75;
my @sel = remove_poly( \@set, $lim);
print "BEFORE:",scalar(@set),"\n";
print "AFTER:",scalar(@sel),"\n";
#print Dumper \@sel;

sub remove_poly
{
    my ($array,$lim) = @_;
    # assumes every string has the same length as the first one
    my $len = length $array->[0];
    my @sel_array;
    foreach ( @{$array} )
    {
        # tr/X// counts the X characters in $_ without modifying it
        my $a_count = $_ =~ tr/A//;
        my $t_count = $_ =~ tr/T//;
        my $c_count = $_ =~ tr/C//;
        my $g_count = $_ =~ tr/G//;
        my $a_portion = $a_count/$len;
        my $t_portion = $t_count/$len;
        my $c_portion = $c_count/$len;
        my $g_portion = $g_count/$len;
        #print "$_ $a_portion $t_portion $c_portion $g_portion \n";
        # keep the string only if no single base reaches the limit
        if ( $a_portion < $lim && $t_portion < $lim && $c_portion < $lim && $g_portion < $lim )
        {
            push @sel_array,$_;
        }
        else
        {
            # report the string being removed
            print "$_\n";
            next;
        }
    }
    #print Dumper \@sel_array ;
    return @sel_array;
}
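A more compact variant, sketched under the assumption that only the keep/remove decision matters (unlike `remove_poly`, it does not print the removed strings). For a non-empty string, `count < $lim * length` is the same test as `count/length < $lim`, so no division is needed. The `&&` chain also stops counting as soon as one base reaches the limit. The name `remove_poly_fast` is hypothetical, and the length is taken per string rather than from the first element.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical compact variant of remove_poly: keeps a string only if each of
# A, T, C and G makes up strictly less than $lim of that string's length.
sub remove_poly_fast {
    my ( $array, $lim ) = @_;
    return grep {
        my $limit_count = $lim * length;    # per-string threshold, so mixed lengths work
        # tr/X// counts X in $_ without changing it; && stops at the first
        # base whose count reaches the threshold
        tr/A// < $limit_count && tr/T// < $limit_count
          && tr/C// < $limit_count && tr/G// < $limit_count;
    } @$array;
}

my @set = qw(AAAAAT ATCGAT TTTTTG GCCCCC GTGGGG);
my @sel = remove_poly_fast( \@set, 0.75 );
print "BEFORE:", scalar(@set), "\n";
print "AFTER:",  scalar(@sel), "\n";
print "@sel\n";
```

With the example data this should report BEFORE:5, AFTER:1 and keep only ATCGAT, the same selection as the original.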
Update: Benchmark
Thanks so much guys. It's been a great learning experience,
as always.
            Rate limbic  ewi fang  roy auk1 brs_auk2 jdhed
limbic    4029/s     -- -58% -64% -67% -76%     -77%  -85%
ewi       9693/s   141%   -- -14% -21% -42%     -45%  -64%
fang     11261/s   180%  16%   --  -8% -32%     -37%  -58%
roy      12211/s   203%  26%   8%   -- -27%     -31%  -55%
auk1     16620/s   313%  71%  48%  36%   --       7%  -38%
brs_auk2 17743/s   340%  83%  58%  45%   7%       --  -34%
jdhed    27022/s   571% 179% 140% 121%  63%      52%    --
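The table above has the layout printed by `cmpthese` from Perl's core Benchmark module: "Rate" is calls per second, and each cell is how much faster (positive) or slower (negative) the row is than the column. Below is a minimal sketch of how such a comparison can be set up. The two subroutines are stand-ins (an original-style loop and a hypothetical `grep`-based variant), not the entries named in the table. The randomly generated data is an assumption, so the rates will not match the numbers above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Assumed test data: 5_000 random 20-mers plus 100 poly-A strings.
srand(42);
my @bases = qw(A T C G);
my @set = map { join( '', map { $bases[ rand @bases ] } 1 .. 20 ) } 1 .. 5_000;
push @set, ( 'A' x 18 ) . 'TC' for 1 .. 100;

# Stand-in for the original approach: four counts, a division per base, explicit loop.
sub loop_version {
    my ( $array, $lim ) = @_;
    my $len = length $array->[0];    # as in the original: assumes equal lengths
    my @keep;
    for my $s (@$array) {
        my @count = ( ( $s =~ tr/A// ), ( $s =~ tr/T// ),
                      ( $s =~ tr/C// ), ( $s =~ tr/G// ) );
        push @keep, $s unless grep { $_ / $len >= $lim } @count;
    }
    return @keep;
}

# Hypothetical variant: grep with a multiplied threshold and short-circuiting &&.
sub grep_version {
    my ( $array, $lim ) = @_;
    return grep {
        my $t = $lim * length;
        tr/A// < $t && tr/T// < $t && tr/C// < $t && tr/G// < $t;
    } @$array;
}

# Sanity check: both versions must select the same strings before timing them.
my @x = loop_version( \@set, 0.75 );
my @y = grep_version( \@set, 0.75 );
die "results differ\n" unless "@x" eq "@y";

# Run each for at least 1 CPU second; assigning to an array forces list context.
cmpthese( -1, {
    loop => sub { my @r = loop_version( \@set, 0.75 ) },
    grep => sub { my @r = grep_version( \@set, 0.75 ) },
} );
```

Checking that the candidates agree before timing them guards against benchmarking a fast but wrong version.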