I was surprised to see the $& variant come out fastest, at least on my perl-5.28:
my $string = "AAATTTAGTTCTTAAGGCTGACATCGGTTTACGTCAGCGTTACCCCCCAAGTTATTGGGGACTTT";
my %expect = qw( CCCCCC 1 GGGG 1 AAA 1 TTT 3 AA 2 GG 2 TT 5 );
use Test::More;
use Benchmark qw(cmpthese);
my %subs;
sub v1 {
%subs = ();
$subs{$_}++ for grep { length >= 2 } split m/,/ => ($string =~ s/([ACGT])\K(?!\1)/,/gr);
} # v1
sub v2 {
%subs = ();
$subs{$_}++ for grep m/^([ACGT])\1+$/ => split m/,/ => ($string =~ s/(\w)\K(?!\1)/,/gr);
} # v2
sub v3 {
%subs = ();
$subs{$_}++ for $string =~ m/(AA+|CC+|GG+|TT+)/g;
} # v3
sub v4 {
%subs = ();
$subs{$1}++ while $string =~ m{(([ACGT])\2+)}g;
} # v4
sub v5 {
%subs = ();
$subs{$&}++ while $string =~ m{([ACGT])\1+}g;
} # v5
v1 (); is_deeply (\%subs, \%expect, "v1");
v2 (); is_deeply (\%subs, \%expect, "v2");
v3 (); is_deeply (\%subs, \%expect, "v3");
v4 (); is_deeply (\%subs, \%expect, "v4");
v5 (); is_deeply (\%subs, \%expect, "v5");
printf "%5d %3d %s\n", $subs{$_->[1]}, @$_ for sort { $b->[0] <=> $a->[0] || $a->[1] cmp $b->[1] } map {[ length, $_ ]} keys %subs;
cmpthese (-2, { v1 => \&v1, v2 => \&v2, v3 => \&v3, v4 => \&v4, v5 => \&v5 });
done_testing;
=>
ok 1 - v1
ok 2 - v2
ok 3 - v3
ok 4 - v4
ok 5 - v5
    1   6 CCCCCC
    1   4 GGGG
    1   3 AAA
    3   3 TTT
    2   2 AA
    2   2 GG
    5   2 TT
      Rate   v2   v1   v3   v4   v5
v2 41981/s   -- -30% -52% -56% -57%
v1 59864/s  43%   -- -31% -38% -39%
v3 87244/s 108%  46%   --  -9% -12%
v4 95919/s 128%  60%  10%   --  -3%
v5 98685/s 135%  65%  13%   3%   --
1..5
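For anyone who still avoids $& out of habit: as far as I know from perlvar, the old global slowdown tied to $& was fixed by the copy-on-write work in perl 5.20, which may be part of why v5 does well here. A minimal sketch of the same loop using ${^MATCH} with the /p modifier follows. The name v6 is hypothetical and not part of the benchmark above, and I have not timed it, so no claim is made about where it would land in the table.

```perl
use strict;
use warnings;

my $string = "AAATTTAGTTCTTAAGGCTGACATCGGTTTACGTCAGCGTTACCCCCCAAGTTATTGGGGACTTT";

my %subs;

# Hypothetical v6 (not in the original benchmark): same pattern as v5,
# but the matched text is read from ${^MATCH}, which needs /p on the regex.
sub v6 {
    %subs = ();
    $subs{${^MATCH}}++ while $string =~ m{([ACGT])\1+}gp;
    return;
} # v6

v6 ();

# Longest runs first, then alphabetical, like the report above.
printf "%-6s %d\n", $_, $subs{$_}
    for sort { length ($b) <=> length ($a) || $a cmp $b } keys %subs;
```

To compare it, add v6 => \&v6 to the cmpthese call and to the is_deeply checks.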
Enjoy, Have FUN! H.Merijn