Win8 Strawberry 5.8.9.5 (32) Sun 09/27/2020 14:19:34
C:\@Work\Perl\monks
>perl
use strict;
use warnings;
use Data::Dump qw(dd);
my $string = 'ACGTAAAAATGCCCATGGGGGGG';
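my @repeats = do {
    my $p;
    # Each match returns two captures (the whole run, then its single
    # character); the $p toggle keeps only the first of each pair.
    grep { $p = !$p } $string =~ m{ ((.) \2+) }xmsg;
};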
dd \@repeats;
__END__
["AAAAA", "CCC", "GGGGGGG"]
Update 1: But you also want lengths:
Win8 Strawberry 5.8.9.5 (32) Sun 09/27/2020 14:20:42
C:\@Work\Perl\monks
>perl
use strict;
use warnings;
use Data::Dump qw(dd);
my $string = 'ACGTAAAAATGCCCATGGGGGGG';
my @repeats_and_lengths = do {
    my $p;
    map [ $_, length ],
    grep { $p = !$p } $string =~ m{ ((.) \2+) }xmsg;
};
dd \@repeats_and_lengths;
__END__
[["AAAAA", 5], ["CCC", 3], ["GGGGGGG", 7]]
You already know how to sort this. :)
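For completeness, here is one way the sort might look. This is only a sketch: the longest-first order and the alphabetical tie-break are my assumptions about what you want, and @sorted is just a name picked for illustration.

```perl
use strict;
use warnings;

my $string = 'ACGTAAAAATGCCCATGGGGGGG';

# Collect [ run, length ] pairs, as in Update 1.
my @repeats_and_lengths = do {
    my $p;
    map  { [ $_, length $_ ] }
    grep { $p = !$p } $string =~ m{ ((.) \2+) }xmsg;
};

# Longest run first; ties broken alphabetically (an arbitrary choice).
my @sorted = sort { $b->[1] <=> $a->[1] or $a->[0] cmp $b->[0] }
             @repeats_and_lengths;

# Print each run with its length.
printf "%-8s %d\n", @$_ for @sorted;
```

Swap $a and $b in the first comparison if you want the shortest run first.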
Update 2:
... there are statements in the while loop that look doubtful ...
I don't see anything objectionable. (I first called the /g modifier
on the /.../g regex useless. Oops... not useless!) There are usually
several ways to do anything, and which is "best" is often a question
of taste, unless you're Benchmark-ing.
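If you do want to benchmark, a rough sketch with the core Benchmark module could look like the following. The with_while sub is only my guess at the general shape of a while-loop version, not your actual code, and both sub names are made up for this example.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $string = 'ACGTAAAAATGCCCATGGGGGGG';

# Approach from this post: list-context match, keep every other capture.
sub with_grep {
    my ($s) = @_;
    my $p;
    return grep { $p = !$p } $s =~ m{ ((.) \2+) }xmsg;
}

# Assumed shape of a while-loop version (not the OP's actual code).
# In scalar context the /g modifier is what advances the match position.
sub with_while {
    my ($s) = @_;
    my @runs;
    while ($s =~ m{ ((.) \2+) }xmsg) {
        push @runs, $1;
    }
    return @runs;
}

# 50_000 iterations of each; timings vary by machine and perl version.
cmpthese(50_000, {
    grep_toggle => sub { my @r = with_grep($string) },
    while_loop  => sub { my @r = with_while($string) },
});
```

Numbers from a 23-character string say little about long sequences, so rerun this with realistic input before drawing any conclusions.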
... the idea of using an array to store the
substring along with its length might not be good.
Again, I see nothing to gripe about. It's a matter of taste and the
best impedance match to the rest of the code.
Update 3:
Oh, and one more thing... If you're doing a buncha matching
operations on a buncha long sequences, it might be useful to add a
validation step for each input sequence to be sure it consists only
of [ATCG] characters before any further matching operations
are done. This allows you to match with . (dot) and know that
you can only be matching a valid base character. This might
save significant time over many matches, but this can only be
determined for sure by benchmarking. (I'd be inclined to add a
validation step anyway just to be sure your data really is what you
think it is.)
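Such a validation step might be sketched like this. The is_valid_dna name is hypothetical, and I'm assuming upper-case input and that an empty string should be rejected; adjust the character class if your data uses lower case or ambiguity codes like N.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the sequence is non-empty and holds
# only the characters A, C, G and T, otherwise 0.
sub is_valid_dna {
    my ($seq) = @_;
    return ( defined $seq && $seq =~ m{ \A [ACGT]+ \z }xms ) ? 1 : 0;
}

# Check sequences up front, before any repeat matching is attempted.
for my $seq ('ACGTAAAAATGCCCATGGGGGGG', 'ACGTNNNNACGT') {
    printf "%-24s %s\n", $seq, is_valid_dna($seq) ? 'ok' : 'rejected';
}
```

Using \z rather than $ means a trailing newline is rejected too, so chomp your input before validating.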
Give a man a fish: <%-{-{-{-<