    Win8 Strawberry 5.8.9.5 (32) Sun 09/27/2020 14:19:34
    C:\@Work\Perl\monks
    >perl
    use strict;
    use warnings;

    use Data::Dump qw(dd);

    my $string = 'ACGTAAAAATGCCCATGGGGGGG';

    my @repeats = do {
        my $p;
        grep { $p = !$p } $string =~ m{ ((.) \2+) }xmsg;
        };

    dd \@repeats;
    __END__
    ["AAAAA", "CCC", "GGGGGGG"]

Update 1: But you also want lengths:

    Win8 Strawberry 5.8.9.5 (32) Sun 09/27/2020 14:20:42
    C:\@Work\Perl\monks
    >perl
    use strict;
    use warnings;

    use Data::Dump qw(dd);

    my $string = 'ACGTAAAAATGCCCATGGGGGGG';

    my @repeats_and_lengths = do {
        my $p;
        map [ $_, length ],
        grep { $p = !$p }
        $string =~ m{ ((.) \2+) }xmsg;
        };

    dd \@repeats_and_lengths;
    __END__
    [["AAAAA", 5], ["CCC", 3], ["GGGGGGG", 7]]

You already know how to sort this. :)
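
In case it's useful anyway, here is a minimal sketch of the sorting step. It assumes you want the longest run first and ties broken alphabetically; that ordering is my guess, not something stated in the thread. The @sorted name is made up for the example.

```perl
use strict;
use warnings;

my $string = 'ACGTAAAAATGCCCATGGGGGGG';

# Same extraction as above: in list context the match returns ($1, $2)
# pairs, and the toggling grep keeps only each $1 (the whole run).
my @repeats_and_lengths = do {
    my $p;
    map [ $_, length ],
    grep { $p = !$p }
    $string =~ m{ ((.) \2+) }xmsg;
};

# Longest run first; equal lengths fall back to alphabetical order
# (an assumed tie-break, change to taste).
my @sorted =
    sort { $b->[1] <=> $a->[1] or $a->[0] cmp $b->[0] }
    @repeats_and_lengths;

printf "%-8s %d\n", @$_ for @sorted;
```

With the sample string this lists GGGGGGG first, then AAAAA, then CCC.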

Update 2:

"... there are statements in the while loop that look doubtful ..."
I first wrote that the /g modifier on the /.../g regex was useless. Oops... not useless! Beyond that, I don't see anything objectionable. There are usually several ways to do anything, and which is "best" is often a question of taste, unless you're Benchmark-ing.
"... the idea of using an array to store the substring along with its length might not be good."
Again, I see nothing to gripe about. It's a matter of taste and of the best impedance match to the rest of the code.
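
On the /g point: the loop under discussion isn't reproduced here, so the following is only a sketch that assumes it has the usual while ($string =~ /.../g) shape. In scalar context, /g makes each match resume where the previous one ended; without it, the match would restart from the beginning on every pass and the loop would never finish. The @found array and the offset field are additions for illustration.

```perl
use strict;
use warnings;

my $string = 'ACGTAAAAATGCCCATGGGGGGG';

my @found;
# Scalar-context match with /g: each iteration continues from
# pos($string), and the loop ends when no further run is found.
while ($string =~ m{ ((.) \2+) }xmsg) {
    # $1 is the whole run; $-[1] is the offset where group 1 started.
    push @found, [ $1, length $1, $-[1] ];
}

printf "%-8s length %d at offset %d\n", @$_ for @found;
```

Storing each run with its length in a small array, as here, is the same layout the question asks about.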

Update 3: Oh, and one more thing... If you're doing a buncha matching operations on a buncha long sequences, it might be useful to add a validation step for each input sequence to be sure it consists only of [ATCG] characters before any further matching is done. Once a sequence has passed, you can match with . (dot) and know that it can only be matching a valid base character. This might save significant time over many matches, but that can only be determined for sure by benchmarking. (I'd be inclined to add a validation step anyway, just to be sure your data really is what you think it is.)
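
A minimal sketch of such a validation step follows. The helper name is_valid_dna is made up for the example, and it assumes upper-case input and that an empty sequence should be rejected; adjust both to fit your data.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns 1 if the sequence is
# non-empty and made up only of the characters A, T, C and G, else 0.
sub is_valid_dna {
    my ($seq) = @_;
    return $seq =~ m{ \A [ATCG]+ \z }xms ? 1 : 0;
}

for my $seq (qw(ACGTAAAAATGCCCATGGGGGGG ACGTNNNNACGT)) {
    if (is_valid_dna($seq)) {
        print "$seq: ok\n";
    }
    elsif ($seq =~ m{ ([^ATCG]) }xms) {
        # Report the first offending character and where it sits.
        printf "%s: bad character '%s' at offset %d\n", $seq, $1, $-[0];
    }
}
```

Only sequences that pass this check would go on to the run-matching code, where . can then be trusted to match a base. Whether that beats matching with [ATCG] directly is still a question for Benchmark.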


Give a man a fish:  <%-{-{-{-<


In reply to Re: substrings that consist of repeating characters (updated x3) by AnomalousMonk
in thread substrings that consist of repeating characters by Anonymous Monk
