http://qs1969.pair.com?node_id=11122295


in reply to substrings that consist of repeating characters

Just in case you need offsets as well, here's a solution for that.

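use strict;
use warnings;
use feature qw{ say };

my $string =
    q{AAATTTAGTTCTTAAGGCTGACATCGGTTTACGTCAGCGTTACCCCCCAAGTTATTGGGGACTTT};

# Capture a base followed by one or more copies of itself (\2 refers back
# to the inner group). Store [ length, run, start offset ] for each match;
# $-[ 0 ] is the offset at which the whole match starts.
my @matches;
push @matches, [ length $1, $1, $-[ 0 ] ]
    while $string =~ m{(([ACGT])\2+)}g;

# Sort by descending length, then ascending run (i.e. letter, since the
# lengths are equal at that point), then ascending offset.
say qq{Found $_->[ 1 ], length $_->[ 0 ] at offset $_->[ 2 ]}
    for
        sort {
            $b->[ 0 ] <=> $a->[ 0 ]
            ||
            $a->[ 1 ] cmp $b->[ 1 ]
            ||
            $a->[ 2 ] <=> $b->[ 2 ]
        }
        @matches;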

The output, sorted by descending length, then by ascending letter, then by ascending offset:

Found CCCCCC, length 6 at offset 42
Found GGGG, length 4 at offset 56
Found AAA, length 3 at offset 0
Found TTT, length 3 at offset 3
Found TTT, length 3 at offset 27
Found TTT, length 3 at offset 62
Found AA, length 2 at offset 13
Found AA, length 2 at offset 48
Found GG, length 2 at offset 15
Found GG, length 2 at offset 25
Found TT, length 2 at offset 8
Found TT, length 2 at offset 11
Found TT, length 2 at offset 39
Found TT, length 2 at offset 51
Found TT, length 2 at offset 54
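If you also want to know where each run ends, Perl's @+ array holds end offsets alongside @-: $+[0] is the offset just past the end of the whole match. Here is a minimal sketch, reusing the same string and pattern, that records both ends of each run in match order. The @runs variable and the inclusive "start..end" report format are only illustrative choices.

```perl
use strict;
use warnings;

my $string =
    q{AAATTTAGTTCTTAAGGCTGACATCGGTTTACGTCAGCGTTACCCCCCAAGTTATTGGGGACTTT};

# Each record: [ run, start offset, end offset (one past the last character) ].
my @runs;
while ( $string =~ m{(([ACGT])\2+)}g ) {
    # $-[0] is where the whole match starts, $+[0] is where it ends.
    push @runs, [ $1, $-[0], $+[0] ];
}

# Subtract 1 from the end offset so the reported range is inclusive.
printf "Found %s at offsets %d..%d\n", $_->[0], $_->[1], $_->[2] - 1
    for @runs;
```

For the CCCCCC run this reports offsets 42..47. Since $+[0] - $-[0] is the run length, the length could also be derived from the offsets rather than by calling length $1.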

I hope this is helpful.

Cheers,

JohnGG