
Re: substrings that consist of repeating characters

by johngg (Canon)
on Sep 28, 2020 at 11:26 UTC (#11122295)


in reply to substrings that consist of repeating characters

Just in case you need offsets as well, here's a solution for that.

    use strict;
    use warnings;
    use feature qw{ say };

    my $string =
        q{AAATTTAGTTCTTAAGGCTGACATCGGTTTACGTCAGCGTTACCCCCCAAGTTATTGGGGACTTT};

    # Record [ length, run, start offset ] for each run of two or more
    # identical letters; $-[ 0 ] is the offset at which the match starts.
    my @matches;
    push @matches, [ length $1, $1, $-[ 0 ] ]
        while $string =~ m{(([ACGT])\2+)}g;

    say qq{Found $_->[ 1 ], length $_->[ 0 ] at offset $_->[ 2 ]}
        for sort {
            $b->[ 0 ] <=> $a->[ 0 ]
            ||
            $a->[ 1 ] cmp $b->[ 1 ]
            ||
            $a->[ 2 ] <=> $b->[ 2 ]
        } @matches;

The output is sorted by descending length, then by ascending letter within each length, then by ascending offset within each letter.

    Found CCCCCC, length 6 at offset 42
    Found GGGG, length 4 at offset 56
    Found AAA, length 3 at offset 0
    Found TTT, length 3 at offset 3
    Found TTT, length 3 at offset 27
    Found TTT, length 3 at offset 62
    Found AA, length 2 at offset 13
    Found AA, length 2 at offset 48
    Found GG, length 2 at offset 15
    Found GG, length 2 at offset 25
    Found TT, length 2 at offset 8
    Found TT, length 2 at offset 11
    Found TT, length 2 at offset 39
    Found TT, length 2 at offset 51
    Found TT, length 2 at offset 54
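If the end of each run is wanted as well as the start, something along these lines might do. It is only a sketch: find_runs is a hypothetical helper name introduced for illustration, and the short test string is made up. It relies on the built-in @- and @+ arrays, where $-[ 0 ] is the offset at which the whole match starts and $+[ 0 ] is the offset just past where it ends.

```perl
use strict;
use warnings;
use feature qw{ say };

# Hypothetical helper (not part of the code above): returns one
# [ run, start offset, end offset ] triple per run of two or more
# identical letters from ACGT, in order of appearance.
sub find_runs {
    my ( $string ) = @_;
    my @runs;
    while ( $string =~ m{(([ACGT])\2+)}g ) {
        # $+[ 0 ] is one past the last matched character, hence the - 1.
        push @runs, [ $1, $-[ 0 ], $+[ 0 ] - 1 ];
    }
    return @runs;
}

say qq{$_->[ 0 ] spans offsets $_->[ 1 ] to $_->[ 2 ]}
    for find_runs( q{AAATTTAGTT} );
```

The triples come back unsorted, so the same sort block as before can be applied with the indices adjusted if a particular order is needed.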

I hope this is helpful.

Cheers,

JohnGG
