in reply to substrings that consist of repeating characters

A simpler variation of your code:
use strict;
use warnings;

my $string = "AAAAAAATTTAGTTCTTAAGGCTGACATCGGTTTACGTCAGCGTTACCCCCCAAGT"
           . "TTTTTTTTTTTTTTTTTATTGGGGACTTT";
my $len  = 0;
my $best = "";
while ($string =~ /((.)\2{$len,})/g) {
    $len  = length $1;
    $best = $1;
}
print "best: $best\n";

Re^2: substrings that consist of repeating characters
by GrandFather (Saint) on Sep 28, 2020 at 22:17 UTC

    At risk of upsetting likbez:

    use strict;
    use warnings;

    my $string = "AAAATTTAGTTCTTAAGGCTGACATCACGTCAGCGTTACCCCCCAAGATTGGGGACTTT";
    my $len  = 0;
    my $best = '';
    $best = $1, $len = length $1 while $string =~ /((.)\2{$len,})/g;
    print "best: $best ($len)\n";

    Prints:

    best: CCCCCC (6)
    Optimising for fewest key strokes only makes sense transmitting to Pluto or beyond
Re^2: substrings that consist of repeating characters
by salva (Canon) on Sep 29, 2020 at 09:26 UTC
    Note, though, that the regular expression in my comment above is pretty inefficient: it looks for the longest match starting at every character, instead of skipping the whole run of identical characters once the match has failed at the character that starts it (the regular expression in the OP is much better in that regard).

    We can use (*SKIP) to avoid that:

    my $len  = 0;
    my $best = "";
    while ($string =~ /((.)(?:(*SKIP)\2){$len,})/g) {
        $len  = length $1;
        $best = $1;
    }
    print "best: $best\n";

    But that is still not completely efficient: the regexp is recompiled at every loop iteration because of $len, so maybe the following simpler code could be faster:

    my $best = "";
    while ($string =~ /((.)\2+)/g) {
        $best = $1 if length $1 > length $best;
    }
    print "best: $best\n";
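For anyone who wants to try the simpler loop just above on several inputs, here is a minimal sketch that wraps it in a sub. The name longest_run is made up for illustration, and the sketch uses \2* rather than \2+, which is an assumption on my part so that a string without any repeated character still returns a single character instead of an empty string:

```perl
use strict;
use warnings;

# Hypothetical helper (name invented for this sketch): returns the first
# longest run of one repeated character, or "" for an empty string.
sub longest_run {
    my ($string) = @_;
    my $best = "";
    # The pattern is constant, so it is compiled only once, and each /g
    # iteration consumes one whole run, so no run is scanned twice.
    # Note that "." does not match a newline without the /s modifier.
    while ($string =~ /((.)\2*)/g) {
        $best = $1 if length $1 > length $best;
    }
    return $best;
}

print "best: ", longest_run("AABBBBCCC"), "\n";    # best: BBBB
```

Because the comparison is a strict ">", ties go to the earliest run, which matches what the $len-based versions do.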

    Or maybe this more convoluted variation:

    my $best = "";
    $best = $1 while $string =~ /((.)\2*)(*SKIP)(?(?{length $^N <= length $best})(*FAIL))/g;
    print "best: $best\n";

      Does that work?

      Win8 Strawberry 5.30.3.1 (64) Tue 09/29/2020 13:32:10
      C:\@Work\Perl\monks
      >perl
      use strict;
      use warnings;

      my $string = 'AABBBBCCC';
      my $len = 0;
      my $best = "";
      while ($string =~ /((.)(?:(*SKIP)\2){$len,})/g) {
          $len = length $1;
          $best = $1
      }
      print "best: '$best' \n"
      ^Z
      best: ''


      Give a man a fish:  <%-{-{-{-<

        Oops, no, it doesn't, but I think it should!

        Is that a bug in perl?
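
One possible reading, offered as an assumption rather than a confirmed diagnosis: perlre describes (*SKIP) as a verb that makes the current match attempt fail when the engine backtracks into it. A greedy quantifier always ends by trying one iteration too many, so the final, failing \2 would backtrack straight into the (*SKIP) and throw the whole match away. If that is right, the empty result is the documented behaviour and not a bug. A small sketch to check it (count_matches and the variable names are invented here):

```perl
use strict;
use warnings;

# Hypothetical helper: counts how many times $re matches $string under /g.
sub count_matches {
    my ($string, $re) = @_;
    my $count = 0;
    $count++ while $string =~ /$re/g;
    return $count;
}

my $string = 'AABBBBCCC';

# Same shape as the pattern in the session above, with $len fixed at 0.
my $skip_inside = qr/((.)(?:(*SKIP)\2){0,})/;
# The same pattern without the verb, for comparison.
my $no_skip     = qr/((.)\2{0,})/;

printf "with (*SKIP) inside the group: %d matches\n",
    count_matches($string, $skip_inside);
printf "without (*SKIP):               %d matches\n",
    count_matches($string, $no_skip);
```

The variation that puts (*SKIP) after the whole run, followed by a conditional (*FAIL), would not be affected in the same way, since there the verb is only backtracked into when the run is deliberately rejected.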