Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl: the Markov chain saw
 
PerlMonks  

Longest possible run of a single character

by srdst13 (Pilgrim)
on May 22, 2006 at 20:55 UTC ( [id://551038]=perlquestion: print w/replies, xml ) Need Help??

srdst13 has asked for the wisdom of the Perl Monks concerning the following question:

This is a very simple problem to which there appears to be more than one answer, potentially. I am looking for the longest substring composed of a single repeating character. I can find all substrings having a repeated character longer than size "n" using a regex like:

my $re = '(.)\1' . "{$n,})"

but I can't seem to figure out how to find the longest stretch. My brute-force solution would be to loop over reasonable values of "n" and check them all. This is probably adequate for my needs where 1<n<20 or so. I'm just curious what other solutions folks might have, given that I am looking for repeats of a single character.

Thanks,
Sean

Replies are listed 'Best First'.
Re: Longest possible run of a single character
by Zaxo (Archbishop) on May 22, 2006 at 21:26 UTC

    It's simpler than that, just use regex greediness. You were very close.

    my $re = qr/((.)\2+)/;
    Here's an example,
    $ perl -e'my $re = qr/((.)\2+)/; $_="aabccccdddeffff"; while (m/$re/g) + { printf "\"%s\" x %d\n", $2, length($1) }' "a" x 2 "c" x 4 "d" x 3 "f" x 4 $
    That skips capturing lone characters as a sequence of one - change the '+' quantifier to '*' to get them, too. There is no practical limit on the length of the match.

    I didn't address picking out the maximum length substring captured. There are lots of ways to do that.

    Update: Ok, here's an easy way to get the max length as the search is done, using the (?{}) construct.

    use re 'eval'; my $re = qr/((.)\2+)/; my ($maxlen, $maxchr, $maxloc); $_="aabccccdddeffff"; 1 while m/$re(?{ $maxlen = length($^N), $maxchr = substr($^N,0,1), $maxloc = pos() - $maxlen if length($^N) > $maxlen; })/g; print $maxchr, ' x ', $maxlen, ' at ', $maxloc, $/; __END__ c x 4 at 3
    Access to the original matching chunk of the string is given by substr($_, $maxloc, $maxlen).

    After Compline,
    Zaxo

      A slight improvement. It is better to set the pattern to $re = qr/((.)(?:\2)*)/ . This will allow matching strings with single chars, strings like $_ = 'abc' .
      PERFECT! I knew there was a better mousetrap!
Re: Longest possible run of a single character
by GrandFather (Saint) on May 22, 2006 at 22:47 UTC

    If you are dealing with large strings then the following may be a little quicker than the regex based techniques posted so far:

    use warnings; use strict; my $str; my $len = 0; while ($len < 1000000) { my $runLen = int rand (50); $str .= chr (ord ('a') + int rand (26)) x $runLen; $len += $runLen; } my $sstr = substr ($str, 0, 1) . (substr ($str, 1) ^ $str); my @bestRuns; my $match = "\0"; my $bestRunLen = 2; my $scan = 0; while (-1 != (my $start = index $sstr, $match, $scan)) { my $runLen = length ((substr ($sstr, $start) =~ /(\0+)/)[0]) + 1; if ($runLen > $bestRunLen) { # new best match @bestRuns = $start - 1; $bestRunLen = $runLen; $match = "\0" x ($bestRunLen - 1); } else { # another best match push @bestRuns, $start - 1; } $scan = $start + $bestRunLen - 1; } for (@bestRuns) { print "Run of " . substr ($str, $_, 1) . " from $_ for $bestRunLen +\n"; }

    Prints:

    Run of r from 766269 for 144

    Note that this finds all the matches and their start indexes.


    DWIM is Perl's answer to Gödel
Re: Longest possible run of a single character
by BrowserUk (Patriarch) on May 22, 2006 at 21:41 UTC

    Sorting isn't a particularly efficient way of finding the maximum length, but unless your strings are huge, it probably won't matter too much.

    print +( sort{ length $b <=> length $a } $s =~ m[((.)\2+)]g )[ 0 ];;

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Longest possible run of a single character
by TedPride (Priest) on May 23, 2006 at 05:31 UTC
    A linear solution, which will work fast for any size string:
    use strict; use warnings; my ($c, $maxn, $n, $maxc, $str) = ('', 0); $str = join '', <DATA>; for (0..(length($str)-1)) { $_ = substr($str, $_, 1); if ($_ ne $c) { $n = 1; $c = $_; } else { $n++; if ($n > $maxn) { $maxn = $n; $maxc = $c; } } } print $maxc x $maxn; __DATA__ ABDBACCBBBCBDDBCDCBCCDBABCBABBBBBADACDABACC

      Never under estimate the performance of the regex engine.

      #! perl -slw use strict; use List::Util qw[ reduce ]; use Benchmark qw[ cmpthese ]; our $str; sub TedP { my ($c, $maxn, $n, $maxc) = ('', 0); for (0..(length($str)-1)) { $_ = substr($str, $_, 1); if ($_ ne $c) { $n = 1; $c = $_; } else { $n++; if ($n > $maxn) { $maxn = $n; $maxc = $c; } } } return $maxc x $maxn; } sub regex { return reduce{ length $a > length $b ? $a : $b } $str =~ m[((.)\2+)]g; } for my $n ( 1 .. 6 ) { $str = join'', map{chr(65+rand(26)) x int(rand 20) } 1 .. 10**$n; print "\nString length ", length $str; # print regex(); # print TedP(); cmpthese -1, { TedP=> \&TedP, Regex=> \&regex }; } __END__ C:\test>551038 String length 89 Rate TedP Regex TedP 12429/s -- -50% Regex 24837/s 100% -- String length 939 Rate TedP Regex TedP 1265/s -- -51% Regex 2598/s 105% -- String length 9741 Rate TedP Regex TedP 126/s -- -48% Regex 242/s 92% -- String length 94791 Rate TedP Regex TedP 12.6/s -- -43% Regex 22.3/s 77% -- String length 949396 (warning: too few iterations for a reliable count) (warning: too few iterations for a reliable count) Rate TedP Regex TedP 1.29/s -- -40% Regex 2.16/s 67% -- String length 9496562 (warning: too few iterations for a reliable count) (warning: too few iterations for a reliable count) s/iter TedP Regex TedP 7.72 -- -39% Regex 4.74 63% --

      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Longest possible run of a single character
by GrandFather (Saint) on May 23, 2006 at 10:36 UTC

    Time to haul out cmpthese:

    Results (using various values for the run length generator):

    Index: Run from 701117 for 30 Linear: Run from 701117 for 30 RegexSort: Run from -1 for 30 (warning: too few iterations for a reliable count) s/iter RegexSort Linear Index RegexSort 2.21 -- -54% -97% Linear 1.03 116% -- -94% Index 6.16e-002 3494% 1564% -- Index: Run from 670331 for 125 Linear: Run from 670331 for 125 RegexSort: Run from -1 for 125 Rate Linear RegexSort Index Linear 1.06/s -- -51% -90% RegexSort 2.14/s 102% -- -79% Index 10.2/s 865% 377% -- Index: Run from 749633 for 459 Linear: Run from 749633 for 459 RegexSort: Run from -1 for 459 Rate Linear RegexSort Index Linear 1.05/s -- -77% -82% RegexSort 4.56/s 334% -- -21% Index 5.77/s 450% 27% --

    Note that the first three lines of each group are the check results. RegexSort doesn't generate a start index for the match so -1 is shown. However the same length is generated in each case so it is presumed that the same longest match is being found.


    DWIM is Perl's answer to Gödel
Re: Longest possible run of a single character
by GrandFather (Saint) on May 22, 2006 at 21:28 UTC

    Are you looking for the longest run of a specific single character (longeset run of the letter 'a' for example), or the longest run of any character?


    DWIM is Perl's answer to Gödel
Re: Longest possible run of a single character
by choroba (Cardinal) on Sep 24, 2015 at 10:26 UTC
    TIMTOWTDI. You can also get the positions of the "borders", i.e. places where the character sequences change:
    #!/usr/bin/perl use warnings; use strict; use feature qw{ say }; my $s = 'ABDBACCBBBCBDDBCDCBCCDBABCBABBBBBADACDABAC'; my $p = 0; my $max = 0; while ($s =~ /(?<=(.))(?!\1)/g) { my $l = pos($s) - $p; $max = $l if $l > $max; $p = pos $s; } say $max;
    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

      EXTREME TIMTOWTDI :)

      #!/usr/bin/perl -l # http://perlmonks.org/?node_id=551038 use strict; use warnings; $_ = 'ABDBACCBBBCBDDBCDCBCCDBABCBABBBBBADACDABACC'; print "longest run: ", length eval "'".s/(.)\K\B(?!\1)/'|'/gr."'";

        EXTREME TIMTOWTDI the other way around :)

        #!/usr/bin/perl -l # http://perlmonks.org/?node_id=551038 use strict; use warnings; $_ = 'ABDBACCBBBCBDDBCDCBCCDBABCBABBBBBADACDABACC'; print "longest run: ", length eval "'".s/(.)\1*\K/'|'/gr."'";
Re: Longest possible run of a single character
by Anonymous Monk on Sep 24, 2015 at 10:12 UTC

    Auto-adjust as you go...

    #!/usr/bin/perl -l # http://perlmonks.org/?node_id=551038 use strict; use warnings; $_ = 'ABDBACCBBBCBDDBCDCBCCDBABCBABBBBBADACDABACC'; my $n = 0; $n = length $1 while /((.)\2{$n,})/g; print "longest run: $n"

    hehehe

Re: Longest possible run of a single character
by Anonymous Monk on Sep 24, 2015 at 09:12 UTC
    For repeats of any length:
    my ($count, $word) = &longest_repeat ($str, 4);
    
    sub longest_repeat () {
            my ($seq, $k) = @_;
            my $max_word = "";
            my $max_count = 0;
            for (my $i = 0; $i < length ($seq) - $k; ++$i) {
                    my $word = substr ($seq, $i, $k);
                    my $count = 0;
                    while (substr ($seq, $i + ($count * $k), $k) eq $word) {
                            ++$count; }
                    if ($count > $max_count) {
                            $max_count = $count;
                            $max_word = $word; } }
            return ($max_count, $max_word); }
    
A reply falls below the community's threshold of quality. You may see it by logging in.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://551038]
Approved by Joost
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others examining the Monastery: (6)
As of 2024-04-18 19:16 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found