Longest possible run of a single character

srdst13 has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Longest possible run of a single character by Zaxo (Archbishop) on May 22, 2006 at 21:26 UTC
It's simpler than that, just use regex greediness. You were very close. `my $re = qr/((.)\2+)/;` [download] Here's an example, `$ perl -e'my $re = qr/((.)\2+)/; $_="aabccccdddeffff"; while (m/$re/g) + { printf "\"%s\" x %d\n", $2, length($1) }' "a" x 2 "c" x 4 "d" x 3 "f" x 4 $` [download] That skips capturing lone characters as a sequence of one - change the '+' quantifier to '' to get them, too. There is no practical limit on the length of the match. I didn't address picking out the maximum length substring captured. There are lots of ways to do that. Update:* Ok, here's an easy way to get the max length as the search is done, using the (?{}) construct. `use re 'eval'; my $re = qr/((.)\2+)/; my ($maxlen, $maxchr, $maxloc); $_="aabccccdddeffff"; 1 while m/$re(?{ $maxlen = length($^N), $maxchr = substr($^N,0,1), $maxloc = pos() - $maxlen if length($^N) > $maxlen; })/g; print $maxchr, ' x ', $maxlen, ' at ', $maxloc, $/; __END__ c x 4 at 3` [download] Access to the original matching chunk of the string is given by `substr($_, $maxloc, $maxlen)`. After Compline, Zaxo	[reply] [d/l] [select]
Re^2: Longest possible run of a single character by ioannis (Abbot) on May 23, 2006 at 09:06 UTC
A slight improvement. It is better to set the pattern to *$re = qr/((.)(?:\2))/** . This will allow matching strings with single chars, strings like $_ = 'abc' .	[reply]
Re^2: Longest possible run of a single character by srdst13 (Pilgrim) on May 22, 2006 at 21:38 UTC
PERFECT! I knew there was a better mousetrap!	[reply]
Re: Longest possible run of a single character by GrandFather (Saint) on May 22, 2006 at 22:47 UTC
If you are dealing with large strings then the following may be a little quicker than the regex based techniques posted so far: use warnings; use strict; my $str; my $len = 0; while ($len < 1000000) { my $runLen = int rand (50); $str .= chr (ord ('a') + int rand (26)) x $runLen; $len += $runLen; } my $sstr = substr ($str, 0, 1) . (substr ($str, 1) ^ $str); my @bestRuns; my $match = "\0"; my $bestRunLen = 2; my $scan = 0; while (-1 != (my $start = index $sstr, $match, $scan)) { my $runLen = length ((substr ($sstr, $start) =~ /(\0+)/)[0]) + 1; if ($runLen > $bestRunLen) { # new best match @bestRuns = $start - 1; $bestRunLen = $runLen; $match = "\0" x ($bestRunLen - 1); } else { # another best match push @bestRuns, $start - 1; } $scan = $start + $bestRunLen - 1; } for (@bestRuns) { print "Run of " . substr ($str, $_, 1) . " from $_ for $bestRunLen +\n"; } [download] Prints: `Run of r from 766269 for 144` [download] Note that this finds all the matches and their start indexes. DWIM is Perl's answer to Gödel	[reply] [d/l] [select]
Re: Longest possible run of a single character by BrowserUk (Patriarch) on May 22, 2006 at 21:41 UTC
Sorting isn't a particularly efficient way of finding the maximum length, but unless your strings are huge, it probably won't matter too much. `print +( sort{ length $b <=> length $a } $s =~ m[((.)\2+)]g )[ 0 ];;` [download] Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal? "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice.	[reply] [d/l]
Re: Longest possible run of a single character by TedPride (Priest) on May 23, 2006 at 05:31 UTC
A linear solution, which will work fast for any size string: `use strict; use warnings; my ($c, $maxn, $n, $maxc, $str) = ('', 0); $str = join '', <DATA>; for (0..(length($str)-1)) { $_ = substr($str, $_, 1); if ($_ ne $c) { $n = 1; $c = $_; } else { $n++; if ($n > $maxn) { $maxn = $n; $maxc = $c; } } } print $maxc x $maxn; __DATA__ ABDBACCBBBCBDDBCDCBCCDBABCBABBBBBADACDABACC` [download]	[reply] [d/l]
Re^2: Longest possible run of a single character by BrowserUk (Patriarch) on May 23, 2006 at 10:17 UTC
Never under estimate the performance of the regex engine. #! perl -slw use strict; use List::Util qw[ reduce ]; use Benchmark qw[ cmpthese ]; our $str; sub TedP { my ($c, $maxn, $n, $maxc) = ('', 0); for (0..(length($str)-1)) { $_ = substr($str, $_, 1); if ($_ ne $c) { $n = 1; $c = $_; } else { $n++; if ($n > $maxn) { $maxn = $n; $maxc = $c; } } } return $maxc x $maxn; } sub regex { return reduce{ length $a > length $b ? $a : $b } $str =~ m[((.)\2+)]g; } for my $n ( 1 .. 6 ) { $str = join'', map{chr(65+rand(26)) x int(rand 20) } 1 .. 10**$n; print "\nString length ", length $str; # print regex(); # print TedP(); cmpthese -1, { TedP=> \&TedP, Regex=> \&regex }; } __END__ C:\test>551038 String length 89 Rate TedP Regex TedP 12429/s -- -50% Regex 24837/s 100% -- String length 939 Rate TedP Regex TedP 1265/s -- -51% Regex 2598/s 105% -- String length 9741 Rate TedP Regex TedP 126/s -- -48% Regex 242/s 92% -- String length 94791 Rate TedP Regex TedP 12.6/s -- -43% Regex 22.3/s 77% -- String length 949396 (warning: too few iterations for a reliable count) (warning: too few iterations for a reliable count) Rate TedP Regex TedP 1.29/s -- -40% Regex 2.16/s 67% -- String length 9496562 (warning: too few iterations for a reliable count) (warning: too few iterations for a reliable count) s/iter TedP Regex TedP 7.72 -- -39% Regex 4.74 63% -- [download] Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal? "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice.	[reply] [d/l]
Re: Longest possible run of a single character by GrandFather (Saint) on May 23, 2006 at 10:36 UTC
Time to haul out cmpthese: Read more... Benchmark code (2 kB) Results (using various values for the run length generator): Index: Run from 701117 for 30 Linear: Run from 701117 for 30 RegexSort: Run from -1 for 30 (warning: too few iterations for a reliable count) s/iter RegexSort Linear Index RegexSort 2.21 -- -54% -97% Linear 1.03 116% -- -94% Index 6.16e-002 3494% 1564% -- Index: Run from 670331 for 125 Linear: Run from 670331 for 125 RegexSort: Run from -1 for 125 Rate Linear RegexSort Index Linear 1.06/s -- -51% -90% RegexSort 2.14/s 102% -- -79% Index 10.2/s 865% 377% -- Index: Run from 749633 for 459 Linear: Run from 749633 for 459 RegexSort: Run from -1 for 459 Rate Linear RegexSort Index Linear 1.05/s -- -77% -82% RegexSort 4.56/s 334% -- -21% Index 5.77/s 450% 27% -- [download] Note that the first three lines of each group are the check results. RegexSort doesn't generate a start index for the match so -1 is shown. However the same length is generated in each case so it is presumed that the same longest match is being found. DWIM is Perl's answer to Gödel	[reply] [d/l] [select]
Re: Longest possible run of a single character by GrandFather (Saint) on May 22, 2006 at 21:28 UTC
Are you looking for the longest run of a specific single character (longeset run of the letter 'a' for example), or the longest run of any character? DWIM is Perl's answer to Gödel	[reply]
Re: Longest possible run of a single character by choroba (Cardinal) on Sep 24, 2015 at 10:26 UTC
TIMTOWTDI. You can also get the positions of the "borders", i.e. places where the character sequences change: `#!/usr/bin/perl use warnings; use strict; use feature qw{ say }; my $s = 'ABDBACCBBBCBDDBCDCBCCDBABCBABBBBBADACDABAC'; my $p = 0; my $max = 0; while ($s =~ /(?<=(.))(?!\1)/g) { my $l = pos($s) - $p; $max = $l if $l > $max; $p = pos $s; } say $max;` [download] لսႽ� ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ	[reply] [d/l]
Re^2: Longest possible run of a single character by Anonymous Monk on Sep 24, 2015 at 21:46 UTC
EXTREME TIMTOWTDI :) `#!/usr/bin/perl -l # http://perlmonks.org/?node_id=551038 use strict; use warnings; $_ = 'ABDBACCBBBCBDDBCDCBCCDBABCBABBBBBADACDABACC'; print "longest run: ", length eval "'".s/(.)\K\B(?!\1)/'\|'/gr."'";` [download]	[reply] [d/l]
Re^3: Longest possible run of a single character by Anonymous Monk on Sep 24, 2015 at 22:09 UTC
EXTREME TIMTOWTDI the other way around :) `#!/usr/bin/perl -l # http://perlmonks.org/?node_id=551038 use strict; use warnings; $_ = 'ABDBACCBBBCBDDBCDCBCCDBABCBABBBBBADACDABACC'; print "longest run: ", length eval "'".s/(.)\1*\K/'\|'/gr."'";` [download]	[reply] [d/l]
Re: Longest possible run of a single character by Anonymous Monk on Sep 24, 2015 at 10:12 UTC
Auto-adjust as you go... `#!/usr/bin/perl -l # http://perlmonks.org/?node_id=551038 use strict; use warnings; $_ = 'ABDBACCBBBCBDDBCDCBCCDBABCBABBBBBADACDABACC'; my $n = 0; $n = length $1 while /((.)\2{$n,})/g; print "longest run: $n"` [download] hehehe	[reply] [d/l]
Re: Longest possible run of a single character by Anonymous Monk on Sep 24, 2015 at 09:12 UTC
For repeats of any length: my ($count, $word) = &longest_repeat ($str, 4); sub longest_repeat () { my ($seq, $k) = @_; my $max_word = ""; my $max_count = 0; for (my $i = 0; $i < length ($seq) - $k; ++$i) { my $word = substr ($seq, $i, $k); my $count = 0; while (substr ($seq, $i + ($count * $k), $k) eq $word) { ++$count; } if ($count > $max_count) { $max_count = $count; $max_word = $word; } } return ($max_count, $max_word); }	[reply]
A reply falls below the community's threshold of quality. You may see it by logging in.


Perl: the Markov chain saw
	PerlMonks