Re: Longest possible run of a single character
by Zaxo (Archbishop) on May 22, 2006 at 21:26 UTC
|
It's simpler than that, just use regex greediness. You were very close.
my $re = qr/((.)\2+)/;
Here's an example,
$ perl -e'my $re = qr/((.)\2+)/; $_="aabccccdddeffff"; while (m/$re/g)
+ { printf "\"%s\" x %d\n", $2, length($1) }'
"a" x 2
"c" x 4
"d" x 3
"f" x 4
$
That skips capturing lone characters as a sequence of one - change the '+' quantifier to '*' to get them, too. There is no practical limit on the length of the match.
I didn't address picking out the maximum length substring captured. There are lots of ways to do that.
Update: Ok, here's an easy way to get the max length as the search is done, using the (?{}) construct.
use re 'eval';
my $re = qr/((.)\2+)/;
my ($maxlen, $maxchr, $maxloc);
$_="aabccccdddeffff";
1 while m/$re(?{
$maxlen = length($^N),
$maxchr = substr($^N,0,1),
$maxloc = pos() - $maxlen
if length($^N) > $maxlen;
})/g;
print $maxchr, ' x ', $maxlen, ' at ', $maxloc, $/;
__END__
c x 4 at 3
Access to the original matching chunk of the string is given by substr($_, $maxloc, $maxlen).
| [reply] [d/l] [select] |
|
A slight improvement. It is better to set the pattern to
$re = qr/((.)(?:\2)*)/ . This will allow
matching strings with single chars, strings like $_ = 'abc' .
| [reply] |
|
PERFECT! I knew there was a better mousetrap!
| [reply] |
Re: Longest possible run of a single character
by GrandFather (Saint) on May 22, 2006 at 22:47 UTC
|
If you are dealing with large strings then the following may be a little quicker than the regex based techniques posted so far:
use warnings;
use strict;
my $str;
my $len = 0;
while ($len < 1000000) {
my $runLen = int rand (50);
$str .= chr (ord ('a') + int rand (26)) x $runLen;
$len += $runLen;
}
my $sstr = substr ($str, 0, 1) . (substr ($str, 1) ^ $str);
my @bestRuns;
my $match = "\0";
my $bestRunLen = 2;
my $scan = 0;
while (-1 != (my $start = index $sstr, $match, $scan)) {
my $runLen = length ((substr ($sstr, $start) =~ /(\0+)/)[0]) + 1;
if ($runLen > $bestRunLen) {
# new best match
@bestRuns = $start - 1;
$bestRunLen = $runLen;
$match = "\0" x ($bestRunLen - 1);
} else {
# another best match
push @bestRuns, $start - 1;
}
$scan = $start + $bestRunLen - 1;
}
for (@bestRuns) {
print "Run of " . substr ($str, $_, 1) . " from $_ for $bestRunLen
+\n";
}
Prints:
Run of r from 766269 for 144
Note that this finds all the matches and their start indexes.
DWIM is Perl's answer to Gödel
| [reply] [d/l] [select] |
Re: Longest possible run of a single character
by BrowserUk (Patriarch) on May 22, 2006 at 21:41 UTC
|
Sorting isn't a particularly efficient way of finding the maximum length, but unless your strings are huge, it probably won't matter too much.
print +( sort{ length $b <=> length $a } $s =~ m[((.)\2+)]g )[ 0 ];;
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
| [reply] [d/l] |
Re: Longest possible run of a single character
by TedPride (Priest) on May 23, 2006 at 05:31 UTC
|
A linear solution, which will work fast for any size string:
use strict;
use warnings;
my ($c, $maxn, $n, $maxc, $str) = ('', 0);
$str = join '', <DATA>;
for (0..(length($str)-1)) {
$_ = substr($str, $_, 1);
if ($_ ne $c) {
$n = 1;
$c = $_;
}
else {
$n++;
if ($n > $maxn) {
$maxn = $n;
$maxc = $c;
}
}
}
print $maxc x $maxn;
__DATA__
ABDBACCBBBCBDDBCDCBCCDBABCBABBBBBADACDABACC
| [reply] [d/l] |
|
#! perl -slw
use strict;
use List::Util qw[ reduce ];
use Benchmark qw[ cmpthese ];
our $str;
sub TedP {
my ($c, $maxn, $n, $maxc) = ('', 0);
for (0..(length($str)-1)) {
$_ = substr($str, $_, 1);
if ($_ ne $c) {
$n = 1;
$c = $_;
}
else {
$n++;
if ($n > $maxn) {
$maxn = $n;
$maxc = $c;
}
}
}
return $maxc x $maxn;
}
sub regex {
return reduce{
length $a > length $b ? $a : $b
} $str =~ m[((.)\2+)]g;
}
for my $n ( 1 .. 6 ) {
$str = join'', map{chr(65+rand(26)) x int(rand 20) } 1 .. 10**$n;
print "\nString length ", length $str;
# print regex();
# print TedP();
cmpthese -1, { TedP=> \&TedP, Regex=> \®ex };
}
__END__
C:\test>551038
String length 89
Rate TedP Regex
TedP 12429/s -- -50%
Regex 24837/s 100% --
String length 939
Rate TedP Regex
TedP 1265/s -- -51%
Regex 2598/s 105% --
String length 9741
Rate TedP Regex
TedP 126/s -- -48%
Regex 242/s 92% --
String length 94791
Rate TedP Regex
TedP 12.6/s -- -43%
Regex 22.3/s 77% --
String length 949396
(warning: too few iterations for a reliable count)
(warning: too few iterations for a reliable count)
Rate TedP Regex
TedP 1.29/s -- -40%
Regex 2.16/s 67% --
String length 9496562
(warning: too few iterations for a reliable count)
(warning: too few iterations for a reliable count)
s/iter TedP Regex
TedP 7.72 -- -39%
Regex 4.74 63% --
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
| [reply] [d/l] |
Re: Longest possible run of a single character
by GrandFather (Saint) on May 23, 2006 at 10:36 UTC
|
Time to haul out cmpthese:
Results (using various values for the run length generator):
Index: Run from 701117 for 30
Linear: Run from 701117 for 30
RegexSort: Run from -1 for 30
(warning: too few iterations for a reliable count)
s/iter RegexSort Linear Index
RegexSort 2.21 -- -54% -97%
Linear 1.03 116% -- -94%
Index 6.16e-002 3494% 1564% --
Index: Run from 670331 for 125
Linear: Run from 670331 for 125
RegexSort: Run from -1 for 125
Rate Linear RegexSort Index
Linear 1.06/s -- -51% -90%
RegexSort 2.14/s 102% -- -79%
Index 10.2/s 865% 377% --
Index: Run from 749633 for 459
Linear: Run from 749633 for 459
RegexSort: Run from -1 for 459
Rate Linear RegexSort Index
Linear 1.05/s -- -77% -82%
RegexSort 4.56/s 334% -- -21%
Index 5.77/s 450% 27% --
Note that the first three lines of each group are the check results. RegexSort doesn't generate a start index for the match so -1 is shown. However the same length is generated in each case so it is presumed that the same longest match is being found.
DWIM is Perl's answer to Gödel
| [reply] [d/l] [select] |
Re: Longest possible run of a single character
by GrandFather (Saint) on May 22, 2006 at 21:28 UTC
|
Are you looking for the longest run of a specific single character (longeset run of the letter 'a' for example), or the longest run of any character?
DWIM is Perl's answer to Gödel
| [reply] |
Re: Longest possible run of a single character
by choroba (Cardinal) on Sep 24, 2015 at 10:26 UTC
|
TIMTOWTDI. You can also get the positions of the "borders", i.e. places where the character sequences change:
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };
my $s = 'ABDBACCBBBCBDDBCDCBCCDBABCBABBBBBADACDABAC';
my $p = 0;
my $max = 0;
while ($s =~ /(?<=(.))(?!\1)/g) {
my $l = pos($s) - $p;
$max = $l if $l > $max;
$p = pos $s;
}
say $max;
| [reply] [d/l] |
|
#!/usr/bin/perl -l
# http://perlmonks.org/?node_id=551038
use strict;
use warnings;
$_ = 'ABDBACCBBBCBDDBCDCBCCDBABCBABBBBBADACDABACC';
print "longest run: ", length eval "'".s/(.)\K\B(?!\1)/'|'/gr."'";
| [reply] [d/l] |
|
#!/usr/bin/perl -l
# http://perlmonks.org/?node_id=551038
use strict;
use warnings;
$_ = 'ABDBACCBBBCBDDBCDCBCCDBABCBABBBBBADACDABACC';
print "longest run: ", length eval "'".s/(.)\1*\K/'|'/gr."'";
| [reply] [d/l] |
Re: Longest possible run of a single character
by Anonymous Monk on Sep 24, 2015 at 10:12 UTC
|
#!/usr/bin/perl -l
# http://perlmonks.org/?node_id=551038
use strict;
use warnings;
$_ = 'ABDBACCBBBCBDDBCDCBCCDBABCBABBBBBADACDABACC';
my $n = 0;
$n = length $1 while /((.)\2{$n,})/g;
print "longest run: $n"
hehehe
| [reply] [d/l] |
Re: Longest possible run of a single character
by Anonymous Monk on Sep 24, 2015 at 09:12 UTC
|
For repeats of any length:
my ($count, $word) = &longest_repeat ($str, 4);
sub longest_repeat () {
my ($seq, $k) = @_;
my $max_word = "";
my $max_count = 0;
for (my $i = 0; $i < length ($seq) - $k; ++$i) {
my $word = substr ($seq, $i, $k);
my $count = 0;
while (substr ($seq, $i + ($count * $k), $k) eq $word) {
++$count; }
if ($count > $max_count) {
$max_count = $count;
$max_word = $word; } }
return ($max_count, $max_word); }
| [reply] |
A reply falls below the community's threshold of quality. You may see it by logging in. |