Matching a pattern across a 32767-length boundary is probably not the best solution. It feels like looking looking for nails because you have a hammer. Here's a solution that avoids quantifiers. I'm not going to take the time to test the speed of this algorithm. If it's not fast enough, drop into Inline::C and prefer a non-regex means of detecting boundaries.
use strict;
use warnings;
for (1 .. 25) {
my $digits = join '', map {int rand 2} 1 .. 100_000;
my ($pos, $len) = longest($digits);
my $substr = substr($digits, $pos, $len);
printf "%02d: At position (%6d), length (%6d) => $substr\n", $_, $
+pos, $len;
}
sub longest {
my ($seq, $lastpos, $maxpos, $maxlen) = (shift, 0, 0, 0);
while ($seq =~ m/(.)(?!\1)/g) {
my $p = pos $seq;
my $len = $p - $lastpos;
$lastpos = $p;
if ($len > $maxlen) {
$maxlen = $len;
$maxpos = $p-$len;
}
}
return $maxpos, $maxlen;
}
__END__
01: At position ( 54044), length ( 18) => 111111111111111111
02: At position ( 22821), length ( 15) => 000000000000000
03: At position ( 84563), length ( 18) => 111111111111111111
04: At position ( 97707), length ( 18) => 000000000000000000
05: At position ( 2567), length ( 16) => 1111111111111111
06: At position ( 31038), length ( 18) => 111111111111111111
07: At position ( 73339), length ( 16) => 0000000000000000
08: At position ( 26644), length ( 19) => 0000000000000000000
09: At position ( 9906), length ( 21) => 111111111111111111111
10: At position ( 50662), length ( 15) => 111111111111111
11: At position ( 86843), length ( 15) => 111111111111111
12: At position ( 15995), length ( 16) => 0000000000000000
13: At position ( 4399), length ( 15) => 000000000000000
14: At position ( 25401), length ( 19) => 0000000000000000000
15: At position ( 65784), length ( 21) => 000000000000000000000
16: At position ( 37043), length ( 14) => 00000000000000
17: At position ( 63608), length ( 18) => 111111111111111111
18: At position ( 69870), length ( 17) => 00000000000000000
19: At position ( 20108), length ( 18) => 000000000000000000
20: At position ( 33099), length ( 19) => 1111111111111111111
21: At position ( 40355), length ( 17) => 00000000000000000
22: At position ( 98429), length ( 18) => 000000000000000000
23: At position ( 43568), length ( 16) => 1111111111111111
24: At position ( 71050), length ( 16) => 0000000000000000
25: At position ( 45697), length ( 19) => 0000000000000000000
This scales up to any string size and any greatest-common-substring size. In other words, '1' x (100000) is not a problem. By doing away with the use of the regex altogether, and by eliminating the use of substr, it could even be adapted to work with infinite streams of digits.
|