Matching a pattern across a 32767-length boundary is probably not the best solution. It feels like looking looking for nails because you have a hammer. Here's a solution that avoids quantifiers. I'm not going to take the time to test the speed of this algorithm. If it's not fast enough, drop into Inline::C and prefer a non-regex means of detecting boundaries.
use strict; use warnings; for (1 .. 25) { my $digits = join '', map {int rand 2} 1 .. 100_000; my ($pos, $len) = longest($digits); my $substr = substr($digits, $pos, $len); printf "%02d: At position (%6d), length (%6d) => $substr\n", $_, $ +pos, $len; } sub longest { my ($seq, $lastpos, $maxpos, $maxlen) = (shift, 0, 0, 0); while ($seq =~ m/(.)(?!\1)/g) { my $p = pos $seq; my $len = $p - $lastpos; $lastpos = $p; if ($len > $maxlen) { $maxlen = $len; $maxpos = $p-$len; } } return $maxpos, $maxlen; } __END__ 01: At position ( 54044), length ( 18) => 111111111111111111 02: At position ( 22821), length ( 15) => 000000000000000 03: At position ( 84563), length ( 18) => 111111111111111111 04: At position ( 97707), length ( 18) => 000000000000000000 05: At position ( 2567), length ( 16) => 1111111111111111 06: At position ( 31038), length ( 18) => 111111111111111111 07: At position ( 73339), length ( 16) => 0000000000000000 08: At position ( 26644), length ( 19) => 0000000000000000000 09: At position ( 9906), length ( 21) => 111111111111111111111 10: At position ( 50662), length ( 15) => 111111111111111 11: At position ( 86843), length ( 15) => 111111111111111 12: At position ( 15995), length ( 16) => 0000000000000000 13: At position ( 4399), length ( 15) => 000000000000000 14: At position ( 25401), length ( 19) => 0000000000000000000 15: At position ( 65784), length ( 21) => 000000000000000000000 16: At position ( 37043), length ( 14) => 00000000000000 17: At position ( 63608), length ( 18) => 111111111111111111 18: At position ( 69870), length ( 17) => 00000000000000000 19: At position ( 20108), length ( 18) => 000000000000000000 20: At position ( 33099), length ( 19) => 1111111111111111111 21: At position ( 40355), length ( 17) => 00000000000000000 22: At position ( 98429), length ( 18) => 000000000000000000 23: At position ( 43568), length ( 16) => 1111111111111111 24: At position ( 71050), length ( 16) => 0000000000000000 25: At position ( 45697), length ( 19) => 0000000000000000000
This scales up to any string size and any greatest-common-substring size. In other words, '1' x (100000) is not a problem. By doing away with the use of the regex altogether, and by eliminating the use of substr, it could even be adapted to work with infinite streams of digits.
Dave
In reply to Re^5: How to match more than 32766 times in regex?
by davido
in thread How to match more than 32766 times in regex?
by rsFalse
For: | Use: | ||
& | & | ||
< | < | ||
> | > | ||
[ | [ | ||
] | ] |