Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl Monk, Perl Meditation
 
PerlMonks  

Re^5: How to match more than 32766 times in regex?

by davido (Cardinal)
on Dec 01, 2015 at 21:48 UTC ( [id://1149082]=note: print w/replies, xml ) Need Help??


in reply to Re^4: How to match more than 32766 times in regex?
in thread How to match more than 32766 times in regex?

Matching a pattern across a 32767-length boundary is probably not the best solution. It feels like looking looking for nails because you have a hammer. Here's a solution that avoids quantifiers. I'm not going to take the time to test the speed of this algorithm. If it's not fast enough, drop into Inline::C and prefer a non-regex means of detecting boundaries.

use strict; use warnings; for (1 .. 25) { my $digits = join '', map {int rand 2} 1 .. 100_000; my ($pos, $len) = longest($digits); my $substr = substr($digits, $pos, $len); printf "%02d: At position (%6d), length (%6d) => $substr\n", $_, $ +pos, $len; } sub longest { my ($seq, $lastpos, $maxpos, $maxlen) = (shift, 0, 0, 0); while ($seq =~ m/(.)(?!\1)/g) { my $p = pos $seq; my $len = $p - $lastpos; $lastpos = $p; if ($len > $maxlen) { $maxlen = $len; $maxpos = $p-$len; } } return $maxpos, $maxlen; } __END__ 01: At position ( 54044), length ( 18) => 111111111111111111 02: At position ( 22821), length ( 15) => 000000000000000 03: At position ( 84563), length ( 18) => 111111111111111111 04: At position ( 97707), length ( 18) => 000000000000000000 05: At position ( 2567), length ( 16) => 1111111111111111 06: At position ( 31038), length ( 18) => 111111111111111111 07: At position ( 73339), length ( 16) => 0000000000000000 08: At position ( 26644), length ( 19) => 0000000000000000000 09: At position ( 9906), length ( 21) => 111111111111111111111 10: At position ( 50662), length ( 15) => 111111111111111 11: At position ( 86843), length ( 15) => 111111111111111 12: At position ( 15995), length ( 16) => 0000000000000000 13: At position ( 4399), length ( 15) => 000000000000000 14: At position ( 25401), length ( 19) => 0000000000000000000 15: At position ( 65784), length ( 21) => 000000000000000000000 16: At position ( 37043), length ( 14) => 00000000000000 17: At position ( 63608), length ( 18) => 111111111111111111 18: At position ( 69870), length ( 17) => 00000000000000000 19: At position ( 20108), length ( 18) => 000000000000000000 20: At position ( 33099), length ( 19) => 1111111111111111111 21: At position ( 40355), length ( 17) => 00000000000000000 22: At position ( 98429), length ( 18) => 000000000000000000 23: At position ( 43568), length ( 16) => 1111111111111111 24: At position ( 71050), length ( 16) => 0000000000000000 25: At position ( 45697), length ( 19) => 0000000000000000000

This scales up to any string size and any greatest-common-substring size. In other words, '1' x (100000) is not a problem. By doing away with the use of the regex altogether, and by eliminating the use of substr, it could even be adapted to work with infinite streams of digits.


Dave

Replies are listed 'Best First'.
Re^6: How to match more than 32766 times in regex?
by Anonymous Monk on Dec 01, 2015 at 22:38 UTC
    Hm, I don't know what it has to do with alternating substrings, but here's another version of your program :)
    use strict; use warnings; use List::Util 'reduce'; for (1 .. 25) { my $digits = join '', map {int rand 2} 1 .. 100_000; my $substr = longest($digits); printf "%02d: At position (%6d), length (%6d) => $substr\n", $_, index($digits, $substr), length($substr); } sub longest { return reduce { length($a) > length($b) ? $a : $b; } split /(?<=0)(?=1)|(?<=1)(?=0)/, $_[0]; }
    Works a bit faster, too.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://1149082]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others imbibing at the Monastery: (4)
As of 2024-04-19 02:03 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found