in reply to problem with long strings and regex

An example of where the "Substitution loop" error is emitted from pp_hot.c:

    I32 maxiters;
    ...
    strend = s + len;
    slen = RX_MATCH_UTF8(rx) ? utf8_length((U8*)s, (U8*)strend) : len;
    maxiters = 2 * slen + 10;   /* We can match twice at each position,
                                   once with zero-length, second time
                                   with non-zero. */
    ...
    if (iters++ > maxiters)
        DIE(aTHX_ "Substitution loop");

So, maxiters is a 32-bit signed integer, set to twice the length of the string plus 10. (The length is counted in characters when the match is UTF-8, which will be fewer than the number of bytes if the string contains multi-byte characters; otherwise it is counted in bytes.)

The maximum value of an I32 is +2,147,483,647, so when your string gets to a little over 1GB in length, 2 * slen + 10 no longer fits, the maxiters variable wraps and you get nonsense.
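
To put a number on that, here is a minimal sketch (mine, not taken from the perl source) that redoes the calculation in 64-bit arithmetic so the true value can be inspected. The helper names maxiters_exact, maxiters_fits_i32 and max_safe_slen are made up for illustration; only the 2 * slen + 10 formula comes from pp_hot.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the iteration cap from pp_hot.c (2 * slen + 10),
 * computed in 64 bits so it cannot overflow for any realistic length. */
static int64_t maxiters_exact(int64_t slen)
{
    assert(slen >= 0);          /* a string length is never negative */
    return 2 * slen + 10;
}

/* Hypothetical helper: 1 if the cap still fits in a signed 32-bit
 * integer (what perl calls I32), 0 if it does not. */
static int maxiters_fits_i32(int64_t slen)
{
    return maxiters_exact(slen) <= INT32_MAX;
}

/* Hypothetical helper: the largest slen for which 2 * slen + 10
 * is still representable as an I32. */
static int64_t max_safe_slen(void)
{
    return ((int64_t)INT32_MAX - 10) / 2;
}
```

max_safe_slen() comes out at 1,073,741,818, so a string of 1,073,741,819 characters is the first one for which the cap no longer fits in an I32. What the wrapped value actually looks like depends on the compiler and platform, so the sketch deliberately does not try to reproduce it.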

Re^2: problem with long strings and regex
by flummoxer (Initiate) on Apr 02, 2009 at 07:44 UTC
    Wow, what a great answer! Knowing there's a limit is good; knowing where I might tinker with the perl source is great.