
I don't know how regular expressions perform with such a huge string.

In some cases "*" is compiled as "{0,32767}", so a match against a very long string may stop after 32767 repetitions and cause problems.

>perl -Mre=debug -we"qr/^(.)(\1*)\z/"
...
   9: CURLYX[1] {0,32767}(14)
...

Be sure to prevent backtracking with an atomic group, (?>...), or (in 5.10.0+) a possessive quantifier such as "*+".

Update: "/\0\0MSrc.{16}((?>[^\0]*))(?=\0)/s" looks safe.
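For what it's worth, here is a minimal sketch of the two spellings side by side. The record layout (a "\0\0MSrc" marker, 16 header bytes, then a NUL-terminated payload) is only my assumption based on the pattern above, and the sample data and sub names are made up for illustration. I have not tried this against a real Entourage database.

```perl
use strict;
use warnings;
use 5.010;    # possessive quantifiers need Perl 5.10.0 or later

# Hypothetical record, for illustration only: marker, 16 header bytes,
# then a payload terminated by a NUL byte.
my $record = "junk\0\0MSrc" . ("H" x 16) . "payload text" . "\0tail";

# Atomic group: once [^\0]* has matched, the engine cannot give bytes back.
sub extract_atomic {
    my ($buf) = @_;
    my ($payload) = $buf =~ /\0\0MSrc.{16}((?>[^\0]*))(?=\0)/s;
    return $payload;    # undef if there is no match
}

# Possessive quantifier: same no-backtracking behaviour, shorter spelling.
sub extract_possessive {
    my ($buf) = @_;
    my ($payload) = $buf =~ /\0\0MSrc.{16}([^\0]*+)(?=\0)/s;
    return $payload;    # undef if there is no match
}

say extract_atomic($record);        # payload text
say extract_possessive($record);    # payload text
```

If the buffer ends before the terminating NUL (say, at a chunk boundary), both subs return undef instead of backtracking through the payload, so the caller can tell that it needs to read more data.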

Re^3: Parsing 12GB Entourage database in pieces...
by betterworld (Curate) on Aug 28, 2008 at 20:40 UTC

    Interesting...

    perl5.8.8 -wle '$s = "x" x 40_000; $s =~ /^(.)(\1*)/ and print length $2'
    # (segfaults)
    perl5.10.0 -wle '$s = "x" x 40_000; $s =~ /^(.)(\1*)/ and print length $2'
    Complex regular subexpression recursion limit (32766) exceeded at -e line 1.
    32767

    Well, at least it seems to warn when this limitation affects the result...

      I believe p5p is working on a patch to change that warning to a die.