in reply to Re^7: How to match more than 32766 times in regex?
in thread How to match more than 32766 times in regex?

1. I can't understand how your first regex can match more than 32677 times agains string '0' x 1_000_000 ? And it gives correct answer 3. It has a '+' quantifier, which is {1,32677}, true?

2. Your solution doesn't cope with inputs, which have only one pair of consecutive chars :P . If input is '11' . '01' x N , answer is not a 2*N+1, answer is 2*N+2. (upd: Later I've read newer post with correct code).

Replies are listed 'Best First'.
Re^9: How to match more than 32766 times in regex?
by Anonymous Monk on Dec 02, 2015 at 17:39 UTC
    1. I can't understand how your first regex can match more than 32677 times agains string '0' x 1_000_000 ? And it gives correct answer 3. It has a '+' quantifier, which is {1,32677}, true?
    Its an optimization. (0)+ is equivalent to 0{1,23677}, 0+ is not. There are many places where Perl's regexes engine doesn't behave like a straighforward backtracking engine - specifically to fix problems inherent to backtracking regex engines. The restriction on the depth of recursion is one of these places too.

    You can see what the regexes compile into with:

    $ perl -Mre=debug -e 'qr/0+|1+/; Final program: 1: BRANCH (5) 2: PLUS (9) 3: EXACT <0> (0) 5: BRANCH (FAIL) 6: PLUS (9) 7: EXACT <1> (0) 9: END (0) minlen 1
    compare:
    $ perl -Mre=debug -e 'qr/(0)+|(1)+/' Compiling REx "(0)+|(1)+" Final program: 1: BRANCH (12) 2: CURLYN[1] {1,32767} (23) 4: NOTHING (6) 6: EXACT <0> (0) 10: WHILEM (0) 11: NOTHING (23) 12: BRANCH (FAIL) 13: CURLYN[2] {1,32767} (23) 15: NOTHING (17) 17: EXACT <1> (0) 21: WHILEM (0) 22: NOTHING (23) 23: END (0) minlen 1 Freeing REx: "(0)+|(1)+"
    Your solution doesn't cope with inputs
    This entire approach - using regexes for this problem - is nonsense, and the way you write code doesn't help at all.