in reply to Re^5: How to match more than 32766 times in regex?
in thread How to match more than 32766 times in regex?

$len = + (() = /(.)\1*\1*\1*\1*/g) + /(.)\1\1|(.)\2.*.*.*.*(.)\3/ + /(.)\1/
You know, that doesn't make any sense whatsoever.

Replies are listed 'Best First'.
Re^7: How to match more than 32766 times in regex?
by Anonymous Monk on Dec 02, 2015 at 03:28 UTC
    After some pondering... Is this what you tried to do:
    use strict; use warnings; my @strs = ( '010111', '0' x 1_000_000, '01' x 1_000_000, '011' x 1_000_000, ( '01' x 1_000_000 ) . '111', ); for my $str (@strs) { my $len = ( () = $str =~ m{ 0+ | 1+ }xg ) + ( $str =~ m{ 000 | 111 | (.)\1 .* (.)\2 }x ? 2 : 0 ); print $len, "\n"; }
    (Perls regex optimizer is pretty smart about the second regex, btw!)
      or, rather
      for my $str (@strs) { my $len = ( () = $str =~ m{ 0+ | 1+ }xg ) + ( $str =~ m{ 000 | 111 | (.)\1 .* (.)\2 }x ? 2 : $str =~ m{ (?: ^ (.)\1 | (.)\2 $) }x ? 1 : 0 ); print $len, "\n"; }
      ...anyway, I'm now too bored to think whether that's the correct solution and I'll leave the rest of that little programming exercise to you.

      Your problem is that you're trying to write overly compact code even though you don't know Perl very well. And, naturally, writing the retarded equivalent of JAPHs is not a good way to learn Perl - or any programming language, for that matter. Just saying.

      I recommended you to learn Forth a while ago, and I still do. BUK suggested Python, but its significantly more verbose than Perl and you won't like it.

        Mkay, I believe that's the final version:
        use strict; use warnings; use feature 'say'; use List::Util 'min'; my @strs = ( '0' x 1_000_000, '01' x 1_000_000, '011' x 1_000_000, '0110' x 1_000_000, ); for (@strs) { my $l = length(tr/01//sr); my $d = length() - $l; say $l + min( $d, 2 ); }
      1. I can't understand how your first regex can match more than 32677 times agains string '0' x 1_000_000 ? And it gives correct answer 3. It has a '+' quantifier, which is {1,32677}, true?

      2. Your solution doesn't cope with inputs, which have only one pair of consecutive chars :P . If input is '11' . '01' x N , answer is not a 2*N+1, answer is 2*N+2. (upd: Later I've read newer post with correct code).
        1. I can't understand how your first regex can match more than 32677 times agains string '0' x 1_000_000 ? And it gives correct answer 3. It has a '+' quantifier, which is {1,32677}, true?
        Its an optimization. (0)+ is equivalent to 0{1,23677}, 0+ is not. There are many places where Perl's regexes engine doesn't behave like a straighforward backtracking engine - specifically to fix problems inherent to backtracking regex engines. The restriction on the depth of recursion is one of these places too.

        You can see what the regexes compile into with:

        $ perl -Mre=debug -e 'qr/0+|1+/; Final program: 1: BRANCH (5) 2: PLUS (9) 3: EXACT <0> (0) 5: BRANCH (FAIL) 6: PLUS (9) 7: EXACT <1> (0) 9: END (0) minlen 1
        compare:
        $ perl -Mre=debug -e 'qr/(0)+|(1)+/' Compiling REx "(0)+|(1)+" Final program: 1: BRANCH (12) 2: CURLYN[1] {1,32767} (23) 4: NOTHING (6) 6: EXACT <0> (0) 10: WHILEM (0) 11: NOTHING (23) 12: BRANCH (FAIL) 13: CURLYN[2] {1,32767} (23) 15: NOTHING (17) 17: EXACT <1> (0) 21: WHILEM (0) 22: NOTHING (23) 23: END (0) minlen 1 Freeing REx: "(0)+|(1)+"
        Your solution doesn't cope with inputs
        This entire approach - using regexes for this problem - is nonsense, and the way you write code doesn't help at all.