in reply to Re: the case where regex seems to work slower
in thread the case where regex seems to work slower

I think this do not work at cases when '*' is absent or not in the middle!


If '*' is in the middle of string, it seems OK:
#!/usr/bin/perl # http://perlmonks.org/?node_id=1196089 use warnings; use strict; $\ = $/; while(<DATA>){ chomp; my $dict = $_; $_ = <DATA>; my( $head, $tail ) = map length( $_ // '' ), split /\*/, $_; print "head:[$head], tail:[$tail]"; s/\?/[$dict]/g; s/\*/[^$dict]*/; my $qr = qr/^$_$/; print "regex:[$qr]"; (print "loop #:[$_]"), $_ = <DATA>, (print "[$_]"), substr($_, $head, 0, ' '), (print "[$_]"), substr($_, -$tail, 0, ' '), (print "[$_]"), print "Answer: ", $_ =~ $qr ? "YES" : "NO" for 1 .. <DATA>; } __DATA__ ab a?*a 2 aacca aaccb
OUTPUT:
head:[2], tail:[2] regex:[(?^:^a[ab][^ab]*a $)] loop #:[1] [aacca ] [aa cca ] [aa cc a ] Answer: YES loop #:[2] [aaccb ] [aa ccb ] [aa cc b ] Answer: NO
If star is absent, output should contain "YES" and then "NO". Another __DATA__ for same code:
__DATA__ ab a?a 2 aaa aab
OUTPUT
head:[4], tail:[] regex:[(?^:^a[ab]a $)] loop #:[1] [aaa ] [aaa ] [ aaa ] Answer: NO loop #:[2] [aab ] [aab ] [ aab ] Answer: NO


But I understood idea, and after modified a bit. Added line #13, and changed lines #25-26, got OK:
#!/usr/bin/perl # http://perlmonks.org/?node_id=1196089 use warnings; use strict; $\ = $/; while(<DATA>){ chomp; my $dict = $_; $_ = <DATA>; my $star = /\*/; my( $head, $tail ) = map length( $_ // '' ), split /\*/, $_; print "head:[$head], tail:[$tail]"; s/\?/[$dict]/g; s/\*/[^$dict]*/; my $qr = qr/^$_$/; print "regex:[$qr]"; (print "loop #:[$_]"), $_ = <DATA>, (print "[$_]"), $star && substr($_, $head, 0, ' '), (print "[$_]"), $star && substr($_, -$tail, 0, ' '), (print "[$_]"), print "Answer: ", $_ =~ $qr ? "YES" : "NO" for 1 .. <DATA>; } __DATA__ ab a?a 2 aaa aab
OUTPUT:
head:[4], tail:[] regex:[(?^:^a[ab]a $)] loop #:[1] [aaa ] [aaa ] [aaa ] Answer: YES loop #:[2] [aab ] [aab ] [aab ] Answer: NO
But when trying to submit to codeforces, got Run-time error, seems because of star is at the end,
or at beginning of a string (http://codeforces.com/contest/832/submission/28928022 , TC #3).

Replies are listed 'Best First'.
Re^3: the case where regex seems to work slower
by tybalt89 (Monsignor) on Jul 27, 2017 at 11:40 UTC

    Hey, it worked for the test case you gave :)

    (grumble, grumble, inadequate test cases :( Even these test cases are inadequate, can you figure out why? )

    #!/usr/bin/perl # http://perlmonks.org/?node_id=1196089 use strict; use warnings; $\ = $/; while(<DATA>) { chomp; my $dict = $_; $_ = <DATA>; my($head, $tail) = map length( $_ // '' ), split /\*/; s/\?/[$dict]/g; s/\*/ [^$dict ]* /; my $qr = qr/^$_/; $_ = <DATA>, $head <= length($_) && substr($_, $head, 0, ' '), $tail && $tail <= length($_) && substr($_, -$tail, 0, ' '), print $_ =~ $qr ? "YES" : "NO" for 1 .. <DATA>; } __DATA__ ab a?a 3 aaa aab aaac ab *a?a 3 aaa aab caaa ab a?a* 3 aaa aab aaac b aaaaaaaaaaaaaaaaaaaaaaa*aaaaaaaaaaaaaaaaaa 1 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa b aaaaaaaaaaaaaaaaaaaaaaa*aaaaaaaaaaaaaaaaaa 1 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa b *aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 1 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa b *aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 1 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa b aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa* 1 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa b aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa* 1 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa b aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 1 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa b aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 1 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
      Sorry, I didn't understand your second sentence, because parentheses were not balanced ;)

      Thank you for improvement to cope with other cases. I've submitted, and it took only 171 ms at worst TC, this is #93 - http://codeforces.com/contest/832/submission/28930515

      Upd.: In this solution there single spaces in regex are used. They were mysterious for me in previous code, and should be taken by negate character class.

        The spaces are there to split the target string into three parts, before the '*', the star part, and after the star. This prevents trying to match with a pattern equivalent to /a*a*/ which causes the long matching time you were seeing;