the case where regex seems to work slower

rsFalse has asked for the wisdom of the Perl Monks concerning the following question:

Hello.

Couple of days ago I was solving this stringy problem from codeforces contest -> http://codeforces.com/contest/832/problem/B.

After I solved it using regexes, I wanted to find others who approached same way, and only I've found was solution in Ruby. And the maximum time for any test case with mine solution was about 3x slower than with contestants' who used Ruby.
Time-limit: 2000 ms; Memory Limit: 256 MB;
Here are two links for solutions:
Ruby - http://codeforces.com/contest/832/submission/28852992
Perl - http://codeforces.com/contest/832/submission/28885266
I can see 94 test cases, but sadly - not whole but only pieces of them.
TC #84:
Perl: Time: 997 ms, memory: 3180 KB
Ruby: Time: 77 ms, memory: 12304 KB
TC #93:
Perl: Time: 171 ms, memory: 21796 KB
Ruby: Time: 311 ms, memory: 42948 KB
(about 15-50 ms are used anytime solving any TC)

Both solutions seem similar. Except of chomping or not chomping input lines (which also can influence speed). Both solutions compile final regex before use it over next n (1 ≤ n ≤ 10e5) query strings. During contest I haven't compiled regex and got time-limit-exceeded >2000 ms comparing to 93 ms with compiled (qr//) regex (at TC #47).

I can't see whole TC #84, but logically it contains 4 lines:

b
*aaaaa...
1
...aaaaa...
[download]

where triple points means sequence of some symbols.
Here I hardcoded one of possible variants which takes some hundreds ms to complete:

#!/usr/bin/perl

use warnings;
use strict;

$\ = $/;

    my $dict = 'b';
    $_ = '*' . 'a' x (9e4 - 1e4) . "\n";
    
    s/\?/[$dict]/g;
    s/\*/[^$dict]*/;
    
    my $qr = qr/^$_$/;
    
    print $_ =~ $qr ? "YES" : "NO" for 'a' x (0 + 1e4) . 'a' x (9e4 - 
+1e4) . "\n";
[download]

So what is the worst test case for mine solution? Is there a better solution using regex? Why is such difference between Perl and Ruby at 84th TC?

Upd. Codeforces is using Perl 5.20.1, -> problemset -> custom-test.
To find other solutions of this problem, you go to problem -> status, and setup "status filter".

Comment on the case where regex seems to work slower Select or Download Code

Replies are listed 'Best First'.
Re: the case where regex seems to work slower by Corion (Patriarch) on Jul 26, 2017 at 09:32 UTC
Most likely, the difference comes when one regex engine is backtracking to ultimately fail. I'm not sure what the exact problem description is, but maybe you can speed things up by making some match fails not greedy or atomic (or both). Maybe the following fails faster: `s/\/[^$dict]?/g` [download] Or maybe the following replacement makes things fail faster, as it prevents backtracking over that group if it ever failed.: `s/\/(?>[^$dict])/g` [download] (As a note, your second replacement has no /g flag while the Ruby code uses `.gsub` in both cases)	[reply] [d/l] [select]
Re: the case where regex seems to work slower by choroba (Cardinal) on Jul 26, 2017 at 11:28 UTC
With some inputs I created, avoiding the negated class sped the script about 2×, but it didn't improve your hardcoded testcase: `while (my $dict = <>) { chomp $dict; my $nodict = join q(), 'a' .. 'z'; $nodict =~ s/[$dict]//g; $_ = <>; s/\?/[$dict]/g; s/\/[$nodict]/;` [download] ($q=q:Sq=~/;[c](.)(.)/;chr(-\|\|-\|5+lengthSq)`"S\|oS2"`map{chr \|+ord }map{substrSq`S_+\|`\|}3E\|-\|`7**2-3:)=~y+S\|`+$1,++print+eval$q,q,a, [download]	[reply] [d/l] [select]
Re: the case where regex seems to work slower by tybalt89 (Monsignor) on Jul 26, 2017 at 23:03 UTC
Just fudge things a little :) `#!/usr/bin/perl # http://perlmonks.org/?node_id=1196089 use strict; use warnings; $\ = $/; my $dict = 'b'; $_ = '' . 'a' x (9e4 - 1e4) . "\n"; my($head, $tail) = map length( $_ // '' ), split /\/, $_; s/\?/[$dict]/g; s/\/ [^$dict ] /; my $qr = qr/^$_$/; substr($_, $head, 0, ' '), substr($_, -$tail, 0, ' '), print $_ =~ $qr ? "YES" : "NO" for 'a' x (0 + 1e4) . 'a' x (9e4 - 1e4) . "\n";` [download] This avoids the massive backtrack this particular problem encounters.	[reply] [d/l]
Re^2: the case where regex seems to work slower by rsFalse (Chaplain) on Jul 27, 2017 at 10:53 UTC
I think this do not work at cases when '*' is absent or not in the middle! Read more... (4 kB)	[reply] [d/l] [select]
Re^3: the case where regex seems to work slower by tybalt89 (Monsignor) on Jul 27, 2017 at 11:40 UTC
Hey, it worked for the test case you gave :) (grumble, grumble, inadequate test cases :( Even these test cases are inadequate, can you figure out why? ) #!/usr/bin/perl # http://perlmonks.org/?node_id=1196089 use strict; use warnings; $\ = $/; while(<DATA>) { chomp; my $dict = $_; $_ = <DATA>; my($head, $tail) = map length( $_ // '' ), split /\/; s/\?/[$dict]/g; s/\/ [^$dict ]* /; my $qr = qr/^$_/; $_ = <DATA>, $head <= length($_) && substr($_, $head, 0, ' '), $tail && $tail <= length($_) && substr($_, -$tail, 0, ' '), print $_ =~ $qr ? "YES" : "NO" for 1 .. <DATA>; } __DATA__ ab a?a 3 aaa aab aaac ab a?a 3 aaa aab caaa ab a?a 3 aaa aab aaac b aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 1 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa b aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 1 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa b aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 1 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa b aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 1 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa b aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa* 1 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa b aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa* 1 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa b aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 1 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa b aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 1 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa [download]	[reply] [d/l]
Re^4: the case where regex seems to work slower by rsFalse (Chaplain) on Jul 27, 2017 at 12:57 UTC
Re^5: the case where regex seems to work slower by tybalt89 (Monsignor) on Jul 27, 2017 at 13:37 UTC
Re: the case where regex seems to work slower by Anonymous Monk on Jul 26, 2017 at 13:02 UTC
Wonder why does Ruby take 4 times as much memory in one case, and twice as much in the other? We know many things do come down to "space versus speed." Could have a lot to do with it, but how?	[reply]
A reply falls below the community's threshold of quality. You may see it by logging in.