Re^2: how to speed up pattern match between two files

In a quick test, index doesn't seem to be any quicker than a simple regex. Example:

$ perl -MTime::HiRes -MBenchmark=timethese -le'
    open F,"</usr/share/dict/words"; @words=<F>; chomp for @words;
    $re= q(^bonjour);
    timethese(-1, {
      index   => sub { for (@words) { return 1 if index($_,"bonjour") != -1 } },
      re      => sub { for (@words) { return 1 if /\bbonjour\b/ } },
      q(re^)  => sub { for (@words) { return 1 if /^bonjour/ } },
      re_comp => sub { $re= qr/^bonjour/o; for (@words) { return 1 if /$re/o } },
      grep    => sub { return 1 if grep /bonjour/, @words }
    } )'
Benchmark: running grep, index, re, re^, re_comp for at least 1 CPU seconds...
      grep:  2 wallclock secs ( 1.08 usr +  0.00 sys =  1.08 CPU) @ 32.41/s (n=35)
     index:  1 wallclock secs ( 1.12 usr +  0.02 sys =  1.14 CPU) @ 267.54/s (n=305)
        re:  1 wallclock secs ( 1.10 usr +  0.02 sys =  1.12 CPU) @ 272.32/s (n=305)
       re^:  1 wallclock secs ( 1.14 usr +  0.00 sys =  1.14 CPU) @ 327.19/s (n=373)
   re_comp:  1 wallclock secs ( 1.04 usr +  0.00 sys =  1.04 CPU) @ 268.27/s (n=279)

In this test, anchoring the regex with '^' gives a small boost (about +20%), while using index or precompiling the regex doesn't help. As you said, profiling the code can help here.
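For anyone who wants to repeat the comparison on their own machine, here is a rough sketch of the same idea as a standalone script. It is not the one-liner above: each candidate counts matches instead of returning early, so every run scans the whole list, and cmpthese prints a relative comparison table. The test names and the generated fallback word list are my own assumptions; only the /usr/share/dict/words path and the "bonjour" needle come from the test above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Word list path taken from the one-liner above. The generated fallback
# list is an assumption, so the script still runs if the file is missing.
my $path = '/usr/share/dict/words';
my @words;
if (open my $fh, '<', $path) {
    chomp(@words = <$fh>);
    close $fh;
}
else {
    @words = map { "word$_" } 1 .. 50_000;
}

my $needle = 'bonjour';
my $re     = qr/^\Q$needle\E/;    # compiled once, outside the timed subs

# Each candidate counts matching lines and returns the count, so no run
# stops early on the first hit.
my %tests = (
    index_any => sub { my $n = 0; for (@words) { $n++ if index($_, $needle) != -1 } $n },
    re_any    => sub { my $n = 0; for (@words) { $n++ if /bonjour/ } $n },
    re_anchor => sub { my $n = 0; for (@words) { $n++ if /^bonjour/ } $n },
    re_qr     => sub { my $n = 0; for (@words) { $n++ if $_ =~ $re } $n },
);

# Run each candidate for at least 1 CPU second and print the comparison.
cmpthese(-1, \%tests);
```

The numbers will differ with the perl version and the data, so treat the output as a hint about where to look, not as a general ranking.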


Re^3: how to speed up pattern match between two files by RichardK (Parson) on Sep 17, 2014 at 09:15 UTC
The lines in the words list are short, so any differences in performance will be lost in the system noise. Your test therefore can't tell us anything useful, and it isn't a great match for the OP's problem.
Re^4: how to speed up pattern match between two files by gnujsa (Acolyte) on Sep 17, 2014 at 15:58 UTC
I've added 3 columns from 3 dictionaries and put some random characters at the end so that the line lengths match his files. Again, a short regex (he uses short regexes in this part of the code) was as fast as index, and even slightly faster. I don't know why, but the best he can do is to try it with his own data and his own perl (I used perl 5.20.0).
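A minimal sketch of how such padded test lines could be generated, in case someone wants to try the same thing. This is not the exact script I ran: the make_line helper, the tab separator, the stand-in word list and the target length of 200 characters are all assumptions for illustration, so adjust them to the real files.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build one test line: $cols random words joined by tabs, then padded with
# random lowercase letters up to $len characters.
# Separator, column count and length are assumptions, not the OP's format.
sub make_line {
    my ($words, $cols, $len) = @_;
    my $line = join "\t", map { $words->[ rand @$words ] } 1 .. $cols;
    $line .= chr(97 + int rand 26) while length($line) < $len;
    return $line;
}

my @dict  = qw(apple bonjour cherry delta echo);    # stand-in word list
my @lines = map { make_line(\@dict, 3, 200) } 1 .. 1000;

printf "%d lines, first has %d chars\n", scalar @lines, length $lines[0];
```

Feeding @lines to the benchmark instead of the raw word list gives lines closer to a real data file than single dictionary words.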