in reply to Re: how to speed up pattern match between two files
in thread how to speed up pattern match between two files

In a quick test, index doesn't seem to be any quicker than a simple regex. Example:

$ perl -MTime::HiRes -MBenchmark=timethese -le'
open F,"</usr/share/dict/words"; @words=<F>; chomp for @words;
$re= q(^bonjour);
timethese(-1, {
  index   => sub { for (@words) { return 1 if index($_,"bonjour") != -1 } },
  re      => sub { for (@words) { return 1 if /\bbonjour\b/ } },
  q(re^)  => sub { for (@words) { return 1 if /^bonjour/ } },
  re_comp => sub { $re= qr/^bonjour/o; for (@words) { return 1 if /$re/o } },
  grep    => sub { return 1 if grep /bonjour/, @words }
} )'
Benchmark: running grep, index, re, re^, re_comp for at least 1 CPU seconds ...
      grep:  2 wallclock secs ( 1.08 usr +  0.00 sys =  1.08 CPU) @ 32.41/s (n=35)
     index:  1 wallclock secs ( 1.12 usr +  0.02 sys =  1.14 CPU) @ 267.54/s (n=305)
        re:  1 wallclock secs ( 1.10 usr +  0.02 sys =  1.12 CPU) @ 272.32/s (n=305)
       re^:  1 wallclock secs ( 1.14 usr +  0.00 sys =  1.14 CPU) @ 327.19/s (n=373)
   re_comp:  1 wallclock secs ( 1.04 usr +  0.00 sys =  1.04 CPU) @ 268.27/s (n=279)

In this test, anchoring the regex with '^' gives a small boost (about +20%), while using index or precompiling the regex doesn't help. As you said, profiling the code can help here.

Re^3: how to speed up pattern match between two files
by RichardK (Parson) on Sep 17, 2014 at 09:15 UTC

    The lines in the words list are short, so any differences in performance will be lost in the system noise. So your test can't tell us anything useful and isn't a great match for the OP's problem.

      I've added 3 columns taken from 3 dictionaries, and put some random chars at the end so the line lengths match his files. And, again, a short regex (he uses short regexes in this part of the code) was as fast as index (even slightly faster). I don't know why, but the best he can do is to try it with his own data and his own perl (I used perl 5.20.0).
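
For anyone who wants to repeat that kind of test, here is a minimal sketch, not the exact script used above. Everything in it is an assumption made for illustration: the tiny word list, the tab separators, the 20_000 lines, the 60 chars of padding, and the helper names make_lines, count_index and count_regex. Swap in your own files and search strings to get numbers that mean something for your case.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data: the real files are not shown in the thread,
# so each line is built from three random words plus random padding.
my @dict = qw(alpha bravo charlie delta echo foxtrot golf hotel india juliet);
my @pad  = ('a' .. 'z');

# Build $count lines of three tab-separated words and $pad_len random chars.
sub make_lines {
    my ($count, $pad_len) = @_;
    my @lines;
    for (1 .. $count) {
        my @cols = map { $dict[ rand @dict ] } 1 .. 3;
        my $tail = join '', map { $pad[ rand @pad ] } 1 .. $pad_len;
        push @lines, join("\t", @cols) . "\t" . $tail;
    }
    return \@lines;
}

# Count the lines containing $needle, using index().
sub count_index {
    my ($lines, $needle) = @_;
    my $n = 0;
    for (@$lines) { $n++ if index($_, $needle) != -1 }
    return $n;
}

# Count the lines containing $needle, using a precompiled literal regex.
sub count_regex {
    my ($lines, $needle) = @_;
    my $re = qr/\Q$needle\E/;
    my $n  = 0;
    for (@$lines) { $n++ if /$re/ }
    return $n;
}

my $lines  = make_lines(20_000, 60);
my $needle = 'foxtrot';

# Both approaches must agree before their timings are worth comparing.
my $by_index = count_index($lines, $needle);
my $by_regex = count_regex($lines, $needle);
die "index and regex disagree\n" unless $by_index == $by_regex;

# Run each candidate for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    index => sub { count_index($lines, $needle) },
    regex => sub { count_regex($lines, $needle) },
});
```

Unlike the one-liner, both subs here scan every line and count matches instead of returning on the first hit, so the two timings cover the same amount of work. The results will still depend on the perl version and on the data.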