in reply to Re^3: Efficient regex matching with qr//; Can I do better?
in thread Efficient regex matching with qr//; Can I do better?

To illustrate the speed difference I wrote a small benchmark that assembles 500 random strings of length 5 to 15 into a regex, and matches that against a random string with 1 million characters. Here's the result:
# perl 5.10.0:
timethis 10:  0 wallclock secs ( 0.33 usr +  0.00 sys =  0.33 CPU) @ 30.30/s (n=10)
            (warning: too few iterations for a reliable count)
# perl 5.8.8:
timethis 10: 79 wallclock secs (78.92 usr +  0.04 sys = 78.96 CPU) @  0.13/s (n=10)

This is on Linux, but I guess the results on Windows are similar. You see that in this case perl5.10.0 takes less than a second, while perl5.8.8 takes over a minute.

I also tried it with a shorter target (100 chars) and more iterations, and the speed differences are similar.

And here's the Benchmark:

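use strict;
use warnings;
use Benchmark qw(timethis);

my @alphabet = ('a'..'z', 'A'..'Z', ' ');

# Build a random string of the given length from @alphabet.
sub random_string {
    my $length = int shift;
    # $alphabet[...] rather than the one-element slice @alphabet[...],
    # which triggers a "better written as" warning.
    return join '', map { $alphabet[rand @alphabet] } 1..$length;
}

# 500 random alternatives (int truncates 5 + rand(10) to lengths 5..14).
my $re = join '|', map { random_string(5 + rand(10)) } 1..500;

# Short-target variant; the 1-million-character run shown above
# used random_string(1e6) and 10 iterations.
my $target = random_string(1e2);

timethis(100_000, sub { $target =~ m/$re/ });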

Re^5: Efficient regex matching with qr//; Can I do better? (Benchmark)
by ysth (Canon) on Jul 11, 2008 at 20:22 UTC
    Unfortunately, there's a limit to how much data can appear in alternations and still be optimized into a trie in 5.10.0. I think you are within that limit, but kruppy will not be. Try your benchmark again with 10000 strings.
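
One knob that may be relevant here: perlvar documents a special variable, ${^RE_TRIE_MAXBUF}, which controls how much memory the trie optimisation is allowed to use (65536 by default). The following is only a minimal sketch, under the assumption that the limit mentioned above is this buffer limit, which is not confirmed in this thread. The word list and the value 2**24 are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical word list standing in for a large set of alternatives.
my @words = map { sprintf 'word%05d', $_ } 1 .. 10_000;

# Raise the trie buffer limit *before* the pattern is compiled.
# 2**24 is an arbitrary illustrative value, not a recommendation.
${^RE_TRIE_MAXBUF} = 2**24;

# Compile the alternation once; quotemeta keeps each word literal.
my $re = do {
    my $alternation = join '|', map { quotemeta } @words;
    qr/$alternation/;
};

my ($matched) = 'xx word04242 yy' =~ /($re)/;
print "matched: $matched\n";    # prints "matched: word04242"
```

Whether this actually restores the speed-up for a given word list is something to measure with the benchmark rather than assume.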
      Well, if it doesn't do it automatically, you can ask it to do it explicitly. The code I wrote for RE (tilly) 4: SAS log scanner should still work in Perl 5.10.
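
The referenced node is not reproduced here, so the following is not that code. It is a minimal sketch of the general idea, assuming the goal is to fold a list of literal strings into a trie-shaped pattern by hand so that shared prefixes are tested only once. The names build_trie and trie_to_regex are made up for this illustration.

```perl
use strict;
use warnings;

# Insert each word into a nested hash, one character per level.
# The empty-string key marks "a word ends here".
sub build_trie {
    my @words = @_;
    my %trie;
    for my $word (@words) {
        my $node = \%trie;
        $node = $node->{$_} ||= {} for split //, $word;
        $node->{''} = 1;
    }
    return \%trie;
}

# Turn the trie into regex source with shared prefixes factored out.
sub trie_to_regex {
    my ($node) = @_;
    my @alternatives;
    for my $char (sort grep { $_ ne '' } keys %$node) {
        push @alternatives, quotemeta($char) . trie_to_regex($node->{$char});
    }
    # A word may also end at this node: allow an empty continuation,
    # tried last so that longer words are preferred.
    push @alternatives, '' if $node->{''} && @alternatives;
    return ''               if !@alternatives;
    return $alternatives[0] if @alternatives == 1;
    return '(?:' . join('|', @alternatives) . ')';
}

my @words  = qw(foo foobar fob bar baz);
my $source = trie_to_regex(build_trie(@words));
my $re     = qr/$source/;

print "$source\n";    # prints "(?:ba(?:r|z)|fo(?:b|o(?:bar|)))"
```

Note that this sketch prefers the longest word at each branch, so it can pick a different match than the order-sensitive plain alternation would (foo|foobar matches only "foo" in "foobar", while the pattern above matches "foobar").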