I ran into a case of inefficient regexp handling when matching strings in a large file:
looking for ONE string takes 2 seconds,
while looking for TWO strings takes 79 seconds. Here is the code:
use strict;
use Benchmark;

my $file = shift || 'no_file';

timethese( 1, {
    'one_string' => sub { one_string() },
    'two_string' => sub { two_string() },
} );

sub one_string {
    my $filter = '00901808';
    my $re = qr/$filter/o;
    my @matched;
    open (my $FH, "<$file");
    while (my $rec = <$FH>) {
        if ( $rec =~ $re) {
            push @matched, $rec;
        }
    }
    close $FH;
}

sub two_string {
    my $filter = '00901808|87654321';
    my $re = qr/$filter/o;
    my @matched;
    open (my $FH, "<$file");
    while (my $rec = <$FH>) {
        if ( $rec =~ $re) {
            push @matched, $rec;
        }
    }
    close $FH;
}
__END__

The result says:

# perl bench_regexp 100000lines.92MB.file
Benchmark: timing 1 iterations of one_string, two_string...
one_string:  2 wallclock secs ( 1.68 usr +  0.42 sys =  2.10 CPU) @  0.48/s (n=1)
            (warning: too few iterations for a reliable count)
two_string: 77 wallclock secs (76.13 usr +  0.59 sys = 76.72 CPU) @  0.01/s (n=1)
            (warning: too few iterations for a reliable count)
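
For comparison, here is a minimal sketch that is not part of the benchmark above. It times the same alternation against two other ways of looking for the two literal strings: two separate precompiled regexps, and plain index() calls. The in-memory sample data and the sub names (match_alternation, match_separate, match_index) are made up for illustration, and the sketch makes no claim about which variant wins on a given perl. It only checks that all three select the same records and then times them.

```perl
use strict;
use warnings;
use Benchmark;

# Hypothetical sample data standing in for the large input file.
my @needles = ('00901808', '87654321');
my @lines = map {
    my $tag = $_ % 1000 == 0 ? $needles[0]
            : $_ % 1500 == 0 ? $needles[1]
            :                  'xxxxxxxx';
    "record $_ payload $tag\n";
} 1 .. 20_000;

# Variant 1: a single alternation, as in two_string() above.
sub match_alternation {
    my ($recs) = @_;
    my $re = qr/00901808|87654321/;
    return [ grep { $_ =~ $re } @$recs ];
}

# Variant 2: one precompiled regexp per literal, tried in turn.
sub match_separate {
    my ($recs) = @_;
    my ($re_a, $re_b) = map { qr/\Q$_\E/ } @needles;
    return [ grep { $_ =~ $re_a || $_ =~ $re_b } @$recs ];
}

# Variant 3: no regexp at all, only fixed-string search with index().
sub match_index {
    my ($recs) = @_;
    return [
        grep { index($_, $needles[0]) >= 0 || index($_, $needles[1]) >= 0 }
        @$recs
    ];
}

# Sanity check: every variant must select the same number of records.
my $expected = scalar @{ match_alternation(\@lines) };
for my $variant (\&match_separate, \&match_index) {
    my $got = scalar @{ $variant->(\@lines) };
    die "variants disagree: $got vs $expected\n" unless $got == $expected;
}

timethese( 20, {
    'alternation' => sub { match_alternation(\@lines) },
    'separate'    => sub { match_separate(\@lines) },
    'index'       => sub { match_index(\@lines) },
} );
```

If the alternation turns out to be much slower than the other two on the perl in question, that would point at how the regexp engine handles the alternation rather than at the file I/O.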
In reply to Unefficient Regexp 'Matching this or that' by pelagic