in reply to Regex combining /(foo|bar)/ slower than using foreach (/foo/,/bar/) ???  

Using a /(foo|bar)/ regex on strings is slower than using a foreach loop doing the matching one after another.
Not in my benchmark:
    use strict;
    use warnings;
    use Benchmark 'cmpthese';

    undef $/;
    my $str = <DATA>;

    cmpthese(-2, {
        alternation => sub { my @m = $str =~ /(foo|bar)/g },
        map         => sub { my @m = map $str =~ /$_/g, qw(foo bar) },
        loop        => sub { my @m; push @m, $str =~ /$_/g for qw(foo bar) },
    });

    cmpthese(-2, {
        alt_s  => sub { my $s2 = $str; $s2 =~ s/(foo|bar)//g },
        loop_s => sub { my $s2 = $str; $s2 =~ s/$_//g for qw(foo bar) }
    });

    __DATA__
    This string contains foo and bar for fools and bards and I pity the foo who bars the way

                   Rate         map        loop alternation
    map         14977/s          --         -6%        -25%
    loop        15986/s          7%          --        -20%
    alternation 20090/s         34%         26%          --

              Rate loop_s  alt_s
    loop_s 17436/s     --   -37%
    alt_s  27643/s    59%     --
Update:
But I converted your example program to use Benchmark, and I do get better results with for:
    use strict;
    use Benchmark 'cmpthese';

    foreach my $regexcount (10) {
        foreach my $regexlength (5,20) {
            my @items = map { createRandomTextWithLength($regexlength) } (1..$regexcount);
            my $regexstr = join('|',@items);
            my $regex = qr/(?:$regexstr)/;
            foreach my $stringlength (1000,100000) {
                print join "\n",
                    "Stringlength: $stringlength",
                    "Number of Regexes:$regexcount",
                    "Length of each Regex:$regexlength\n";
                my $teststring = createRandomTextWithLength($stringlength);
                cmpthese(-2, {
                    alt => sub {
                        my $test=$teststring;
                        $test =~ s/$regex/foobar/g;
                    },
                    for => sub {
                        my $test=$teststring;
                        foreach my $oneregex (@items) {
                            $test =~ s/$oneregex/foobar/g;
                        }
                    }
                });
            }
        }
    }

    sub createRandomTextWithLength($) {
        my($count) = (@_);
        my $string;
        for (1.. $count) {
            $string.=chr(ord('a')+rand(20))
        }
        return $string;
    }
Results:
    Stringlength: 1000
    Number of Regexes:10
    Length of each Regex:5
          Rate  alt  for
    alt 1296/s   -- -70%
    for 4357/s 236%   --

    Stringlength: 100000
    Number of Regexes:10
    Length of each Regex:5
          Rate   alt   for
    alt 12.7/s    --  -97%
    for  379/s 2877%    --

    Stringlength: 1000
    Number of Regexes:10
    Length of each Regex:20
          Rate  alt  for
    alt 1324/s   -- -68%
    for 4144/s 213%   --

    Stringlength: 100000
    Number of Regexes:10
    Length of each Regex:20
          Rate   alt   for
    alt 12.7/s    --  -98%
    for  676/s 5209%    --

Caution: Contents may have been coded under pressure.

Replies are listed 'Best First'.
Re^2: Regex combining /(foo|bar)/ slower than using foreach (/foo/,/bar/) ???
by hardburn (Abbot) on Feb 18, 2005 at 16:03 UTC

    The for and map implementations have to recompile the regex on each iteration, but the alternation doesn't have that limitation. Here's a modified version that fixes that:

        use strict;
        use warnings;
        use Benchmark 'cmpthese';

        undef $/;
        my $str = <DATA>;
        my @regexen = ( qr/foo/, qr/bar/ );

        cmpthese(-2, {
            alternation => sub { my @m = $str =~ /(foo|bar)/g },
            map         => sub { my @m = map $str =~ /$_/g, @regexen },
            loop        => sub { my @m; push @m, $str =~ /$_/g for @regexen },
        });

        cmpthese(-2, {
            alt_s  => sub { my $s2 = $str; $s2 =~ s/(foo|bar)//g },
            loop_s => sub { my $s2 = $str; $s2 =~ s/$_//g for @regexen}
        });

        __DATA__
        This string contains foo and bar for fools and bards and I pity the foo who bars the way

    I got results more consistent with the OP:

                       Rate alternation         map        loop
        alternation 26962/s          --        -61%        -64%
        map         68918/s        156%          --         -9%
        loop        75551/s        180%         10%          --

                  Rate  alt_s loop_s
        alt_s  32434/s     --   -20%
        loop_s 40707/s    26%     --
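For an arbitrary word list, the precompilation step used above might look like the following minimal sketch. The names @words and @compiled are illustrative only and are not part of the benchmark; the \Q...\E escaping is an added assumption that the words are meant to match literally.

```perl
use strict;
use warnings;

# Hypothetical word list; any literal strings would do.
my @words = qw(foo bar);

# Compile each pattern once, up front. \Q...\E escapes regex
# metacharacters so that each word is matched literally.
my @compiled = map { qr/\Q$_\E/ } @words;

my $str = 'This string contains foo and bar for fools and bards';

# Matching against a precompiled qr// object avoids recompiling the
# pattern each time the loop variable changes.
my @m;
push @m, $str =~ /$_/g for @compiled;

print "@m\n";    # prints "foo foo bar bar"
```

With plain strings in the loop, the pattern text changes on every iteration, which is why the earlier versions of loop and map paid for a recompile each time.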

    Just for kicks, I added a study $str; right after reading the DATA filehandle. I know study is rarely useful, but it's supposed to help in situations where a single string is going to be matched against many different regexen, which is roughly what we have here. Results:

                       Rate alternation        loop         map
        alternation 26569/s          --        -56%        -60%
        loop        60015/s        126%          --        -10%
        map         66863/s        152%         11%          --

                  Rate  alt_s loop_s
        alt_s  31355/s     --   -19%
        loop_s 38857/s    24%     --

    The speed drops across the board, but alternation loses only about 1.5% (26962/s to 26569/s), which is within the noise. loop gets hit hard, losing about 20% (75551/s to 60015/s).
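For anyone repeating the experiment, here is a rough sketch of where the study call goes. It uses a string literal instead of the DATA filehandle so that the snippet stands alone; that substitution is mine, not part of the benchmark. Note also that perlfunc documents study as a no-op since Perl 5.16, so on a modern perl any difference measured this way is probably just run-to-run variation.

```perl
use strict;
use warnings;

# Stand-in for the text read from DATA in the benchmark.
my $str = 'This string contains foo and bar for fools and bards '
        . 'and I pity the foo who bars the way';

# Hint that $str is about to be matched many times.
# (Documented as a no-op since Perl 5.16.)
study $str;

my @regexen = ( qr/foo/, qr/bar/ );

my @m;
push @m, $str =~ /$_/g for @regexen;

print scalar(@m), " matches\n";    # prints "6 matches"
```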

    "There is no shame in being self-taught, only in not trying to learn in the first place." -- Atrus, Myst: The Book of D'ni.

      A problem with all of these benchmarks is that the different choices don't actually do the same things. See my response below.
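A minimal sketch of one way the variants can differ, offered as an illustration and not necessarily the point made in the response referred to above. The input 'bfooar' is a made-up string chosen to expose the difference in the substitution case.

```perl
use strict;
use warnings;

my $str = 'This string contains foo and bar for fools and bards';

# One pass with alternation: matches come back in string order.
my @alt = $str =~ /(foo|bar)/g;

# One pass per pattern: matches come back grouped by pattern.
my @loop;
push @loop, $str =~ /$_/g for qr/foo/, qr/bar/;

print "alternation: @alt\n";     # foo bar foo bar
print "loop:        @loop\n";    # foo foo bar bar

# Substitution can differ too: a later pass in the loop can match
# text that only exists because an earlier pass removed something.
my $input = 'bfooar';            # hypothetical input

(my $one_pass = $input) =~ s/(foo|bar)//g;

my $multi_pass = $input;
$multi_pass =~ s/$_//g for qr/foo/, qr/bar/;

print "one pass:   '$one_pass'\n";      # 'bar'
print "multi pass: '$multi_pass'\n";    # ''
```

So a benchmark that compares the two forms is timing different amounts of work whenever the inputs allow overlaps like this.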