Re^7: Regex match at the beginning or end of string

Okay. So the optimiser favours the lookaheads by a factor of roughly 17x:

#! perl -slw
use strict;
use Benchmark qw[ cmpthese ];
use List::Util qw[ shuffle ];

our @terms = qw[
    the quick brown fox jumps over the lazy dog
];

our $re = join'', map "(?=^.*$_)", @terms;
$re = qr/$re/;

our @lines = map join( ' ', shuffle @terms), 1 .. 100;
push @lines, ( 'every good boy deserves food' ) x 100;

our( $a, $b ) = (0) x 2;
cmpthese -1, {
    a=>q[ /$re/ and ++$a for @lines; ],
    b=>q[ for my $str ( @lines ) { !grep( $str !~ /$_/, @terms) and ++$b; } ],
};

print "$a:$b";

__END__
C:\test>junk48
    Rate     b     a
b 81.5/s    --  -94%
a 1399/s 1616%    --
208200:12000

Or, if I don't pre-compile the regex, roughly 32x faster:

#! perl -slw
use strict;
use Benchmark qw[ cmpthese ];
use List::Util qw[ shuffle ];

our @terms = qw[
    the quick brown fox jumps over the lazy dog
];

our $re = join'', map "(?=^.*$_)", @terms;
#$re = qr/$re/;

our @lines = map join( ' ', shuffle @terms), 1 .. 100;
push @lines, ( 'every good boy deserves food' ) x 100;

our( $a, $b ) = (0) x 2;
cmpthese -1, {
    a=>q[ /$re/ && ++$a for @lines; ],
    b=>q[ for my $str ( @lines ) { !grep( $str !~ /$_/, @terms) && ++$b; } ],
};

print "$a:$b";

__END__
C:\test>junk48
    Rate     b     a
b 82.7/s    --  -97%
a 2632/s 3082%    --
389700:12000
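
For anyone unfamiliar with the construct being timed, here is a minimal sketch of what the joined lookahead pattern looks like and how it behaves. The shortened term list and the two sample lines are made up for illustration and are not part of the benchmark above.

```perl
use strict;
use warnings;

# Illustrative subset of terms (an assumption, shorter than the benchmark list).
my @terms = qw[ quick fox dog ];

# Same construction as in the benchmark: one anchored lookahead per term.
my $pattern = join '', map "(?=^.*$_)", @terms;
print "$pattern\n";    # (?=^.*quick)(?=^.*fox)(?=^.*dog)

my $re = qr/$pattern/;

# Each lookahead must succeed from the start of the string, so the match
# succeeds only if every term occurs somewhere, in any order.
for my $line ( 'the dog saw a quick fox', 'the quick brown fox' ) {
    printf "%-24s %s\n", $line, $line =~ $re ? 'match' : 'no match';
}
```

The first sample line contains all three terms and matches; the second lacks "dog" and does not.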

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority".

In the absence of evidence, opinion is indistinguishable from prejudice.

Re^8: Regex match at the beginning or end of string by JavaFan (Canon) on Feb 20, 2011 at 15:46 UTC
It's faster, but not because the pattern itself is faster. You're getting the benefit because your regexp isn't changing, and you're doing the same pattern tens of thousands of times. The loop I suggested doesn't have that benefit. But it can.

use strict;
use warnings;
use 5.010;
use Benchmark qw[ cmpthese ];
use List::Util qw[ shuffle ];

our @terms = qw[
    the quick brown fox jumps over the lazy dog
];

our $re = join'', map "(?=^.*$_)", @terms;
$re = qr/$re/;

our @lines = map join( ' ', shuffle @terms), 1 .. 100;
push @lines, ( 'every good boy deserves food' ) x 100;

my $line = join '&&', map {"/$_/"} @terms;

our( $a, $b, $c ) = (0) x 3;
cmpthese -1, {
    a=>q[ /$re/ and ++$a for @lines; ],
    b=>q[ for my $str ( @lines ) { !grep( $str !~ /$_/, @terms) and ++$b; } ],
    c=>qq [$line && ++\$c for \@lines;],
};

say "$a:$b:$c";

__END__
    Rate     b     a     c
b 39.8/s    --  -95%  -97%
a  807/s 1927%    --  -36%
c 1254/s 3051%   55%    --
109400:4800:243000

Re^9: Regex match at the beginning or end of string by BrowserUk (Patriarch) on Feb 20, 2011 at 18:45 UTC
Ah well. If we're allowed to cheat then the difference moves into the realms of run to run variability :)

use strict;
use warnings;
use 5.010;
use Benchmark qw[ cmpthese ];
use List::Util qw[ shuffle ];

our @terms = qw[
    the quick brown fox jumps over the lazy dog
];

our $re = join'', map "(?=^.*$_)", @terms;
#$re = qr/$re/;

our @lines = map join( ' ', shuffle @terms), 1 .. 100;
push @lines, ( 'every good boy deserves food' ) x 100;

my $line = join '&&', map {"/$_/"} @terms;

our( $a, $b, $c, $d) = (0) x 4;
cmpthese -1, {
    a=>q[ /$re/ and ++$a for @lines; ],
    d=>qq[ /$re/ and ++\$d for \@lines; ],
    b=>q[ for my $str ( @lines ) { !grep( $str !~ /$_/, @terms) and ++$b; } ],
    c=>qq [$line && ++\$c for \@lines;],
};

say "$a:$b:$c:$d";

__END__
C:\test>junk48
    Rate     b     a     d     c
b 82.7/s    --  -97%  -97%  -99%
a 2657/s 3112%    --   -3%  -56%
d 2748/s 3222%    3%    --  -54%
c 5982/s 7132%  125%  118%    --
388200:12000:829800:389700

Re^10: Regex match at the beginning or end of string by JavaFan (Canon) on Feb 20, 2011 at 22:07 UTC
"If we're allowed to cheat then the difference moves into the realms of run to run variability"

    Rate     b     a     d     c
b 82.7/s    --  -97%  -97%  -99%
a 2657/s 3112%    --   -3%  -56%
d 2748/s 3222%    3%    --  -54%
c 5982/s 7132%  125%  118%    --
388200:12000:829800:389700

I'm not sure what cheating is in this context, but that's not the point. The only "run to run" variability I see is between cases a and d, which are almost identical. Case c is more than twice as fast as any of the other variants.
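
The point about the pattern not changing can also be applied to the per-term loop without generating code as a string. The following is only a sketch of that idea: each term is compiled once with qr// and then reused for every line. The helper name make_all_terms_matcher is hypothetical, the terms are escaped with \Q...\E (which the benchmarks above do not do), and it has not been timed against cases a to d, so no speed claim is made for it.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns a closure that reports
# whether a string contains every one of the given terms.
sub make_all_terms_matcher {
    # Compile each term once, up front; \Q...\E treats terms as literal text.
    my @compiled = map { qr/\Q$_\E/ } @_;
    return sub {
        my ($str) = @_;
        for my $re (@compiled) {
            return 0 unless $str =~ $re;    # stop at the first missing term
        }
        return 1;
    };
}

my @terms   = qw[ the quick brown fox jumps over the lazy dog ];
my $has_all = make_all_terms_matcher(@terms);

my @lines = (
    'the quick brown fox jumps over the lazy dog',
    'every good boy deserves food',
);

# grep in scalar context counts the lines for which the closure returns true.
my $count = grep { $has_all->($_) } @lines;
print "$count of ", scalar(@lines), " lines contain every term\n";
```

Unlike the grep in case b, this version returns as soon as one term is missing, which matters most for lines such as 'every good boy deserves food' that fail on an early term.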