in reply to Re: Regex failure interpretation
in thread Regex failure interpretation

I'm probably doing something dumb, but I can't reproduce that?

    use Benchmark qw[ cmpthese ];

    @good = map{ join'',map{ rand() < .5 ? 1 : 0 } 1..1000 } 1..100;
    @bad  = map{ substr( $s=$_, rand( 1000 ), 1 ) = 2 } @good;

    cmpthese( -3, {
        '+ve' => q[ $n1 = grep /^[01]+$/, @good, @bad ],
        '-ve' => q[ $n2 = grep !/[^01]/, @good, @bad ]
    });

           Rate  -ve  +ve
    -ve   467/s   -- -80%
    +ve  2363/s 406%   --

    print $n1, $n2;
    100 100
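As posted, the `map` that builds `@bad` returns the value of the `substr` assignment, so each element of `@bad` appears to end up as the one-character string `2` rather than a corrupted copy of a 1000-digit string. The final counts are 100 either way, which is why it doesn't show up in the output. A minimal sketch that returns the modified copy explicitly, keeping the same sizes (100 strings of 1000 digits); `use strict` and the lexical variable names are illustrative additions, not part of the original:

```perl
use strict;
use warnings;

# 100 random strings of 1000 binary digits, as in the benchmark above
my @good = map { join '', map { rand() < .5 ? 1 : 0 } 1 .. 1000 } 1 .. 100;

# Copy each string, overwrite one random position with '2',
# and return the modified copy (not the value of the substr assignment)
my @bad = map {
    my $s = $_;
    substr( $s, rand( length $s ), 1 ) = 2;
    $s;
} @good;

# Only the @good strings should pass either test
my $n1 = grep /^[01]+$/, @good, @bad;
my $n2 = grep !/[^01]/,  @good, @bad;
print "$n1 $n2\n";
```

Each `@bad` string now keeps its length of 1000 with exactly one `2` somewhere inside it, so both tests have to scan into the string before rejecting it; that may shift the timings relative to the table above.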

Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail

Re^3: Regex failure interpretation
by Aristotle (Chancellor) on Mar 20, 2004 at 12:13 UTC
    D'oh. Figures that the complex expression would trigger an optimization:

        $ perl -Mre=debug -e'/[^01]/, /^[01]+$/'
        Freeing REx: `","'
        Compiling REx `[^01]'
        size 12 Got 100 bytes for offset annotations.
        first at 1
           1: ANYOF[\0-/2-\377{unicode_all}](12)
          12: END(0)
        stclass `ANYOF[\0-/2-\377{unicode_all}]' minlen 1
        Offsets: [12]
                1[5] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 6[0]
        Compiling REx `^[01]+$'
        size 15 Got 124 bytes for offset annotations.
        first at 2
        synthetic stclass `ANYOF[01]'.
           1: BOL(2)
           2: PLUS(14)
           3:   ANYOF[01](0)
          14: EOL(15)
          15: END(0)
        floating `'$ at 1..2147483647 (checking floating) stclass `ANYOF[01]' anchored(BOL) minlen 1
        Offsets: [15]
                1[1] 6[4] 2[4] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 7[1] 8[0]
        Freeing REx: `"[^01]"'
        Freeing REx: `"^[01]+$"'

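Speed aside, `/^[01]+$/` and `!/[^01]/` do not accept exactly the same strings. `+` needs at least one character, so only the negated test accepts an empty string; and `$` also matches just before a trailing newline, so only the `^...$` test accepts `"0101\n"`. Anchoring with `\A` and `\z` closes the newline gap. A minimal sketch; the sample strings are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical sample inputs chosen to expose the edge cases
my @samples = ( '0101', '01201', '', "0101\n" );

my %result;
for my $s (@samples) {
    # 1 = accepted, 0 = rejected, for each of the three tests
    my @r = (
        ( $s =~ /^[01]+$/   ? 1 : 0 ),    # anchored with ^ and $
        ( $s !~ /[^01]/     ? 1 : 0 ),    # negated character class
        ( $s =~ /\A[01]+\z/ ? 1 : 0 ),    # anchored with \A and \z
    );
    $result{$s} = \@r;

    ( my $shown = $s ) =~ s/\n/\\n/g;     # make the newline visible
    printf "%-9s anchored=%d negated=%d strict=%d\n", "'$shown'", @r;
}
```

The empty string is accepted only by the negated test, and `"0101\n"` only by the `^...$` version; the `\A...\z` version rejects both.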
    On the other hand, when using one of the builtins like \w, it still manages to beat the complex expression by a small margin:

        #!/usr/bin/perl
        use strict;
        use warnings;

        use Benchmark qw( cmpthese );

        my $len = shift;

        my @abc = 'a' .. 'z';

        my @good = ('') x 100;
        # use a named variable: the inner 'for' modifier would alias $_ to 1 .. $len
        for my $str (@good) { $str .= $abc[ rand @abc ] for 1 .. $len }

        my @bad = @good;
        substr $_, rand( length ), 1, ' ' for @bad;

        my %bench = (
            excl => sub { grep /\A\w+\z/, @good, @bad },
            incl => sub { grep !/\W/, @good, @bad },
        );

        print map "$_ matches: ".$bench{$_}->()."\n", keys %bench;

        cmpthese -3 => \%bench;

        __END__
        $ perl t.pl 100
        excl matches: 100
        incl matches: 100
        Benchmark: running excl, incl for at least 3 CPU seconds...
              excl:  3 wallclock secs ( 3.20 usr +  0.00 sys =  3.20 CPU) @ 6826.88/s (n=21846)
              incl:  3 wallclock secs ( 3.20 usr +  0.00 sys =  3.20 CPU) @ 6872.50/s (n=21992)
               Rate excl incl
        excl 6827/s   --  -1%
        incl 6872/s   1%   --
        $ perl t.pl 1000
        excl matches: 100
        incl matches: 100
        Benchmark: running excl, incl for at least 3 CPU seconds...
              excl:  3 wallclock secs ( 3.16 usr +  0.00 sys =  3.16 CPU) @ 1431.65/s (n=4524)
              incl:  3 wallclock secs ( 3.16 usr +  0.00 sys =  3.16 CPU) @ 1255.38/s (n=3967)
               Rate incl excl
        incl 1255/s   -- -12%
        excl 1432/s  14%   --
        $ perl t.pl 10000
        excl matches: 100
        incl matches: 100
        Benchmark: running excl, incl for at least 3 CPU seconds...
              excl:  4 wallclock secs ( 3.21 usr +  0.00 sys =  3.21 CPU) @ 158.26/s (n=508)
              incl:  4 wallclock secs ( 3.16 usr +  0.00 sys =  3.16 CPU) @ 135.76/s (n=429)
              Rate incl excl
        incl 136/s   -- -14%
        excl 158/s  17%   --
        $ perl -Mre=debug -e'/\W/, /^\w+$/'
        Freeing REx: `","'
        Compiling REx `\W'
        size 2 Got 20 bytes for offset annotations.
        first at 1
           1: NALNUM(2)
           2: END(0)
        stclass `NALNUM' minlen 1
        Offsets: [2]
                1[2] 3[0]
        Compiling REx `^\w+$'
        size 5 Got 44 bytes for offset annotations.
        first at 2
        synthetic stclass `ANYOF[0-9A-Z_a-z{unicode_all}]'.
           1: BOL(2)
           2: PLUS(4)
           3:   ALNUM(0)
           4: EOL(5)
           5: END(0)
        floating `'$ at 1..2147483647 (checking floating) stclass `ANYOF[0-9A-Z_a-z{unicode_all}]' anchored(BOL) minlen 1
        Offsets: [5]
                1[1] 4[2] 2[2] 5[1] 6[0]
        Freeing REx: `"\\W"'
        Freeing REx: `"^\\w+$"'
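Equal totals (100 and 100) don't by themselves prove that `excl` and `incl` accept the same strings. A minimal sketch that compares the two tests element by element on data generated the same way as in `t.pl`; the default length of 100 when no argument is given is an assumption. Because `/\A\w+\z/` also rejects the empty string, the two tests should disagree only on `''`, which this generator never produces for a length of 1 or more.

```perl
use strict;
use warnings;

my $len = @ARGV ? shift : 100;    # assumed default of 100 when no argument is given

my @abc  = 'a' .. 'z';
my @good = ('') x 100;
for my $str (@good) { $str .= $abc[ rand @abc ] for 1 .. $len }

# Replace one random character of each copy with a space
my @bad = @good;
substr $_, rand( length ), 1, ' ' for @bad;

# Collect every string on which the two tests give different answers
my @disagree = grep {
    my $anchored = /\A\w+\z/ ? 1 : 0;
    my $negated  = /\W/ ? 0 : 1;
    $anchored != $negated;
} @good, @bad;

my $excl = grep /\A\w+\z/, @good, @bad;
my $incl = grep !/\W/,     @good, @bad;
printf "excl=%d incl=%d disagreements=%d\n", $excl, $incl, scalar @disagree;
```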

    Makeshifts last the longest.