in reply to Regex failure interpretation

If you want to assert that a string contains only a certain class of characters, it's better to check that it doesn't match the opposite class; otherwise you're making the regex engine jump through hoops unnecessarily. And if you want the first character, well, just pick it out.
print substr($_, 0, 1) unless /[^01]/;
Cleaner and a lot more efficient. Bench the solutions with a few very long strings if you're curious.
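
The two styles of check are not quite interchangeable, which is worth knowing before swapping one for the other. A minimal sketch (the helper names are made up for this illustration) that runs both over a few edge cases:

```perl
use strict;
use warnings;

# Hypothetical helper names, chosen for this illustration only.
sub only_bits_negated  { my ($s) = @_; return $s !~ /[^01]/   ? 1 : 0 }
sub only_bits_anchored { my ($s) = @_; return $s =~ /^[01]+$/ ? 1 : 0 }

for my $s ( '0110', '01x0', '', "01\n" ) {
    ( my $shown = $s ) =~ s/\n/\\n/g;    # make the newline visible when printing
    printf "%-8s negated=%d anchored=%d\n",
        "'$shown'", only_bits_negated($s), only_bits_anchored($s);
}
```

Both checks agree on the first two inputs. They differ on the empty string, which the negated check accepts, and on "01\n", which the anchored check accepts because $ also matches just before a trailing newline.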

Makeshifts last the longest.

Re: Re: Regex failure interpretation
by BrowserUk (Patriarch) on Mar 20, 2004 at 11:36 UTC

    I'm probably doing something dumb, but I can't reproduce that?

    use Benchmark qw[ cmpthese ];

    @good = map{ join'',map{ rand() < .5 ? 1 : 0 } 1..1000 } 1..100;
    @bad  = map{ substr( $s=$_, rand( 1000 ), 1 ) = 2 } @good;

    cmpthese( -3, {
        '+ve' => q[ $n1 = grep /^[01]+$/, @good, @bad ],
        '-ve' => q[ $n2 = grep !/[^01]/, @good, @bad ]
    });

          Rate  -ve  +ve
    -ve  467/s   -- -80%
    +ve 2363/s 406%   --

    print $n1, $n2;
    100 100
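
One thing worth checking in the snippet above is what ends up in @bad. The map block's last expression is the substr assignment itself, so map collects the value of that lvalue, not the modified copy in $s. A small sketch (the names @as_posted and $copy are made up here) that compares the posted construction with one possible repair:

```perl
use strict;
use warnings;

my @good = map { join '', map { rand() < .5 ? 1 : 0 } 1 .. 1000 } 1 .. 100;

# As posted: the block ends with the assignment, so map returns the value
# of the one-character substr lvalue instead of the modified string.
my $s;
my @as_posted = map { substr( $s = $_, rand( 1000 ), 1 ) = 2 } @good;

# One possible repair: modify a copy, then return the copy explicitly.
my @bad = map { my $copy = $_; substr( $copy, rand( 1000 ), 1 ) = 2; $copy } @good;

printf "as posted: %d strings, first has length %d\n",
    scalar @as_posted, length $as_posted[0];
printf "repaired:  %d strings, first has length %d\n",
    scalar @bad, length $bad[0];
```

With the posted construction every element of the "bad" list is a single character, so the benchmark spends almost no time on them. That is consistent with both counts coming out at 100, but it means the timings mostly reflect the good strings.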

    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail
      D'oh. Figures that the complex expression would trigger an optimization:
      $ perl -Mre=debug -e'/[^01]/, /^[01]+$/'
      Freeing REx: `","'
      Compiling REx `[^01]'
      size 12 Got 100 bytes for offset annotations.
      first at 1
         1: ANYOF[\0-/2-\377{unicode_all}](12)
        12: END(0)
      stclass `ANYOF[\0-/2-\377{unicode_all}]' minlen 1
      Offsets: [12]
              1[5] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 6[0]
      Compiling REx `^[01]+$'
      size 15 Got 124 bytes for offset annotations.
      first at 2
      synthetic stclass `ANYOF[01]'.
         1: BOL(2)
         2: PLUS(14)
         3:   ANYOF[01](0)
        14: EOL(15)
        15: END(0)
      floating `'$ at 1..2147483647 (checking floating) stclass `ANYOF[01]' anchored(BOL) minlen 1
      Offsets: [15]
              1[1] 6[4] 2[4] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 7[1] 8[0]
      Freeing REx: `"[^01]"'
      Freeing REx: `"^[01]+$"'
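      On the other hand, with one of the builtin classes like \w, the negated match still manages to beat the complex expression by a small margin, though only on the shortest strings (in the script below, incl is the negated !/\W/ and excl is the anchored /\A\w+\z/):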
      #!/usr/bin/perl
      use strict;
      use warnings;

      use Benchmark qw( cmpthese );

      my $len = shift;
      my @abc = 'a' .. 'z';

      my @good = ('') x 100;
      for(@good) { $_ .= $abc[ rand @abc ] for 1 .. $len }

      my @bad = @good;
      substr $_, rand( length ), 1, ' ' for @bad;

      my %bench = (
          excl => sub { grep /\A\w+\z/, @good, @bad },
          incl => sub { grep !/\W/, @good, @bad },
      );

      print map "$_ matches: ".$bench{$_}->()."\n", keys %bench;

      cmpthese -3 => \%bench;

      __END__
      $ perl t.pl 100
      excl matches: 100
      incl matches: 100
      Benchmark: running excl, incl for at least 3 CPU seconds...
            excl:  3 wallclock secs ( 3.20 usr +  0.00 sys =  3.20 CPU) @ 6826.88/s (n=21846)
            incl:  3 wallclock secs ( 3.20 usr +  0.00 sys =  3.20 CPU) @ 6872.50/s (n=21992)
             Rate excl incl
      excl 6827/s   --  -1%
      incl 6872/s   1%   --

      $ perl t.pl 1000
      excl matches: 100
      incl matches: 100
      Benchmark: running excl, incl for at least 3 CPU seconds...
            excl:  3 wallclock secs ( 3.16 usr +  0.00 sys =  3.16 CPU) @ 1431.65/s (n=4524)
            incl:  3 wallclock secs ( 3.16 usr +  0.00 sys =  3.16 CPU) @ 1255.38/s (n=3967)
             Rate incl excl
      incl 1255/s   -- -12%
      excl 1432/s  14%   --

      $ perl t.pl 10000
      excl matches: 100
      incl matches: 100
      Benchmark: running excl, incl for at least 3 CPU seconds...
            excl:  4 wallclock secs ( 3.21 usr +  0.00 sys =  3.21 CPU) @ 158.26/s (n=508)
            incl:  4 wallclock secs ( 3.16 usr +  0.00 sys =  3.16 CPU) @ 135.76/s (n=429)
            Rate incl excl
      incl 136/s   -- -14%
      excl 158/s  17%   --

      $ perl -Mre=debug -e'/\W/, /^\w+$/'
      Freeing REx: `","'
      Compiling REx `\W'
      size 2 Got 20 bytes for offset annotations.
      first at 1
         1: NALNUM(2)
         2: END(0)
      stclass `NALNUM' minlen 1
      Offsets: [2]
              1[2] 3[0]
      Compiling REx `^\w+$'
      size 5 Got 44 bytes for offset annotations.
      first at 2
      synthetic stclass `ANYOF[0-9A-Z_a-z{unicode_all}]'.
         1: BOL(2)
         2: PLUS(4)
         3:   ALNUM(0)
         4: EOL(5)
         5: END(0)
      floating `'$ at 1..2147483647 (checking floating) stclass `ANYOF[0-9A-Z_a-z{unicode_all}]' anchored(BOL) minlen 1
      Offsets: [5]
              1[1] 4[2] 2[2] 5[1] 6[0]
      Freeing REx: `"\\W"'
      Freeing REx: `"^\\w+$"'
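
The runs above time good and bad strings together, so they cannot show whether the gap comes from strings that pass or strings that fail. A rough sketch of a variant that benchmarks the two sets separately; the labels anchored and negated, the fixed length of 1000, and the one-second run time are choices made for this illustration, and no particular outcome is implied:

```perl
use strict;
use warnings;
use Benchmark qw( cmpthese );

my $len  = 1000;                         # assumed string length
my @abc  = ( 'a' .. 'z' );
my @good = map { join '', map { $abc[ rand @abc ] } 1 .. $len } 1 .. 100;

# Copy each good string, then overwrite one random character with a space.
my @bad = map { my $s = $_; substr( $s, rand( length $s ), 1, ' ' ); $s } @good;

my %input = ( good => \@good, bad => \@bad );

for my $kind ( sort keys %input ) {
    my $strings = $input{$kind};
    print "--- $kind strings ---\n";
    cmpthese( -1, {                      # raise to -3 or more for steadier numbers
        anchored => sub { my $n = grep /\A\w+\z/, @$strings },
        negated  => sub { my $n = grep !/\W/,     @$strings },
    } );
}
```

Run it a few times and with different values of $len before drawing conclusions; differences of a percent or two are within the noise of this kind of measurement.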

      Makeshifts last the longest.