Re: japhy blabs about regexes (again)

I'm using perl 5.6.1, and I don't get those results. Here's the script I'm using. I was also curious about stripping leading spaces on their own, so I threw that in (LEADING), but the WHILE_RE version is definitely slower than either the SEXEGER or the SINGLE_RE version. And for stripping both leading and trailing whitespace, the LEADTRAIL version is quicker than the LTSAVE version.

#!/usr/local/bin/perl -w

use strict;
use Benchmark;

my $str = "   a b  c   d    ";

timethese(-5, {
 LEADING=>\&leading,
 LEADTRAIL=>\&lead_trail,
 LTSAVE=>\&lt_save,
 SEXEGER=>\&sexeger,
 SINGLE_RE=>\&single_re,
 WHILE_RE=>\&while_sub,
});

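# strip leading whitespace only
sub leading {
 local $_ = $str;
 s/^\s+//;
 $_;
}

# strip leading and trailing whitespace with one alternation
sub lead_trail {
 local $_ = $str;
 s/^\s+|\s+$//g;
 $_;
}

# strip both ends by capturing the middle and substituting it back
sub lt_save {
 local $_ = $str;
 s/^\s*(.*?)\s*$/$1/;
 $_;
}

# strip trailing whitespace: reverse, strip leading, reverse back
sub sexeger {
 local $_ = reverse $str;
 s/^\s+//;
 scalar reverse $_;    # reverse the trimmed copy back (not $str)
}

# strip trailing whitespace with a single \s+$ match
sub single_re {
 local $_ = $str;
 s/\s+$//;
 $_;
}

# strip trailing whitespace one character per substitution
sub while_sub {
 local $_ = $str;
 1 while s/\s$//;
 $_;
}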

~/tst >./tst3
Benchmark: running LEADING, LEADTRAIL, LTSAVE, SEXEGER, SINGLE_RE, WHILE_RE, each for at least 5 CPU seconds...
   LEADING:  5 wallclock secs ( 5.00 usr +  0.00 sys =  5.00 CPU) @ 77596.80/s (n=387984)
 LEADTRAIL:  5 wallclock secs ( 5.22 usr +  0.00 sys =  5.22 CPU) @ 26504.21/s (n=138352)
    LTSAVE:  5 wallclock secs ( 5.18 usr +  0.00 sys =  5.18 CPU) @ 18690.54/s (n=96817)
   SEXEGER:  4 wallclock secs ( 5.35 usr +  0.00 sys =  5.35 CPU) @ 57972.52/s (n=310153)
 SINGLE_RE:  5 wallclock secs ( 5.23 usr +  0.00 sys =  5.23 CPU) @ 56434.42/s (n=295152)
  WHILE_RE:  6 wallclock secs ( 5.00 usr +  0.00 sys =  5.00 CPU) @ 36090.00/s (n=180450)

Update: I cut 'n pasted the code and output, then fixed LTSAVE, then forgot to repaste. I guess no one looked closely enough to notice that LTSAVE was actually beating LEADTRAIL. It's all fixed now, though :)
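One way to catch a mistake like the LTSAVE one before timing anything is to check that the variants that are supposed to be equivalent really return the same string. A minimal sketch, assuming the same regexes as the script; unlike the benchmark subs, these take an argument and return the result instead of using $str and local $_, and the test inputs are arbitrary. Note that the sexeger here reverses the trimmed copy back; reversing the untouched input would silently return the wrong string.

```perl
#!/usr/local/bin/perl -w
use strict;

# Trailing-whitespace strippers (same regexes as SINGLE_RE, WHILE_RE, SEXEGER)
sub single_re { my ($s) = @_; $s =~ s/\s+$//; return $s }
sub while_sub { my ($s) = @_; 1 while $s =~ s/\s$//; return $s }
sub sexeger {
    my ($in) = @_;
    my $s = reverse $in;        # scalar context: reverses the characters
    $s =~ s/^\s+//;             # trailing whitespace is now leading
    return scalar reverse $s;   # reverse the trimmed copy back, not $in
}

# Leading-and-trailing strippers (same regexes as LEADTRAIL and LTSAVE)
sub lead_trail { my ($s) = @_; $s =~ s/^\s+|\s+$//g; return $s }
sub lt_save    { my ($s) = @_; $s =~ s/^\s*(.*?)\s*$/$1/; return $s }

# Arbitrary test inputs, including the empty and all-blank cases
my @inputs = ("   a b  c   d    ", "a b c", "x \t ", "", "   ");
my $ok = 1;

for my $in (@inputs) {
    my @trail = (single_re($in), while_sub($in), sexeger($in));
    my @both  = (lead_trail($in), lt_save($in));
    if ($trail[0] ne $trail[1] or $trail[0] ne $trail[2]) {
        print "trailing strippers disagree on '$in'\n";
        $ok = 0;
    }
    if ($both[0] ne $both[1]) {
        print "leading/trailing strippers disagree on '$in'\n";
        $ok = 0;
    }
}
print $ok ? "all variants agree\n" : "mismatch found\n";
```

Only once a check like this passes do the timings mean anything.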

Re: Re: japhy blabs about regexes (again) by japhy (Canon) on Jul 16, 2001 at 23:07 UTC
Here's my output (from bleadperl):

Benchmark: running F_plus, F_sexeger, F_while, P_plus, P_sexeger, P_while, each for at least 5 CPU seconds...
    F_plus:  6 wallclock secs ( 5.38 usr +  0.02 sys =  5.40 CPU) @ 38010.19/s (n=205255)
 F_sexeger:  6 wallclock secs ( 5.19 usr +  0.00 sys =  5.19 CPU) @ 82085.16/s (n=426022)
   F_while:  5 wallclock secs ( 5.23 usr +  0.00 sys =  5.23 CPU) @ 99934.23/s (n=522656)
    P_plus:  6 wallclock secs ( 5.22 usr +  0.00 sys =  5.22 CPU) @ 34659.58/s (n=180923)
 P_sexeger:  7 wallclock secs ( 5.11 usr +  0.00 sys =  5.11 CPU) @ 54039.53/s (n=276142)
   P_while:  6 wallclock secs ( 5.14 usr +  0.00 sys =  5.14 CPU) @ 58260.31/s (n=299458)
             Rate    P_plus    F_plus P_sexeger   P_while F_sexeger   F_while
P_plus    34660/s        --       -9%      -36%      -41%      -58%      -65%
F_plus    38010/s       10%        --      -30%      -35%      -54%      -62%
P_sexeger 54040/s       56%       42%        --       -7%      -34%      -46%
P_while   58260/s       68%       53%        8%        --      -29%      -42%
F_sexeger 82085/s      137%      116%       52%       41%        --      -18%
F_while   99934/s      188%      163%       85%       72%       22%        --

The F stands for "fail", and the P stands for "pass". For me, the while approach both fails AND succeeds faster than the sexeger and plus approaches, and sexeger both fails AND succeeds faster than the plus approach. And here's the code I ran:

#!/usr/bin/perl

use Benchmark 'cmpthese';

my $X = "a b c d e f g h i j k l ";
my $Y = "a b c d e f g h i j k l";

cmpthese(-5, {
  P_while   => sub { my $x = $X; 1 while $x =~ s/\s$//; },
  P_plus    => sub { my $x = $X; $x =~ s/\s+$//; },
  P_sexeger => sub { my $x = reverse $X; $x =~ s/^\s+//; $x = reverse $x; },
  F_while   => sub { my $x = $Y; 1 while $x =~ s/\s$//; },
  F_plus    => sub { my $x = $Y; $x =~ s/\s+$//; },
  F_sexeger => sub { my $x = reverse $Y; $x =~ s/^\s+//; $x = reverse $x; },
});

_____________________________________________________
Jeff `japhy` Pinyan: Perl, regex, and perl hacker.
s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;
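A quick way to see why the while loop does so well on these particular strings: `1 while s/\s$//` runs the substitution once per trailing whitespace character, plus one final attempt that fails. With japhy's strings that is two calls for "pass" and one for "fail". A small sketch that counts the attempts; the while_attempts sub is a hypothetical helper for illustration, not something from the thread.

```perl
#!/usr/bin/perl -w
use strict;

# Count how many times the substitution runs in the "1 while s/\s$//"
# idiom: one successful pass per trailing whitespace character,
# plus the final attempt that fails and ends the loop.
sub while_attempts {
    my ($x) = @_;
    my $attempts = 0;
    while (1) {
        $attempts++;
        last unless $x =~ s/\s$//;
    }
    return ($x, $attempts);
}

# japhy's "fail" and "pass" strings, plus one with four trailing spaces
for my $str ("a b c d e f g h i j k l",
             "a b c d e f g h i j k l ",
             "a b c d e f g h i j k l    ") {
    my ($trimmed, $n) = while_attempts($str);
    printf "%-30s -> %d attempt(s)\n", "'$str'", $n;
}
```

For the three strings this reports 1, 2 and 5 attempts. The plus and sexeger versions make a single substitution call no matter how many trailing spaces there are.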
Re: Re: Re: japhy blabs about regexes (again) by runrig (Abbot) on Jul 16, 2001 at 23:28 UTC
That explains it. If you add a few spaces to the end of your 'passing' string, then P_while comes in last. I suppose that's because of the cost of running the substitution once more for each extra trailing space. So if you expect there will usually be a few spaces to strip, better not to use the while :) But it's still probably a good case for optimizing regexes anchored at the end of a string.
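To try the experiment described above, here is japhy's passing-string benchmark with a few more trailing spaces appended. A sketch only: the four-space tail is an arbitrary choice, and -1 (at least 1 CPU second per test) keeps the run short where the thread used -5. No particular result is implied; the ranking will depend on your perl and machine.

```perl
#!/usr/bin/perl -w
use strict;
use Benchmark 'cmpthese';

# japhy's "passing" string with extra trailing spaces (count is arbitrary)
my $X = "a b c d e f g h i j k l    ";

# cmpthese prints the comparison chart and also returns it as a
# reference to an array of rows (header row first, then one per test)
my $rows = cmpthese(-1, {
    P_while   => sub { my $x = $X; 1 while $x =~ s/\s$//; },
    P_plus    => sub { my $x = $X; $x =~ s/\s+$//; },
    P_sexeger => sub { my $x = reverse $X; $x =~ s/^\s+//; $x = reverse $x; },
});
```

Raising the count back to -5, or varying the number of trailing spaces, should show how quickly the extra substitution calls catch up with the while loop.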