in reply to japhy blabs about regexes (again)

I'm using perl 5.6.1, and I don't get those results. Here's the script I'm using. I was just curious about stripping leading spaces by itself, so I threw that in, but the WHILE_RE version is definitely slower than either the SEXEGER or the SINGLE_RE version. And on the leading & trailing whitespace stripping, the LEADTRAIL version is quicker than the LTSAVE version.
#!/usr/local/bin/perl -w
use strict;
use Benchmark;

my $str = " a b c d ";

timethese(-5, {
    LEADING   => \&leading,
    LEADTRAIL => \&lead_trail,
    LTSAVE    => \&lt_save,
    SEXEGER   => \&sexeger,
    SINGLE_RE => \&single_re,
    WHILE_RE  => \&while_sub,
});

sub leading    { local $_ = $str; s/^\s+//; $_; }
sub lead_trail { local $_ = $str; s/^\s+|\s+$//g; $_; }
sub lt_save    { local $_ = $str; s/^\s*(.*?)\s*$/$1/; $_; }
sub sexeger    { local $_ = reverse $str; s/^\s+//; reverse $str; }
sub single_re  { local $_ = $str; s/\s+$//; $_; }
sub while_sub  { local $_ = $str; 1 while s/\s$//; $_; }

~/tst >./tst3
Benchmark: running LEADING, LEADTRAIL, LTSAVE, SEXEGER, SINGLE_RE, WHILE_RE, each for at least 5 CPU seconds...
   LEADING:  5 wallclock secs ( 5.00 usr +  0.00 sys =  5.00 CPU) @ 77596.80/s (n=387984)
 LEADTRAIL:  5 wallclock secs ( 5.22 usr +  0.00 sys =  5.22 CPU) @ 26504.21/s (n=138352)
    LTSAVE:  5 wallclock secs ( 5.18 usr +  0.00 sys =  5.18 CPU) @ 18690.54/s (n=96817)
   SEXEGER:  4 wallclock secs ( 5.35 usr +  0.00 sys =  5.35 CPU) @ 57972.52/s (n=310153)
 SINGLE_RE:  5 wallclock secs ( 5.23 usr +  0.00 sys =  5.23 CPU) @ 56434.42/s (n=295152)
  WHILE_RE:  6 wallclock secs ( 5.00 usr +  0.00 sys =  5.00 CPU) @ 36090.00/s (n=180450)
Update: I cut 'n pasted the code and output, then fixed LTSAVE, then forgot to repaste. I guess no one looked closely enough to notice that LTSAVE was actually beating LEADTRAIL. It's all fixed now though :)
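
One thing that may be worth checking in the script above: the SEXEGER sub ends with `reverse $str`, so what it returns is built from the original string and not from the trimmed copy in `$_`. The sketch below compares the sub as posted with one possible fix. The names `sexeger_as_posted` and `sexeger_fixed` are made up for illustration, and the posted timings were measured with the original sub, so they may not carry over to the fixed one.

```perl
#!/usr/local/bin/perl -w
use strict;

my $str = " a b c d ";

# As posted: the last statement reverses $str, not the trimmed copy in $_.
sub sexeger_as_posted { local $_ = reverse $str; s/^\s+//; reverse $str; }

# Possible fix (an assumption about the intent): reverse the trimmed copy,
# forcing scalar context so reverse works on the string, not on a list.
sub sexeger_fixed { local $_ = reverse $str; s/^\s+//; scalar reverse $_; }

my $posted = sexeger_as_posted();   # called in scalar context
my $fixed  = sexeger_fixed();

print "as posted: [$posted]\n";     # prints "as posted: [ d c b a ]"
print "fixed:     [$fixed]\n";      # prints "fixed:     [ a b c d]"
```

With the fix, the trailing space is gone and the leading one is kept, which is what the reverse-then-strip-leading trick is meant to do.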

Re: Re: japhy blabs about regexes (again)
by japhy (Canon) on Jul 16, 2001 at 23:07 UTC
    Here's my output (from bleadperl):
    Benchmark: running F_plus, F_sexeger, F_while, P_plus, P_sexeger, P_while, each for at least 5 CPU seconds...
        F_plus:  6 wallclock secs ( 5.38 usr +  0.02 sys =  5.40 CPU) @ 38010.19/s (n=205255)
     F_sexeger:  6 wallclock secs ( 5.19 usr +  0.00 sys =  5.19 CPU) @ 82085.16/s (n=426022)
       F_while:  5 wallclock secs ( 5.23 usr +  0.00 sys =  5.23 CPU) @ 99934.23/s (n=522656)
        P_plus:  6 wallclock secs ( 5.22 usr +  0.00 sys =  5.22 CPU) @ 34659.58/s (n=180923)
     P_sexeger:  7 wallclock secs ( 5.11 usr +  0.00 sys =  5.11 CPU) @ 54039.53/s (n=276142)
       P_while:  6 wallclock secs ( 5.14 usr +  0.00 sys =  5.14 CPU) @ 58260.31/s (n=299458)

                 Rate P_plus F_plus P_sexeger P_while F_sexeger F_while
    P_plus    34660/s     --    -9%      -36%    -41%      -58%    -65%
    F_plus    38010/s    10%     --      -30%    -35%      -54%    -62%
    P_sexeger 54040/s    56%    42%        --     -7%      -34%    -46%
    P_while   58260/s    68%    53%        8%      --      -29%    -42%
    F_sexeger 82085/s   137%   116%       52%     41%        --    -18%
    F_while   99934/s   188%   163%       85%     72%       22%      --
    The F stands for "fail", and the P stands for "pass". For me, the while-approach fails AND succeeds faster than the sexeger- and plus-approaches, and sexeger fails AND succeeds faster than the plus-approach.

    And here's the code I ran.

    #!/usr/bin/perl
    use Benchmark 'cmpthese';

    my $X = "a b c d e f g h i j k l ";
    my $Y = "a b c d e f g h i j k l";

    cmpthese(-5, {
        P_while   => sub { my $x = $X; 1 while $x =~ s/\s$//; },
        P_plus    => sub { my $x = $X; $x =~ s/\s+$//; },
        P_sexeger => sub { my $x = reverse $X; $x =~ s/^\s+//; $x = reverse $x; },
        F_while   => sub { my $x = $Y; 1 while $x =~ s/\s$//; },
        F_plus    => sub { my $x = $Y; $x =~ s/\s+$//; },
        F_sexeger => sub { my $x = reverse $Y; $x =~ s/^\s+//; $x = reverse $x; },
    });

    _____________________________________________________
    Jeff japhy Pinyan: Perl, regex, and perl hacker.
    s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;

      That explains it. If you add a few spaces to the end of your 'passing' string, then P_while will come in last. I suppose that's because of the cost of executing the regex a few more times. So if you expect that there are likely to be a few spaces to truncate, better not to use the while :)

      But it's still probably a good case for optimizing regexes anchored at the end of a string.
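
For anyone who wants to try the "add a few spaces" experiment, here is a rough sketch that runs the three trailing-whitespace approaches against the same string padded with 0, 1, and 5 trailing spaces. The sub names `trim_while`, `trim_plus`, and `trim_sexeger` and the pad sizes are my own choices for illustration, not taken from either script in this thread, and the rankings will likely vary with perl version and machine.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark 'cmpthese';

# The three approaches discussed above, wrapped as subs that return the result.
sub trim_while   { my $x = shift; 1 while $x =~ s/\s$//; return $x }
sub trim_plus    { my $x = shift; $x =~ s/\s+$//;        return $x }
sub trim_sexeger {
    my $x = scalar reverse $_[0];   # reverse the string, not a list
    $x =~ s/^\s+//;                 # trailing space is now leading space
    return scalar reverse $x;
}

# Pad sizes are arbitrary; 0 is the "fail" case, 1 matches the original $X.
for my $pad (0, 1, 5) {
    my $str = "a b c d e f g h i j k l" . (" " x $pad);
    print "\n$pad trailing space(s):\n";
    cmpthese(-1, {
        while   => sub { trim_while($str) },
        plus    => sub { trim_plus($str) },
        sexeger => sub { trim_sexeger($str) },
    });
}
```

The `-1` asks Benchmark for at least one CPU second per entry, which keeps the run short; bump it to `-5` to match the runs posted above.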