Re: japhy blabs about regexes (again)
by runrig (Abbot) on Jul 16, 2001 at 22:49 UTC
|
I'm using perl 5.6.1, and I don't get those results. Here's the script I'm using. I was just curious about leading spaces by itself, so I threw that in, but the WHILE_RE version is definitely slower than either the SEXEGER or the SINGLE_RE version. And on the leading & trailing white space stripping, the LEADTRAIL is version is quicker than the LTSAVE version.
#!/usr/local/bin/perl -w
use strict;
use Benchmark;
my $str = " a b c d ";
timethese(-5, {
LEADING=>\&leading,
LEADTRAIL=>\&lead_trail,
LTSAVE=>\<_save,
SEXEGER=>\&sexeger,
SINGLE_RE=>\&single_re,
WHILE_RE=>\&while_sub,
});
sub leading {
local $_ = $str;
s/^\s+//;
$_;
}
sub lead_trail {
local $_ = $str;
s/^\s+|\s+$//g;
$_;
}
sub lt_save {
local $_ = $str;
s/^\s*(.*?)\s*$/$1/;
$_;
}
sub sexeger {
local $_ = reverse $str;
s/^\s+//;
reverse $str;
}
sub single_re {
local $_ = $str;
s/\s+$//;
$_;
}
sub while_sub {
local $_ = $str;
1 while s/\s$//;
$_;
}
~/tst >./tst3
Benchmark: running LEADING, LEADTRAIL, LTSAVE, SEXEGER, SINGLE_RE, WHI
+LE_RE, eac
h for at least 5 CPU seconds...
LEADING: 5 wallclock secs ( 5.00 usr + 0.00 sys = 5.00 CPU) @ 77
+596.80/s (
n=387984)
LEADTRAIL: 5 wallclock secs ( 5.22 usr + 0.00 sys = 5.22 CPU) @ 26
+504.21/s (
n=138352)
LTSAVE: 5 wallclock secs ( 5.18 usr + 0.00 sys = 5.18 CPU) @ 18
+690.54/s (
n=96817)
SEXEGER: 4 wallclock secs ( 5.35 usr + 0.00 sys = 5.35 CPU) @ 57
+972.52/s (
n=310153)
SINGLE_RE: 5 wallclock secs ( 5.23 usr + 0.00 sys = 5.23 CPU) @ 56
+434.42/s (
n=295152)
WHILE_RE: 6 wallclock secs ( 5.00 usr + 0.00 sys = 5.00 CPU) @ 36
+090.00/s (
n=180450)
Update: I cut 'n pasted the code and output, then fixed LTSAVE, then forgot to repaste. I guess
no one looked closely enough to notice that LT_SAVE was actually beating LEADTRAIL. Its all fixed now though :)
| [reply] [d/l] |
|
Here's my output (from bleadperl):
Benchmark: running F_plus, F_sexeger, F_while, P_plus, P_sexeger, P_wh
+ile, each for at least 5 CPU seconds...
F_plus: 6 wallclock secs ( 5.38 usr + 0.02 sys = 5.40 CPU) @ 38
+010.19/s (n=205255)
F_sexeger: 6 wallclock secs ( 5.19 usr + 0.00 sys = 5.19 CPU) @ 82
+085.16/s (n=426022)
F_while: 5 wallclock secs ( 5.23 usr + 0.00 sys = 5.23 CPU) @ 99
+934.23/s (n=522656)
P_plus: 6 wallclock secs ( 5.22 usr + 0.00 sys = 5.22 CPU) @ 34
+659.58/s (n=180923)
P_sexeger: 7 wallclock secs ( 5.11 usr + 0.00 sys = 5.11 CPU) @ 54
+039.53/s (n=276142)
P_while: 6 wallclock secs ( 5.14 usr + 0.00 sys = 5.14 CPU) @ 58
+260.31/s (n=299458)
Rate P_plus F_plus P_sexeger P_while F_sexeger F_whi
+le
P_plus 34660/s -- -9% -36% -41% -58% -6
+5%
F_plus 38010/s 10% -- -30% -35% -54% -6
+2%
P_sexeger 54040/s 56% 42% -- -7% -34% -4
+6%
P_while 58260/s 68% 53% 8% -- -29% -4
+2%
F_sexeger 82085/s 137% 116% 52% 41% -- -1
+8%
F_while 99934/s 188% 163% 85% 72% 22%
+--
The F stands for "fail", and the P stands for "pass". For me, the while-approach fails AND succeeds faster than the sexeger- and plus-approaches, and sexeger fails AND succeeds faster than the plus-approach.
And here's the code I ran.
#!/usr/bin/perl
use Benchmark 'cmpthese';
my $X = "a b c d e f g h i j k l ";
my $Y = "a b c d e f g h i j k l";
cmpthese(-5, {
P_while => sub {
my $x = $X;
1 while $x =~ s/\s$//;
},
P_plus => sub {
my $x = $X;
$x =~ s/\s+$//;
},
P_sexeger => sub {
my $x = reverse $X;
$x =~ s/^\s+//;
$x = reverse $x;
},
F_while => sub {
my $x = $Y;
1 while $x =~ s/\s$//;
},
F_plus => sub {
my $x = $Y;
$x =~ s/\s+$//;
},
F_sexeger => sub {
my $x = reverse $Y;
$x =~ s/^\s+//;
$x = reverse $x;
},
});
_____________________________________________________
Jeff japhy Pinyan:
Perl,
regex,
and perl
hacker.
s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??; | [reply] [d/l] [select] |
|
| [reply] |
Re: japhy blabs about regexes (again)
by srawls (Friar) on Jul 16, 2001 at 22:06 UTC
|
Hmm... I find the 1 while s/\s$//; very interesting, but shocking as well. I guess I just blindly asumed that s/\s*$// was optimized.
Anyway, I, too, am working on a regex engine; mine is coded in C. It's a slow process for me, but I'm learning a lot--I haven't tried any optimizations yet, though, I'll have to finish before I start that : )
The 15 year old, freshman programmer,
Stephen Rawls | [reply] [d/l] [select] |
Re: japhy blabs about regexes (again)
by DrZaius (Monk) on Jul 16, 2001 at 22:22 UTC
|
I haven't tried any benchmarks and I am likely not to, but this little ditty comes to mind:
s!\s*\s$!!
Although, I can also see how it would be slower :) | [reply] [d/l] |
|
Oh, I was hoping someone would bite. ;)
That regex is worse. You'd think by breaking \s+ down into \s*\s, you'd get the best of both worlds -- the "1 or more" meaning, and the "trailing character" optimization. No such luck. In fact, the regex does MORE work, because of backtracking. :(
You see, Perl doesn't even see if the string has ANY trailing whitespace if you do /\s+$/. Meaning it takes a while to fail for a string like "a b c d e f g h i j k l m n o p q r s t u v w x y z". How sad.
# try this, and be sad :(
use re 'debug';
"abc def ghi" =~ /\s+$/;
"abc def ghi" =~ /\s*\s$/;
The output (if you understand it) will upset you. Well, it upset me.
_____________________________________________________
Jeff japhy Pinyan:
Perl,
regex,
and perl
hacker.
s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??; | [reply] [d/l] |
Re: japhy blabs about regexes (again)(ot)(boo)
by boo_radley (Parson) on Jul 16, 2001 at 23:37 UTC
|
| [reply] |
Re: japhy blabs about regexes (again)
by Wookie (Beadle) on Jul 17, 2001 at 16:47 UTC
|
Surely all this depends on how much white space you're typically expecting.
Could the following not also be more efficent depending on the data:
if ($blah=~m/^\s/) {
$blah=~s/^\s+//;
}
or indeed (thanks to MZSanford for this one):
if (index($blah," ")) == 0 {
$blah=~s/^\s+//;
}
and for trailing - use rindex :).
UPDATE: Oops - The index solution only works for *actual* spaces (thanks Hofmator).
That possibly gets you a little closer to the best of both worlds?
game(Wookie,opponent) eq 'Wookie' ? undef $problem : remove_limbs(arms,opponent); | [reply] [d/l] [select] |
|
| [reply] |
Re: japhy blabs about regexes (again)
by MZSanford (Curate) on Jul 17, 2001 at 20:43 UTC
|
Being curious as Perl people will be, i tired an rindex based solution. It unfortunatly would only work with true spaces (" "), and did not fare too well in testing, but, we all learn when things like this happen ... here is my code/output :
Code Addition :
P_rindex => sub {
my $x = $X;
while (1) { my $i = length($x); (rindex($x," ") == $i ? chop : las
+t) };
},
F_rindex => sub {
my $x = $Y;
while (1) { my $i = length($x); (rindex($x," ") == $i ? chop : las
+t) };
},
Output :
Benchmark: running F_plus, F_rindex, F_sexeger, F_while, P_plus, P_rin
+dex,P_sexeger, P_while, each for at least 5 CPU seconds...
F_plus: 6 wallclock secs ( 5.15 usr + 0.00 sys = 5.15 CPU) @ 37
+600.97/s(n=193645)
F_rindex: 6 wallclock secs ( 5.45 usr + 0.00 sys = 5.45 CPU) @ 59
+183.30/s(n=322549)
F_sexeger: 7 wallclock secs ( 5.24 usr + 0.00 sys = 5.24 CPU) @ 59
+187.60/s(n=310143)
F_while: 7 wallclock secs ( 5.29 usr + 0.00 sys = 5.29 CPU) @ 69
+741.02/s(n=368930)
P_plus: 5 wallclock secs ( 5.46 usr + 0.00 sys = 5.46 CPU) @ 33
+136.08/s(n=180923)
P_rindex: 6 wallclock secs ( 5.31 usr + 0.00 sys = 5.31 CPU) @ 60
+743.69/s(n=322549)
P_sexeger: 5 wallclock secs ( 5.36 usr + 0.00 sys = 5.36 CPU) @ 49
+557.65/s(n=265629)
P_while: 6 wallclock secs ( 5.09 usr + 0.00 sys = 5.09 CPU) @ 50
+179.17/s(n=255412)
Rate P_plus F_plus P_sexeger P_while F_rindex F_sexeger P_
+rindex F_while
P_plus 33136/s -- -12% -33% -34% -44% -44%
+ -45%-52%
F_plus 37601/s 13% -- -24% -25% -36% -36%
+ -38%-46%
P_sexeger 49558/s 50% 32% -- -1% -16% -16%
+ -18%-29%
P_while 50179/s 51% 33% 1% -- -15% -15%
+ -17%-28%
F_rindex 59183/s 79% 57% 19% 18% -- -0%
+ -3%-15%
F_sexeger 59188/s 79% 57% 19% 18% 0% --
+ -3%-15%
P_rindex 60744/s 83% 62% 23% 21% 3% 3%
+ ---13%
F_while 69741/s 110% 85% 41% 39% 18% 18%
+ 15%--
As usual, nothing fantastic, but worth the learning experiance.
OH, a sarcasm detector, that’s really useful | [reply] [d/l] [select] |
|
I wrote four XS functions, and one of them (used here) is called rnchrpos(STR, CHARS [, POS]). It works just like rindex(), except it searches for a character that is not one of a set of characters. It's similar to doing:
sub rnchrpos {
my ($str, $chars, $pos) = @_;
$pos ||= $pos;
$pos += length $str if $pos < 0;
return (
substr($str, 0, $pos) =~ /[\Q$chars\E]*\z/ ?
$-[0] - 1 :
-1
);
}
Except that mine is faster, since it's written in C. The results vary greatly. It comes down to knowing your string. If your string has lots of chunks of whitespace, the regular regex approach is bad. If your string has a long string of trailing whitespace, the rnchrpos() approach is bad. If there's no trailing whitespace, or very little, the while approach and the rnchrpos() are good. The sexeger approach is pretty good if the string isn't very long. If your string has a few chunks of whitespace the regular regex approach is good.
We live in a very variable world.
_____________________________________________________
Jeff japhy Pinyan:
Perl,
regex,
and perl
hacker.
s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??; | [reply] [d/l] |
Why not write your own?
by John M. Dlugosz (Monsignor) on Jul 17, 2001 at 02:12 UTC
|
I found, a while back, a regex library for C++ that was mostly Perl compatible. As you point out, the code actually in Perl is rather difficult to fathem and simply grabbing that part and incorporating it into a C program is not possible. But having that kind of power in the "other" language is so lustful...
| [reply] |
|
| [reply] |
Re: japhy blabs about regexes (again)
by coreolyn (Parson) on Jul 17, 2001 at 19:40 UTC
|
I've been using s/[ ]+//g; which seems to work effectively on all the varied spacing it's run into. Sticking it into runrigs Bench test as CURIOUS it yields:
Benchmark: running CURIOUS, LEADING, LEADTRAIL, LTSAVE, SEXEGER, SINGL
+E_RE, WHILE_RE, each for at least 5 CPU seconds...
CURIOUS: 6 wallclock secs ( 5.42 usr + 0.00 sys = 5.42 CPU) @ 39
+780.44/s (n=215610)
LEADING: 6 wallclock secs ( 5.29 usr + 0.00 sys = 5.29 CPU) @ 68
+400.00/s (n=361836)
LEADTRAIL: 6 wallclock secs ( 5.21 usr + 0.00 sys = 5.21 CPU) @ 28
+325.53/s (n=147576)
LTSAVE: 6 wallclock secs ( 5.32 usr + 0.00 sys = 5.32 CPU) @ 30
+798.68/s (n=163849)
SEXEGER: 5 wallclock secs ( 5.28 usr + 0.00 sys = 5.28 CPU) @ 50
+426.89/s (n=266254)
SINGLE_RE: 6 wallclock secs ( 5.31 usr + 0.00 sys = 5.31 CPU) @ 54
+235.40/s (n=287990)
WHILE_RE: 7 wallclock secs ( 5.47 usr + 0.00 sys = 5.47 CPU) @ 34
+396.53/s (n=188149)
coreolyn | [reply] [d/l] [select] |
|
Mmmhh, you are removing all the spaces from the string, this was not the original question. It was about removing whitespace (which includes \t, \r, \n, \f and your space) from the start and end of a string.
btw, you are using a character class in your regex where it is not necessary, so just write:
s/ +//g; and for this task a regex is overkill, use a transliteration (tr) instead:
tr/ //d;
-- Hofmator
| [reply] [d/l] [select] |
|
| [reply] |