So today, I came across a scary fact. I was telling someone to use two regexes for removing leading/trailing whitespace, because one regex is a slow evil thing. Here were the attempts made:

    s/^\s+(.*?)\s+$/$1/;   # fails sometimes
    s/^\s*(.*?)\s*$/$1/;   # succeeds, but slow
    s/^\s*(.*\S)\s*$/$1/;  # fails sometimes
    s/^\s*//, s/\s*$//;    # succeeds, but WHY use * ?
    s/^\s+//, s/\s+$//;    # succeeds, but is it good?

That last example, I said, was great, because (I thought) Perl optimizes things like /\s+$/. I was (shockingly) wrong. It turns out that /\s$/ is optimized, but not the + version.
Eww...
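To make the "fails sometimes" comments concrete, here is a rough sketch that runs each of the five attempts over a few sample strings. The sample inputs and the helper name failed_inputs are my own choices for illustration, not part of the original discussion.

```perl
use strict;
use warnings;

# Sample inputs (assumed for illustration) and the result each should give.
my %expected = (
    '  both  '   => 'both',
    '  leading'  => 'leading',
    'trailing  ' => 'trailing',
    '   '        => '',
);

# The five attempts, each wrapped so it works on a copy of its argument.
my @attempts = (
    [ 's/^\s+(.*?)\s+$/$1/'  => sub { my $s = shift; $s =~ s/^\s+(.*?)\s+$/$1/;  $s } ],
    [ 's/^\s*(.*?)\s*$/$1/'  => sub { my $s = shift; $s =~ s/^\s*(.*?)\s*$/$1/;  $s } ],
    [ 's/^\s*(.*\S)\s*$/$1/' => sub { my $s = shift; $s =~ s/^\s*(.*\S)\s*$/$1/; $s } ],
    [ 's/^\s*//, s/\s*$//'   => sub { my $s = shift; $s =~ s/^\s*//; $s =~ s/\s*$//; $s } ],
    [ 's/^\s+//, s/\s+$//'   => sub { my $s = shift; $s =~ s/^\s+//; $s =~ s/\s+$//; $s } ],
);

# Return the sample inputs (sorted) that a given attempt gets wrong.
sub failed_inputs {
    my ($trim) = @_;
    return grep { $trim->($_) ne $expected{$_} } sort keys %expected;
}

for my $attempt (@attempts) {
    my ($label, $trim) = @$attempt;
    my @bad = failed_inputs($trim);
    printf "%-22s %s\n", $label,
        @bad ? 'fails on: ' . join(', ', map { "'$_'" } @bad) : 'ok';
}
```

On these inputs the first attempt leaves a string untouched unless it has whitespace on both ends, and the third leaves an all-whitespace string untouched because .*\S needs at least one non-space character.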
For interested parties, this is how Perl executes s/\s+$// on a string like "a b  c   d    ".

    $_ = "a b  c   d    ";  # 1, 2, 3, 4 spaces
    s/\s+$//;

    =pod

    A = \s+
    B = $
    X = fail

    "a b  c   d    "
      AX
        AAX
           AAAX
               AAAA

See? It tests at every run of whitespace. How ugly. Simple regexes like that should be optimized.
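The diagram can be reproduced with a small model. This is only a sketch of the simplified picture drawn above (one attempt per whitespace run), not a dump of what the regex engine really does internally; the names whitespace_runs and draw_trace are made up for the example.

```perl
use strict;
use warnings;

# Find every run of whitespace; returns a list of [offset, length] pairs.
sub whitespace_runs {
    my ($str) = @_;
    my @runs;
    while ($str =~ /(\s+)/g) {
        push @runs, [ $-[1], length($1) ];
    }
    return @runs;
}

# Draw one line per run: an 'A' for each character \s+ consumes, then an
# 'X' where $ fails. Only a run that reaches the end of the string gets no 'X'.
sub draw_trace {
    my ($str) = @_;
    my @lines = (qq{"$str"});
    for my $run (whitespace_runs($str)) {
        my ($offset, $len) = @$run;
        my $fails = ($offset + $len) < length($str);
        # The +1 accounts for the opening quote printed on the first line.
        push @lines, ' ' x ($offset + 1) . 'A' x $len . ($fails ? 'X' : '');
    }
    return @lines;
}

print "$_\n" for draw_trace("a b  c   d    ");
```

In this model the cost grows with the total amount of embedded whitespace, since every run is consumed in full before the $ test rejects it. For a real trace, running the substitution under use re 'debug' prints the engine's own account to STDERR.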
So I tried the sexeger approach:

    ($_ = reverse) =~ s/^\s+//;
    $_ = reverse;

That ran faster. Then I thought about the optimization of /\s$/, and tried:

    1 while s/\s$//;

That was reasonably fast too, and much faster than the s/\s+$// approach, at least.
Note that these findings matter insofar as the string contains embedded whitespace: that's the root of the problem for the s/\s+$// approach.
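For anyone who wants to check this on their own perl, here is a minimal benchmark sketch using the core Benchmark module. The test string and the sub names are assumptions of mine, and the relative timings will depend on the Perl version, since the optimizer may treat these patterns differently from one release to the next.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test string: lots of embedded whitespace, a little trailing.
my $text = join('', map { 'word' . ' ' x 5 } 1 .. 200) . 'end   ';

# Plain s/\s+$//.
sub trim_plus {
    my $s = shift;
    $s =~ s/\s+$//;
    return $s;
}

# Sexeger: reverse the string, strip leading whitespace, reverse it back.
sub trim_sexeger {
    my $s = scalar reverse $_[0];
    $s =~ s/^\s+//;
    return scalar reverse $s;
}

# Strip one trailing whitespace character at a time.
sub trim_loop {
    my $s = shift;
    1 while $s =~ s/\s$//;
    return $s;
}

# Run each candidate for about one CPU second and print a comparison table.
cmpthese(-1, {
    plus    => sub { trim_plus($text) },
    sexeger => sub { trim_sexeger($text) },
    loop    => sub { trim_loop($text) },
});
```

Changing the ' ' x 5 padding or the number of words is a quick way to see how much the embedded whitespace matters to each approach.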
_____________________________________________________
Jeff japhy Pinyan:
Perl,
regex,
and perl
hacker.
s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;
In reply to japhy blabs about regexes (again) by japhy