Fastest (Golfish) way to strip whitespace off ends.

EvdB has asked for the wisdom of the Perl Monks concerning the following question:

I needed to strip whitespace off both ends of a string and played around with ways of doing it. Can it be done faster than any of these functions:

# No taint implications.
sub A { s/^\s*//; s/\s*$//; return $_; }

# These untaint the return value.
sub B { m/^\s*(.*\S)\s*$/;  return $1; }
sub C { m/(\S+.*\S*)/;      return $1; }
sub D { m/(\S?.*\S*)/;      return $1; }

I tend to use A because, unlike the others, it does not untaint the string by capturing it into $1.

Benchmarks and the complete code follow:

    Rate    A    B    C    D
A 4709/s   -- -34% -39% -44%
B 7091/s  51%   --  -8% -16%
C 7680/s  63%   8%   --  -9%
D 8449/s  79%  19%  10%   --

Code is:

use strict;
use warnings;

use Benchmark qw(cmpthese);

# No taint implications.
sub A { s/^\s*//; s/\s*$//; return $_; }

# These untaint the return value.
sub B { m/^\s*(.*\S)\s*$/;  return $1; }
sub C { m/(\S+.*\S*)/;      return $1; }
sub D { m/(\S?.*\S*)/;      return $1; }

my @data = (
            'hello', ' hello', ' hello ',
            'hello hello', ' hello hello', 'hello hello ',
            ' hello hello',    '  hello  hello  ',
            'h', ' h', 'h ', ' h ',
            "\n\t hello \n\t"
);

for (@data) {

  my $A = A( $_ );   my $B = B( $_ );
  my $C = C( $_ );   my $D = D( $_ );

  unless ( $A eq $B && $A eq $C && $A eq $D ) {
    warn "Found a disagreement\n";
    warn "\tA is '$A'\n\tB is '$B'\n\tC is '$C'\n\tD is '$D'\n\n";
  } 
  
  if ( $A =~ m/(^\s|\s$)/ ) {
    warn "Regexp not working\n\tA is '$A'\n\n";
  }
}

cmpthese( -3,
          { 
           A => sub { for (@data) { A( $_ ); }},
           B => sub { for (@data) { B( $_ ); }},
           C => sub { for (@data) { C( $_ ); }},
           D => sub { for (@data) { D( $_ ); }},
          });
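One caveat when reading the numbers above: inside `for (@data)`, `$_` is an alias to the array element, so `A` trims `@data` in place the first time it runs, and every later call sees strings that are already trimmed. A minimal sketch of a check that hands each candidate its own copy instead; the helper name `run_on_copy` is hypothetical and not part of the original post:

```perl
use strict;
use warnings;

# The four candidates exactly as posted; each reads $_, and A also modifies it.
sub A { s/^\s*//; s/\s*$//; return $_; }
sub B { m/^\s*(.*\S)\s*$/;  return $1; }
sub C { m/(\S+.*\S*)/;      return $1; }
sub D { m/(\S?.*\S*)/;      return $1; }

# Hypothetical helper: run a candidate against a private copy so that
# the caller's data is never modified between candidates.
sub run_on_copy {
    my ($code, $string) = @_;
    local $_ = $string;    # a copy, not an alias to the caller's variable
    return $code->();
}

my $input = ' hello ';
my %result;
for my $pair ( [ A => \&A ], [ B => \&B ], [ C => \&C ], [ D => \&D ] ) {
    my ($name, $code) = @$pair;
    $result{$name} = run_on_copy($code, $input);
}

printf "%s => '%s'\n", $_, $result{$_} for sort keys %result;
```

On `' hello '` this reports `'hello'` for A and B, `'hello '` for C and `' hello '` for D, because the greedy `.*` in C and D runs to the end of the line and takes the trailing whitespace with it. The in-place edit made by A may therefore be what lets the agreement check pass silently, and the benchmark may be timing mostly pre-trimmed input.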

--tidiness is the memory loss of environmental mnemonics

Re: Fastest (Golfish) way to strip whitespace off ends. by BrowserUk (Patriarch) on Jul 02, 2003 at 16:25 UTC
I realise this has GOLF in the title, and I'm not the squeamish type, but seeing functions written to rely upon being given their argument in $_ makes my skin crawl :)

For one thing, there is no need or point in actually passing the argument to the functions as A( $_ ), as all you are doing is stacking a variable which you then don't use inside the function.

Second, if you're going to write functions that emulate some of the built-ins and use $_ by default, you should at least check to see if you have been passed any arguments and operate on them if you have, and only apply the default if you haven't.

Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller
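A minimal sketch of the second point, assuming a value-returning function is wanted: use the explicit argument when one is passed and fall back to `$_` only otherwise. The name `stripped` is hypothetical and does not appear in the thread.

```perl
use strict;
use warnings;

# Hypothetical name; returns a trimmed copy and never modifies its input.
sub stripped {
    my $string = @_ ? $_[0] : $_;   # explicit argument wins, else fall back to $_
    $string =~ s/^\s+//;            # leading whitespace
    $string =~ s/\s+$//;            # trailing whitespace
    return $string;
}

# Explicit argument.
print "'", stripped("  explicit  "), "'\n";

# Implicit $_, as the built-ins allow.
for ("  implicit  ") {
    print "'", stripped(), "'\n";
}
```

Because the value is copied into a lexical first, this form has no effect on the caller's data and says nothing about taint either way beyond what `s///` itself does.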
Re: Re: Fastest (Golfish) way to strip whitespace off ends. by halley (Prior) on Jul 02, 2003 at 17:02 UTC
How should that be implemented? I think this should work just like all forms of the built-in `chomp`, except that it should remove more than trailing newlines. However, the argumentless form doesn't seem to work (it doesn't propagate the lvalue context on the caller's `$_`) as I would expect. What's the right way of falling back to the caller's lvalue `$_` if there are no arguments?

sub trim {
    @_ = ($_) if not @_;  # doesn't work
    foreach (@_) {
        s/^\s//;
        s/\s$//;
        s/\n\z//s;
    }
    return wantarray ? @_ : $_[0];
}

-- `[ e d @ h a l l e y . c c ]`
Re: Re: Re: Fastest (Golfish) way to strip whitespace off ends. by BrowserUk (Patriarch) on Jul 02, 2003 at 17:21 UTC
Here are a couple of ways, but neither is necessarily the best way.

sub trim {
    @_ = ($_) if not @_;
    foreach my $s (@_) {
        $s =~ s/^\s//;
        $s =~ s/\s$//;
        $s =~ s/\n\z//s;
    }
    return wantarray ? @_ : $_[0];
}

Or

sub trim {
    @_ = ($_) if not @_;
    local $_;
    foreach (@_) {
        s/^\s//;
        s/\s$//;
        s/\n\z//s;
    }
    return wantarray ? @_ : $_[0];
}

I think that the problem with your implementation was that you were stomping on the aliasing of $_ by re-using the global $_ implicitly in your foreach loop. Using either a my'd var or localising $_ after the assignment to @_ but before the foreach loop seems to fix the problem, though I have a gut feeling that there is probably a better way.
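One possible reading of why the fallback is awkward in the `trim` drafts: the assignment `@_ = ($_)` stores a copy of `$_`, so later edits to `@_` return the trimmed value but do not reach the caller's `$_`. A sketch of a chomp-like variant that avoids the copy by branching instead; the name `trim_in_place` is hypothetical, and `\s+` is used here rather than the single `\s` of the drafts:

```perl
use strict;
use warnings;

# Hypothetical chomp-like trim: modifies its arguments in place,
# or the caller's $_ when called without arguments.
sub trim_in_place {
    if (@_) {
        # Elements of @_ are aliases to the caller's variables, and the
        # loop variable aliases those elements, so the edits propagate.
        for my $string (@_) {
            $string =~ s/^\s+//;
            $string =~ s/\s+$//;
        }
    }
    else {
        # No copy is made here, so this edits the global $_ directly.
        s/^\s+//;
        s/\s+$//;
    }
    return;
}

my @words = ('  foo ', "bar\n");
trim_in_place(@words);          # explicit arguments, edited in place
print join('|', @words), "\n";

for (my @lines = (" a ", "b\t")) {
    trim_in_place();            # no arguments: works on $_
    print "'$_'\n";
}
```

Like `chomp`, this form will die with a "Modification of a read-only value" error if it is handed a literal, which seems an acceptable trade for true in-place behaviour.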
Re: Re: Re: Fastest (Golfish) way to strip whitespace off ends. by Enlil (Parson) on Jul 02, 2003 at 22:32 UTC
Halley, I am curious as to what `s/\n\z//s` would actually substitute that the previous two substitutions had not already. AFAICT, the `\n\z` would match a newline just before the end of the string, which `s/\s$//;` should already have taken care of (as \n is matched by \s). I might be missing something here, so please correct me if I am wrong.

Also, I believe that \s+ should be faster (more efficient) in this case than \s.

-enlil
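The difference between `\s` and `\s+` is arguably not only one of speed: without a quantifier each substitution removes a single character. A small sketch comparing the two on a string with several trailing whitespace characters (the variable names are illustrative only):

```perl
use strict;
use warnings;

my $input = "hello \n\n";

# Single-character version, as in the trim() drafts above.
(my $single = $input) =~ s/\s$//;   # removes one whitespace character
$single =~ s/\n\z//s;               # then one newline at the very end

# Quantified version suggested in this reply.
(my $plus = $input) =~ s/\s+$//;    # removes the whole trailing run

printf "single: %d chars left, plus: %d chars left\n",
    length $single, length $plus;
```

Here the unquantified pair leaves `"hello "` (6 characters), so `s/\n\z//s` does do some work after `s/\s$//`, while `s/\s+$//` alone leaves `"hello"` (5 characters) and makes the extra substitution unnecessary.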
Re: Re: Fastest (Golfish) way to strip whitespace off ends. by EvdB (Deacon) on Jul 03, 2003 at 09:07 UTC
You are absolutely right that the functions as presented are lacking in several ways. The useless passing of $_ is also duly noted.

In my defence I would say that this was just an exercise in looking at the relative speed of the methods, so much of the sanity checking was omitted for clarity. The functions are just regexps, no more and no less. That way all of the speed differential is due to the stripping of whitespace and very little else.