in reply to Re: Unsuccessful stat on file test
in thread Unsuccessful stat on file test

You should use s/\s+$//; instead of s/\s*$//;. s/\s*$//; performs a substitution on every string it operates on (the zero-width match at the end of any string always succeeds), while s/\s+$//; only modifies strings that actually have whitespace at the end.
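
To illustrate (a minimal sketch of my own, not from the thread), the two forms differ in whether the substitution reports a match at all:

    use strict;
    use warnings;

    # s/\s*$// "succeeds" on every string (the empty match at end-of-string
    # always matches), while s/\s+$// succeeds only when there is real
    # trailing whitespace to remove.
    for my $str ( "no trailing space", "trailing space   ", "" ) {
        my ( $star, $plus ) = ( $str, $str );
        my $star_matched = ( $star =~ s/\s*$// ) ? 1 : 0;   # always 1
        my $plus_matched = ( $plus =~ s/\s+$// ) ? 1 : 0;   # 1 only when whitespace was stripped
        printf "star=%d plus=%d for '%s'\n", $star_matched, $plus_matched, $str;
    }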

Re^3: Unsuccessful stat on file test
by Tanktalus (Canon) on Mar 02, 2008 at 21:16 UTC

    Except that, as was the original problem, every string does have whitespace at the end: the EOL character.
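
    A quick sketch of that point (my own example, with literal strings standing in for lines read from the OP's file):

        use strict;
        use warnings;

        # Each element stands in for a line read from a filehandle, "\n" included,
        # so the \s+ form finds something to strip on every one of them.
        my @lines = ( "first line\n", "second line\n" );
        for my $line (@lines) {
            my $matched = ( $line =~ s/\s+$// ) ? "matched" : "no match";
            print "$matched: '$line'\n";
        }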

    Further, even if some strings didn't have the EOL whitespace, as in the general case of using s/\s*$// to strip whitespace from the end of a string, I'm not sure this is anything more than premature optimisation. For all I know, perl could already optimise it away, or it may be of no consequence even if it were still "performed". Do you have any evidence that your suggestion improves the OP's code?

    Now, if you were to come in and say to use + instead of * because it more literally describes what the OP is doing, I'd be in much more (but not complete) agreement. That is, the OP wants to replace all whitespace leading up to the end of the line, qr/\s+$/, with nothing, whereas the code that is currently there could be read as saying that zero-length matches should be replaced, too, which seems silly. It is apparent to anyone with a reasonable amount of regexp-fu that we don't really care about deleting zero-length matches, so it just seems silly to try replacing them. (Conversely, since it's a "don't care" operation, fixing the code now that it's written and actually works doesn't need doing, either.)

    I can even think of a pathological case where \s* is better than \s+ - and it's not optimisation (well, not of CPU cycles anyway).

    my @filters = ( \&eliminate_ws, \&do_something, \&do_something_else, \&etc );
    # ...
    while (<$myfile>) {
        FILTERS: for my $filter (@filters) {
            # keep filtering until a filter says to stop.
            last FILTERS unless $filter->();
        }
    }

    sub eliminate_ws { s/\s*$// }   # always returns true.
    # as opposed to:
    # sub eliminate_ws { s/\s+$//; 1 }   # always returns true, even if s "failed"
    It saves the developer a few keystrokes... but, like I said, it's pathological. ;-)
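
    To see that difference in action, here's a stripped-down sketch (mine, with a dummy second filter) of what happens when eliminate_ws returns s///'s own result instead of a constant true:

        use strict;
        use warnings;

        my @filters = (
            sub { s/\s+$// },                                # false when nothing was stripped
            sub { print "next filter reached: '$_'\n"; 1 },
        );

        for my $line ( "has trailing space   \n", "already clean" ) {
            local $_ = $line;
            for my $filter (@filters) {
                last unless $filter->();   # "already clean" never reaches the second filter
            }
        }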

    Update: given the test below, I'm even more convinced this is premature optimisation. You're doing 1000 substitutions per sub call, so asking perl what the per-string saving is comes out as (approximately):

    $ perl -e 'printf "%e\n",((1/334000)-(1/1211000))'
    2.168248e-06
    so we're saving approximately 2 microseconds per string by using + instead of *. Seriously, that's not significant unless you really are running 1000's (or 100's of 1000's) of times. That's pretty much the definition of premature optimisation.

      Using a simple Benchmark example:

      $ perl -le'
          use Benchmark q/cmpthese/;
          my @data = map {
              join( "", map { ( "a" .. "z" )[ rand 26 ] } 1 .. 10 + rand( 10 ) ) . "\n"
          } 1 .. 1_000;
          cmpthese -10, {
              plus => sub { my @temp = @data; s/\s+$// for @temp; return @temp; },
              star => sub { my @temp = @data; s/\s*$// for @temp; return @temp; }
          }
      '
            Rate star  plus
      star 336/s   --  -66%
      plus 992/s 196%    --

      $ perl -le'
          use Benchmark q/cmpthese/;
          my @data = map {
              join( "", map { ( "a" .. "z" )[ rand 26 ] } 1 .. 10 + rand( 10 ) ) . ""
          } 1 .. 1_000;
          cmpthese -10, {
              plus => sub { my @temp = @data; s/\s+$// for @temp; return @temp; },
              star => sub { my @temp = @data; s/\s*$// for @temp; return @temp; }
          }
      '
             Rate star  plus
      star  334/s   --  -72%
      plus 1211/s 263%    --

      It appears that \s+ is always faster than \s*. YMMV.