in reply to Re: Replace consecutive whitespaces with single whitespace
in thread Replace consecutive whitespaces with single whitespace

To avoid Perl doing work when it doesn't have to, I prefer to write that as:
$str =~ s/\s{2,}/ /g;
Note also that \s matches more than the space character the OP mentioned. In particular, s/\s+/ /g will replace a run of one or more newlines with a single space, and s/\s{2,}/ /g will do the same for a run of two or more.
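
To make the difference concrete, here is a minimal sketch comparing the two patterns with one that matches only literal spaces. The sample string and the variable names are made up for illustration and do not come from the OP's data.

```perl
use strict;
use warnings;

# Hypothetical sample: a double space, a single newline, and a double newline.
my $text = "a  b\nc\n\nd";

(my $plus  = $text) =~ s/\s+/ /g;     # every whitespace run, including a lone newline
(my $two   = $text) =~ s/\s{2,}/ /g;  # only runs of two or more whitespace characters
(my $space = $text) =~ s/ {2,}/ /g;   # only runs of two or more literal spaces

# $plus  is "a b c d"
# $two   is "a b\nc d"
# $space is "a b\nc\n\nd"
```

If only literal spaces should be collapsed and line structure must survive, the third form is the one that leaves every newline alone.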

Re^3: Replace consecutive whitespaces with single whitespace
by tilly (Archbishop) on Mar 26, 2009 at 21:38 UTC
    As for the newline issue, I thought of that, and then I thought that the OP quite possibly had data with tabs in it rather than multiple spaces. In that case you would want to match more than just spaces. You would also want to replace a single tab with a single space.
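
    A small sketch of the tab case, assuming a made-up tab-separated record (the data and variable names are hypothetical):

    ```perl
    use strict;
    use warnings;

    # Hypothetical record: one single tab, then a run of two tabs.
    my $record = "id\tname\t\tcity";

    (my $tab_two  = $record) =~ s/\s{2,}/ /g;  # leaves the single tab in place
    (my $tab_plus = $record) =~ s/\s+/ /g;     # turns every tab run into one space

    # $tab_two  is "id\tname city"
    # $tab_plus is "id name city"
    ```

    With \s{2,} the single tab survives untouched, which is why \s+ is the safer choice when the input may contain tabs.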

    Besides which, the extra work is negligible - if you care about an efficiency gain that small, then Perl is probably the wrong language to use in the first place.

      Besides which, the extra work is negligible
      Maybe, maybe not. Whether it's negligible depends on the data at hand. If you have a long string with many single spaces, it does matter: s/\s{2,}/ /g then has nothing to replace, so you avoid copying the string.
      use Benchmark 'cmpthese';

      our $str = "x " x 1000;
      cmpthese -1, {
          single   => sub { local $_ = $::str; s/\s+/ /g },
          multiple => sub { local $_ = $::str; s/\s{2,}/ /g },
      };
      __END__
                 Rate   single multiple
      single   1969/s       --     -40%
      multiple 3287/s      67%       --
      Copying the string on each single space is an inefficient algorithm.
        You do not copy the string on each single space. If you did then the performance of the substitution would scale like O(n*n), which it does not.
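
        One way to check the scaling claim yourself is to time the substitution at several string lengths. This is only a rough sketch: the helper name seconds_per_subst, the lengths, and the repetition count are all made up for illustration, and the absolute numbers will vary by machine.

        ```perl
        use strict;
        use warnings;
        use Time::HiRes qw(time);

        # Hypothetical helper: average seconds per s/\s+/ /g on "x " repeated $n times.
        sub seconds_per_subst {
            my ($n, $reps) = @_;
            my $str   = "x " x $n;
            my $start = time();
            for (1 .. $reps) {
                (my $copy = $str) =~ s/\s+/ /g;
            }
            return (time() - $start) / $reps;
        }

        # If the cost were O(n*n), each tenfold increase in length would cost
        # roughly a hundredfold more time; linear behaviour costs roughly tenfold.
        for my $n (1_000, 10_000, 100_000) {
            printf "%7d pairs: %.6f s\n", $n, seconds_per_subst($n, 20);
        }
        ```

        Compare the ratio between successive lines of output rather than the raw timings.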

        As your benchmark shows, the overhead for the operation is less than a factor of 2. This is extremely unlikely to be a make-or-break performance issue. There are plenty of other operations you are likely to do with the string that have an even larger overhead (for example, reading it in or printing it).

        Unless performance has proven to be an issue and this is where performance was lost, I remain unconcerned.