http://qs1969.pair.com?node_id=522531

jesuashok has asked for the wisdom of the Perl Monks concerning the following question:

Hi Gr8 People

To trim spaces, we have been using the following regular expressions:

$value =~ s/^\s+//g; $value =~ /\s+$//g;
Is there any other built in function available to do the stuff. This would be useful for people who want to use Perl's functions effectively.

"Keep pouring your ideas"

Replies are listed 'Best First'.
Re: what is the function should I need to use for trim white spaces ?
by davido (Cardinal) on Jan 11, 2006 at 19:17 UTC

    There's not a built-in. The regexp approach is fine. If you want it to be a sub, just put it in a sub definition:

    sub trim_white {
        my $string = shift;
        $string =~ s/^\s+//g;
        $string =~ s/\s+$//g;
        return $string;
    }

    # Usage example:
    my $trimmed = trim_white( $untrimmed );

    Update: For completeness's sake, it might be worth noting that on CPAN you can find String::Util, which includes a function called trim() that does exactly what you're talking about. ...it's been done. ;)


    Dave

Re: what is the function should I need to use for trim white spaces ?
by ikegami (Patriarch) on Jan 11, 2006 at 19:22 UTC

    Your code is missing an s, so it won't do what you want.

    That aside, you could use the following rather obfuscated code:

    s/^\s+//g, s/\s+$//g for $value;
    But it's probably better just to make a function:
    sub trim {
        local $_ = @_ ? $_[0] : $_;
        s/^\s+//g;
        s/\s+$//g;
        $_
    }

    $trimmed = trim($untrimmed);
    $trimmed = trim;                       # Trims $_ by default.
    @trimmed = map trim, @untrimmed;       # Trim a whole list. (readable)
    push(@trimmed, trim) for @untrimmed;   # Trim a whole list. (efficient)
Re: what is the function should I need to use for trim white spaces ?
by jZed (Prior) on Jan 11, 2006 at 19:17 UTC
    > Hi Gr8 People

    Please don't use those kinds of abbreviations. There are people speaking many languages who come here and there's no point in making things harder for them to understand.

    > Is there any other built in function available to do the stuff.

    No, the way you showed is the best way (except you don't need the "g" since you already get all the spaces with "+").

      I thought about this too. You're correct that the /g modifier is kind of pointless by itself, but combined with the /m modifier, you get improved functionality (for some definitions of 'improved'):

      use strict;
      use warnings;

      my $string = " now is the \ntime for all \n good men to come \n to the aid\n";
      $string =~ s/^\s+//mg;
      $string =~ s/\s+$//mg;
      print $string, "\n";

      Dave
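To make the point about /g and /m concrete, here is a minimal sketch. The sample string and variable names are illustrative and not from the thread. It compares the anchored substitutions with and without /g, then with /mg.

```perl
use strict;
use warnings;

my $text = "  a  \n  b  \n";

# Without /m, ^ and $ anchor only to the ends of the string,
# so /g finds nothing extra to replace.
(my $plain = $text) =~ s/^\s+//;
$plain =~ s/\s+$//;

(my $plain_g = $text) =~ s/^\s+//g;
$plain_g =~ s/\s+$//g;

# With /m, ^ and $ also match at line boundaries, and /g repeats per line.
(my $per_line = $text) =~ s/^\s+//mg;
$per_line =~ s/\s+$//mg;

print $plain eq $plain_g ? "same\n" : "different\n";
print "[$per_line]\n";
```

One thing to keep in mind: \s also matches newlines, so the /mg form removes the final newline (and would swallow blank lines) along with the spaces.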

Re: what is the function should I need to use for trim white spaces ?
by japhy (Canon) on Jan 11, 2006 at 19:38 UTC
Re: what is the function should I need to use for trim white spaces ?
by Roy Johnson (Monsignor) on Jan 11, 2006 at 19:38 UTC
Re: what is the function should I need to use for trim white spaces ?
by explorer (Chaplain) on Jan 11, 2006 at 19:32 UTC
    Yes, you can use functions:
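    $value = " kkkk.123 ";

    # Strip leading spaces one character at a time.
    for ( $i = 0; $i < length($value); $i++ ) {
        if ( substr($value, $i, 1) eq " " ) {
            substr($value, $i, 1) = "";
            redo;
        }
        else {
            last;
        }
    }

    # Strip trailing spaces, stopping at the first non-space character
    # so that spaces inside the string are left alone.
    for ( $i = length($value) - 1; $i > 0; $i-- ) {
        if ( substr($value, $i, 1) eq " " ) {
            $value = substr($value, 0, $i);
        }
        else {
            last;
        }
    }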
    but you would rather use the s/// operator, right? :-)
Re: what is the function should I need to use for trim white spaces ?
by GrandFather (Saint) on Jan 11, 2006 at 19:25 UTC

    I tend to do it:

    use strict;
    use warnings;

    my $before = " some text ";
    ($_ = $before) =~ s/^\s+|\s+$//g;
    print ">$before<\n";
    print "|$_|\n";

    Prints:

    > some text <
    |some text|

    I guess this is sufficiently lightweight that a function was not thought to be required. It does seem a slight omission though.


    DWIM is Perl's answer to Gödel

      Assigning to $_ without localizing it is dangerous.
      (local $_ = $before) =~ s/^\s+|\s+$//g;
      or
      (my $after = $before) =~ s/^\s+|\s+$//g;
      should be used in almost all circumstances.
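A minimal sketch of why the unlocalized assignment can bite, assuming two hypothetical helpers (careless_trim and careful_trim are names made up for this illustration). Inside a for loop, $_ is an alias for the current element, so writing to the global $_ in a helper overwrites the caller's data.

```perl
use strict;
use warnings;

# Hypothetical helper that trims via the global $_ without localizing it.
sub careless_trim {
    my $before = shift;
    ($_ = $before) =~ s/^\s+|\s+$//g;
    return $_;
}

# Same helper using a lexical copy instead.
sub careful_trim {
    my $before = shift;
    (my $after = $before) =~ s/^\s+|\s+$//g;
    return $after;
}

my @names = ('alice', 'bob');
for (@names) {               # $_ is aliased to each element in turn
    careless_trim('  x  ');  # overwrites $_, and therefore the element
}

my @safe = ('alice', 'bob');
for (@safe) {
    careful_trim('  x  ');   # touches only its own lexical
}

print "careless: @names\n";
print "careful:  @safe\n";
```

After the first loop every element of @names has been replaced by the trimmed argument, while @safe is unchanged.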

      I think the alternation in the s/^\s+|\s+$// version causes significant time costs in large applications. I often work with tab-delimited files containing hundreds of thousands of lines. If I'm tab-splitting these lines and then trimming each one, I'm going to pick the double-regex approach each time. I wrote some quick code that benchmarked the double-regex vs single-regex approach against four strings.
      use Benchmark;

      my @words = ('trim_unneeded', ' front trim only', 'rear trim only ', ' both side trim ');

      for my $word (@words) {
          print "Benchmarking $word...\n\n";
          timethese(1_000_000, {
              double => sub { $word =~ s/^\s+//; $word =~ s/\s+$//; },
              single => sub { $word =~ s/^\s+|\s+$//; },
          });
      }
      The code was run on a Celeron D 2.8 GHz machine running XP with the following results:

      'trim_unneeded'
      Single Regex: 0.45 seconds
      Double Regex: 2.27 seconds

      ' front trim only'
      Single Regex: 0.67 seconds
      Double Regex: 2.66 seconds

      'rear trim only '
      Single Regex: 0.67 seconds
      Double Regex: 2.45 seconds

      ' both side trim '
      Single Regex: 0.66 seconds
      Double Regex: 2.44 seconds

      That's after only 1,000,000 trims. In an 800,000-line file with 50 columns per line, we're talking about 40,000,000 trims. Assuming linear scaling, that means I give up about a minute of processing time per file per run. That's far more than the time it would have taken me to type two regexes. Admittedly, it's a small optimization, and only valid for those who process files on the scale that I do, but for most people who end up typing the 'trim regex' often enough to complain about it on PerlMonks, it probably applies.
        "I think the alternation in the s/^\s+|\s+$// version causes significant time costs in large applications."

        Wait, isn't the single regex faster in your benchmark?

        "Single Regex: 0.45 seconds
        Double Regex: 2.27 seconds"

        Ordinary morality is for ordinary people. -- Aleister Crowley
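Two details of the benchmark above may skew its numbers: both subs modify $word in place, so after the first iteration there is nothing left to trim, and the single-regex variant has no /g, so it strips at most one end. A minimal sketch of a fairer comparison follows, assuming the core Benchmark module's cmpthese; the helper names trim_double and trim_single are hypothetical. Each call works on a fresh copy of the input.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical helpers; each trims a copy, so every call does real work.
sub trim_double {
    my $s = shift;
    $s =~ s/^\s+//;
    $s =~ s/\s+$//;
    return $s;
}

sub trim_single {
    my $s = shift;
    $s =~ s/^\s+|\s+$//g;   # /g so that both ends are handled
    return $s;
}

my @words = ('trim_unneeded', ' front trim only', 'rear trim only ', ' both side trim ');

# Run each variant for about one CPU second and print a comparison table.
cmpthese(-1, {
    double => sub { trim_double($_) for @words },
    single => sub { trim_single($_) for @words },
});
```

The timings will vary with the machine and the perl version, so no figures are quoted here.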
      ($_ = $before) =~ s/^\s+|\s+$//g;

      Just as a minor nitpick I'd localise $_, but then I wouldn't bind with =~, I'd just

      local $_ = $before; s/^\s+|\s+$//g;

      or else

      (my $after = $before) =~ s/^\s+|\s+$//g;
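For readers on a newer perl than the one this thread was written against: assuming perl 5.14 or later, the /r modifier makes s/// return the modified copy and leave the original alone, which gives a third way to write the copy-and-trim idiom. A minimal sketch (the variable names are illustrative):

```perl
use strict;
use warnings;
use 5.014;   # the /r modifier needs perl 5.14 or newer

my $before = "  some text  ";

# Copy first, then modify the copy.
(my $after = $before) =~ s/^\s+|\s+$//g;

# With /r, s/// returns the modified copy instead of changing $before.
my $after_r = $before =~ s/^\s+|\s+$//gr;

print "|$after|\n";
print "|$after_r|\n";
print ">$before<\n";
```

Neither form touches $_ or $before, so there is nothing to localize.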