treebeard has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to replace the whitespace in the following string with nulls (ignore the quotes): "BPLIF(many spaces)". Here is the code I am using:

$_ =~ s/^\s+//g;
$y = length $_;
print "$_ $y\n";

I would like "BPLIF" back in $_, but I keep getting "BPLIF(many spaces)". I am using the length and print statements to test the results. From what I have read, "^" means start at the beginning, \s means whitespace, and + means one or more whitespace characters. What am I doing wrong?

Re: Replacing whitespace with null
by mfriedman (Monk) on Oct 11, 2002 at 18:01 UTC
    Your problem is that you are using ^ at the beginning of your pattern. ^ matches the beginning of the string, but your whitespace is not immediately after the beginning; there are letters in the way. Try something like:

    $_ =~ s/\s+$//g;

    That will remove all whitespace at the end of the string.
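    A minimal sketch contrasting the two anchors on a string shaped like the one in the question. The variable names and sample values are illustrative assumptions. s/// returns the number of substitutions it made (an empty string when there are none), which makes it easy to see that the ^-anchored version never matches. The last step uses the common ^\s+|\s+$ idiom to trim both ends while leaving an interior space alone.

```perl
use strict;
use warnings;

# Illustrative value: letters followed by trailing spaces, as in the question.
my $field = "BPLIF     ";

# Anchored at the start: no match, because the string begins with "B".
my $copy = $field;
my $n_start = $copy =~ s/^\s+//;
print "start-anchored: '$copy', substitutions: ", ($n_start || 0), "\n";

# Anchored at the end: removes the trailing run of whitespace.
$copy = $field;
my $n_end = $copy =~ s/\s+$//;
print "end-anchored:   '$copy', substitutions: ", ($n_end || 0), "\n";

# Trim both ends in one pass; the space between "BP" and "LIF" survives.
my $padded = "  BP LIF  ";
(my $trimmed = $padded) =~ s/^\s+|\s+$//g;
print "trimmed both:   '$trimmed'\n";
```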

Re: Replacing whitespace with null
by kelan (Deacon) on Oct 11, 2002 at 23:22 UTC
    From what I have read "^" means start at the beginning,

    You're misinterpreting the meaning of ^, which is what is breaking your regex. From your wording, you think ^ means to start at the beginning of the string and then go through it finding all the whitespace. That is slightly incorrect. The ^ is called an anchor, which in this case is a start-of-string anchor. It anchors your match to the beginning of the string, meaning "for this pattern to match, the string must start with everything that comes next." Your string doesn't start with whitespace; it starts with "BPLIF". In this case what you'd want is simply:

    s/\s+//g;
    This will find stretches of whitespace anywhere in the string. Incidentally, if the string you want to perform substitution or matching on is already in $_, you don't need to use =~ to bind the search expression. You can just do the s/// or m// by itself (as I've done above), and it will default to matching against $_.
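    A short sketch of the $_ default in practice. It assumes a hypothetical array @fields of padded values. foreach aliases $_ to each element in turn, so a bare s/\s+//g with no =~ edits the array elements in place. Note that the unanchored pattern also removes interior whitespace ("AB CD" becomes "ABCD"), unlike the \s+$ version, which only trims the end.

```perl
use strict;
use warnings;

# Hypothetical padded fields; the values are illustrative only.
my @fields = ("BPLIF     ", "AB CD  ", "X\t\n");

# foreach aliases $_ to each element, so the bare s/// below
# (implicitly bound to $_) modifies the array itself.
for (@fields) {
    s/\s+//g;    # unanchored: removes every run of whitespace, including interior ones
}

print join("|", @fields), "\n";
```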

    Also, see admiraln's post about possibly using tr///.

    kelan



Re: Replacing whitespace with null
by admiraln (Acolyte) on Oct 11, 2002 at 18:18 UTC
    I believe that tr/\s//d will also do it and should be more efficient.

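    Update: Except it does not work. tr does not accept \s, because tr/// works on a literal list of characters and does not understand regex character classes such as \s or \d. tr/\n\t //d does work (it deletes newlines, tabs, and spaces; add \r and \f to the list if those can occur).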
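    A minimal sketch of the tr/// approach with the whitespace characters spelled out explicitly. The sample string is an illustrative assumption. With /d and an empty replacement list, every listed character is deleted, and tr returns how many characters it matched. This list covers only the common ASCII whitespace. In a regex, \s also matches other characters (vertical tab in recent perls, and Unicode spaces), and this list does not.

```perl
use strict;
use warnings;

# Illustrative input: an interior space plus tab, space, CR and LF at the end.
my $s = "BP LIF\t \r\n";

# tr/// takes a literal character list, not a regex class:
# space, tab, newline, carriage return, form feed.
# /d deletes listed characters that have no replacement; tr returns the count.
my $deleted = ($s =~ tr/ \t\n\r\f//d);

print "'$s' ($deleted deleted)\n";
```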

      I cobbled this together based on a post on clpm (comp.lang.perl.misc) from several years ago.

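      tr beats out s by roughly 3 to 12 to 1, if I have interpreted this correctly: about 3 to 4 times faster on 10-character strings, about 9 times on 100-character strings, and about 12 times on the 1000- and 10000-character strings. The output below comes from three runs. The first two used 10- and 1000-character strings with 100000 iterations, and only the third matches the code as posted.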

      use strict;
      use warnings;
      use Benchmark;

      # Random strings of 'x' and ' ' (about 60% spaces).
      my $string1 = join '', map rand() < 0.4 ? 'x' : ' ', 1 .. 100;
      my $string2 = join '', map rand() < 0.4 ? 'x' : ' ', 1 .. 10000;

      # tr/ / / counts the spaces without changing the string.
      printf "Case 1 deletes %3d of %4d characters\n", $string1 =~ tr/ / /, length $string1;
      printf "Case 2 deletes %3d of %4d characters\n", $string2 =~ tr/ / /, length $string2;

      # Each sub works on a copy so every iteration starts from the same input.
      timethese( 10000, {
          tr1 => sub { my $copy = $string1; $copy =~ tr/\n\t //d },
          s1  => sub { my $copy = $string1; $copy =~ s/\s//g },
          tr2 => sub { my $copy = $string2; $copy =~ tr/\n\t //d },
          s2  => sub { my $copy = $string2; $copy =~ s/\s//g },
      });

      __END__
      Case 1 deletes 7 of 10 characters
      Case 2 deletes 592 of 1000 characters
      Benchmark: timing 100000 iterations of s1, s2, tr1, tr2...
              s1:  2 wallclock secs ( 1.13 usr +  0.00 sys =  1.13 CPU) @ 88339.22/s (n=100000)
              s2: 54 wallclock secs (52.58 usr +  0.00 sys = 52.58 CPU) @ 1901.68/s (n=100000)
             tr1:  0 wallclock secs ( 0.30 usr +  0.00 sys =  0.30 CPU) @ 332225.91/s (n=100000)
                  (warning: too few iterations for a reliable count)
             tr2:  3 wallclock secs ( 4.34 usr +  0.00 sys =  4.34 CPU) @ 23062.73/s (n=100000)

      Case 1 deletes 5 of 10 characters
      Case 2 deletes 615 of 1000 characters
      Benchmark: timing 100000 iterations of s1, s2, tr1, tr2...
              s1:  1 wallclock secs ( 0.97 usr +  0.00 sys =  0.97 CPU) @ 102880.66/s (n=100000)
              s2: 54 wallclock secs (54.22 usr +  0.00 sys = 54.22 CPU) @ 1844.37/s (n=100000)
             tr1:  1 wallclock secs ( 0.31 usr +  0.00 sys =  0.31 CPU) @ 321543.41/s (n=100000)
                  (warning: too few iterations for a reliable count)
             tr2:  5 wallclock secs ( 4.39 usr +  0.00 sys =  4.39 CPU) @ 22753.13/s (n=100000)

      Case 1 deletes 54 of 100 characters
      Case 2 deletes 5999 of 10000 characters
      Benchmark: timing 10000 iterations of s1, s2, tr1, tr2...
              s1:  0 wallclock secs ( 0.54 usr +  0.00 sys =  0.54 CPU) @ 18484.29/s (n=10000)
              s2: 54 wallclock secs (52.71 usr +  0.01 sys = 52.72 CPU) @ 189.66/s (n=10000)
             tr1:  0 wallclock secs ( 0.06 usr +  0.00 sys =  0.06 CPU) @ 166666.67/s (n=10000)
                  (warning: too few iterations for a reliable count)
             tr2:  4 wallclock secs ( 4.30 usr +  0.00 sys =  4.30 CPU) @ 2327.75/s (n=10000)
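    To check the "to 1" figures, divide each tr rate by the matching s rate, since a higher rate means faster. The sketch below does that division on the iterations-per-second numbers copied from the three runs above. The run labels are illustrative. The short-string tr timings carry Benchmark's "too few iterations" warning, so those ratios are rough.

```perl
use strict;
use warnings;

# Iterations per second copied from the three benchmark runs above.
my @runs = (
    { label => 'run 1, short', s_rate => 88339.22,  tr_rate => 332225.91 },
    { label => 'run 1, long',  s_rate => 1901.68,   tr_rate => 23062.73  },
    { label => 'run 2, short', s_rate => 102880.66, tr_rate => 321543.41 },
    { label => 'run 2, long',  s_rate => 1844.37,   tr_rate => 22753.13  },
    { label => 'run 3, short', s_rate => 18484.29,  tr_rate => 166666.67 },
    { label => 'run 3, long',  s_rate => 189.66,    tr_rate => 2327.75   },
);

my %ratio;
for my $r (@runs) {
    # Speedup of tr over s: how many tr iterations fit in the time of one s iteration.
    $ratio{ $r->{label} } = $r->{tr_rate} / $r->{s_rate};
    printf "%-13s tr/s speedup: %5.2f\n", $r->{label}, $ratio{ $r->{label} };
}
```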