in reply to Re: Removing commas and dollar signs from a variable.
in thread Removing commas and dollar signs from a variable.

Why use all that regex processing power that you don't need? tr is a much better solution in this case.
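For the original question, here is a minimal sketch of the tr/// approach. It deletes every comma and dollar sign from a copy of the input. The sub name `strip_money` is hypothetical and only here for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return a copy of $amount with every ',' and '$' deleted.
# tr/// does no variable interpolation; \$ is simply a literal dollar sign.
sub strip_money {
    my ($amount) = @_;
    (my $clean = $amount) =~ tr/,\$//d;   # /d deletes matched chars that have no replacement
    return $clean;
}

print strip_money('$1,234,567.89'), "\n";   # prints 1234567.89
```

On Perl 5.14 or later, `my $clean = $amount =~ tr/,\$//dr;` does the same thing without the copy-then-modify idiom.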

--
<http://www.dave.org.uk>

"The first rule of Perl club is you do not talk about Perl club."
-- Chip Salzenberg


next question...
by 2501 (Pilgrim) on Dec 22, 2001 at 00:48 UTC
    I don't see the benefits/drawbacks of s/// vs. tr///.
    Is it something to truly consider, or is it just "more correct"?

    Thanks!
      tr/// is faster, but s/// can be used for more complex matches.
      I'll support my answer with a benchmarked example:

      #!/usr/bin/perl -w
      use strict;
      use Benchmark;

      # Generate 1 MB (1048576 bytes) of random data
      my $data;
      for (my $i = 0; $i < 1048576; $i++){
          $data .= chr int rand 256;
      }
      my $copy = $data;
      study $data;
      study $copy;

      # Remove X'es from a fresh copy on every iteration,
      # so each run has the same amount of work to do
      Benchmark::cmpthese(-10, {
          's///'  => sub { (my $dummy = $data) =~ s/X//g; },
          'tr///' => sub { (my $dummy = $copy) =~ tr/X//d; }
      });
      print $copy eq $data ? "OK\n" : "NOT OK\n";

      This script's output:
      Benchmark: running s///, tr///, each for at least 10 CPU seconds...
            s///: 12 wallclock secs (10.80 usr +  0.02 sys = 10.82 CPU) @ 37.89/s (n=410)
           tr///: 15 wallclock secs (10.08 usr +  0.08 sys = 10.16 CPU) @ 42.81/s (n=435)
              Rate  s/// tr///
      s///  37.9/s    --  -11%
      tr/// 42.8/s   13%    --
      OK
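To illustrate "s/// can be used for more complex matches", here is a small sketch. The sample string is made up. tr/// works one character at a time with no context, so it can only delete every comma. s/// with lookarounds can remove just the thousands separators:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = 'Total: $1,234, due Friday';

# tr/// has no notion of context: it deletes every ','.
(my $by_tr = $text) =~ tr/,//d;

# s/// can look around the match: delete a ',' only when it sits between two digits.
(my $by_s = $text) =~ s/(?<=\d),(?=\d)//g;

print "$by_tr\n";   # Total: $1234 due Friday
print "$by_s\n";    # Total: $1234, due Friday
```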


      Update 200112212151: I forgot to copy the string before removing the X'es. After the first iteration, there'd be no X'es left. For your entertainment, I present the previous version of my post:
      Hi,

      I was going to answer that tr/// was faster, and s/// can be used for more complex matches.
      I was going to support my answer with a Benchmarked example, which would show that tr/// was a lot faster.

      BUT my benchmark told me s/// is the winner. On 1 MB of random data, s/X//g is faster than tr/X//d. If anyone can tell me why this is, or what I'm doing wrong, I'd really appreciate that.

      #!/usr/bin/perl -w
      use strict;
      use Benchmark;

      # Generate 1 MB of random data
      my $data;
      for (my $i = 0; $i < 1048576; $i++){
          $data .= chr int rand 256;
      }
      my $copy = $data;

      # Remove X'es
      Benchmark::cmpthese(-10, {
          's///'  => sub { $data =~ s/X//g; },
          'tr///' => sub { $copy =~ tr/X//d; }
      });
      print $copy eq $data ? "OK\n" : "NOT OK\n";
      This script's output:
      Benchmark: running s///, tr///, each for at least 10 CPU seconds...
            s///: 11 wallclock secs (10.52 usr +  0.01 sys = 10.53 CPU) @ 72.08/s (n=759)
           tr///: 10 wallclock secs (10.53 usr +  0.01 sys = 10.54 CPU) @ 61.29/s (n=646)
              Rate tr///  s///
      tr/// 61.3/s    --  -15%
      s///  72.1/s   18%    --
      OK
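The problem described in the update can be shown in a small sketch. The sample string is made up. Both tr///d and s///g modify the variable they are bound to. Once the first iteration has removed the X's, every later iteration scans a string with nothing left to delete. tr/// returns the number of characters it matched, which makes this visible:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $data = 'aXbXcX';

# tr/// returns how many characters it matched; with /d it also deletes them in place.
my $first  = ($data =~ tr/X//d);   # all three X's removed from $data itself
my $second = ($data =~ tr/X//d);   # nothing left to remove on the next pass

# Working on a fresh copy each time gives every pass the same amount of work.
my $orig   = 'aXbXcX';
my @counts = map { (my $copy = $orig) =~ tr/X//d } 1 .. 2;

print "$first $second @counts\n";   # 3 0 3 3
```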

      2;0 juerd@ouranos:~$ perl -e'undef christmas'
      Segmentation fault
      2;139 juerd@ouranos:~$

      Yes: tr/// doesn't use the regex engine, whereas s/// does. That means tr/// should be faster. Let's find out! I'm using three randomly generated strings (256, 1024, and 5120 lowercase letters) and removing every occurrence of the letter 'e'. The result? A winner is tr///!

      use Benchmark;
      $reps=500000;

      $x.=("a".."z")[rand 26] for (1..256);
      Benchmark::cmpthese($reps, {
          'sub256' => '$_=$x;s/e//g;',
          'trn256' => '$_=$x;tr/e//d;',
      });
      print "-"x40,"\n";

      $x="";
      $x.=("a".."z")[rand 26] for (1..1024);
      Benchmark::cmpthese($reps, {
          'sub1024' => '$_=$x;s/e//g;',
          'trn1024' => '$_=$x;tr/e//d;',
      });
      print "-"x40,"\n";

      $x="";
      $x.=("a".."z")[rand 26] for (1..5120);
      Benchmark::cmpthese($reps, {
          'sub5k' => '$_=$x;s/e//g;',
          'trn5k' => '$_=$x;tr/e//d;',
      });
      print "-"x40,"\n";

      Benchmark: timing 500000 iterations of sub256, trn256...
          sub256:  3 wallclock secs ( 3.84 usr +  0.00 sys =  3.84 CPU) @ 130208.33/s (n=500000)
          trn256:  2 wallclock secs ( 1.81 usr +  0.00 sys =  1.81 CPU) @ 276243.09/s (n=500000)
                 Rate sub256 trn256
      sub256 130208/s     --   -53%
      trn256 276243/s   112%     --
      ----------------------------------------
      Benchmark: timing 500000 iterations of sub1024, trn1024...
         sub1024: 15 wallclock secs (14.56 usr +  0.00 sys = 14.56 CPU) @ 34340.66/s (n=500000)
         trn1024:  6 wallclock secs ( 6.81 usr +  0.00 sys =  6.81 CPU) @ 73421.44/s (n=500000)
                 Rate sub1024 trn1024
      sub1024 34341/s      --    -53%
      trn1024 73421/s    114%      --
      ----------------------------------------
      Benchmark: timing 500000 iterations of sub5k, trn5k...
           sub5k: 66 wallclock secs (65.36 usr +  0.00 sys = 65.36 CPU) @ 7649.94/s (n=500000)
           trn5k: 31 wallclock secs (30.64 usr +  0.00 sys = 30.64 CPU) @ 16318.54/s (n=500000)
               Rate sub5k trn5k
      sub5k  7650/s    --  -53%
      trn5k 16319/s  113%    --
      ----------------------------------------
      
      
      tr/// does plain character-by-character transliteration.
      s/// uses the regex engine.
      Plain transliteration is relatively faster and 'cheaper'.
      tr/// is for characters, s/// is for strings. tr/// is generally the better choice where it works, because it's more efficient. s/// can do everything tr/// can, since a pattern can match a string of length 1 (i.e. one character); it's just slower. I guess it's more about taste and speed than anything else.
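The characters-vs-strings distinction above can be seen in a short sketch. The sample string is made up. For one-character work, s/// can reproduce tr/// (here with \U in the replacement). Only s/// can treat "USD" as a string, though. Used without /d and with an empty replacement list, tr/// simply counts characters and leaves the string unchanged:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $s = 'Price: USD 1,000';

# Character-for-character mapping: tr/// translates the whole class in one pass.
(my $upper_tr = $s) =~ tr/a-z/A-Z/;
# The same result with s///, one regex match per lowercase character.
(my $upper_s = $s) =~ s/([a-z])/\U$1/g;

# tr/// only knows characters: tr/USD//d deletes every U, S and D separately.
(my $chars = $s) =~ tr/USD//d;
# s/// matches strings: this removes only the substring "USD ".
(my $word = $s) =~ s/USD //g;

# Empty replacement list, no /d: tr/// just counts the digits; $s is unchanged.
my $digits = ($s =~ tr/0-9//);

print "$upper_tr\n$upper_s\n$chars\n$word\n$digits\n";
```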
Re: Re: Re: Removing commas and dollar signs from a variable.
by Rich36 (Chaplain) on Dec 22, 2001 at 00:35 UTC
    Good point. tr's not something I use much, but probably should. Thanks for the reminder.
    Rich36
    There's more than one way to screw it up...