in reply to Re: Re: Removing commas and dollar signs from a variable.
in thread Removing commas and dollar signs from a variable.

I don't understand the benefits/drawbacks of s/// vs. tr///.
Is it something to truly consider, or is it just "more correct"?

Thanks!

Re: next question...
by Juerd (Abbot) on Dec 22, 2001 at 01:11 UTC
    tr/// is faster, but s/// can be used for more complex matches.
    I'll support my answer with a Benchmarked example:

    #!/usr/bin/perl -w
    use strict;
    use Benchmark;

    # Generate 1 MB (exactly 1048576 bytes) of random data
    my $data;
    for (my $i = 0; $i < 1048576; $i++) {
        $data .= chr int rand 256;
    }
    my $copy = $data;
    study $data;
    study $copy;

    # Remove X'es, working on a fresh copy every iteration
    Benchmark::cmpthese(-10, {
        's///'  => sub { (my $dummy = $data) =~ s/X//g;  },
        'tr///' => sub { (my $dummy = $copy) =~ tr/X//d; },
    });

    # Sanity check: both operators must produce the same result
    (my $by_s  = $data) =~ s/X//g;
    (my $by_tr = $copy) =~ tr/X//d;
    print $by_s eq $by_tr ? "OK\n" : "NOT OK\n";

    This script's output:
    Benchmark: running s///, tr///, each for at least 10 CPU seconds...
          s///: 12 wallclock secs (10.80 usr +  0.02 sys = 10.82 CPU) @ 37.89/s (n=410)
         tr///: 15 wallclock secs (10.08 usr +  0.08 sys = 10.16 CPU) @ 42.81/s (n=435)
              Rate  s/// tr///
    s///  37.9/s    --  -11%
    tr/// 42.8/s   13%    --
    OK
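    For the parent thread's task (stripping commas and dollar signs), a minimal sketch of both operators side by side may help; the sample strings are made up for illustration. For deleting a fixed set of characters the two agree, but only s/// can take context into account, such as removing a comma only when it sits between two digits.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input, in the spirit of the parent thread
my $price = '$1,234,567.89';

# tr///: delete every ',' and '$' character (no regex involved)
(my $by_tr = $price) =~ tr/,\$//d;

# s///: the same deletion written as a regex character class
(my $by_s = $price) =~ s/[,\$]//g;

print "$by_tr\n";    # 1234567.89
print "$by_s\n";     # 1234567.89

# Only s/// can match on context: drop a '$' only when a digit
# follows it, and a comma only when it sits between two digits
my $text = 'Total: $1,234, thanks';
(my $cleaned = $text) =~ s/\$(?=\d)//;
$cleaned =~ s/(?<=\d),(?=\d)//g;
print "$cleaned\n";  # Total: 1234, thanks
```

    Note that the trailing comma after "1234" survives, because no digit follows it; tr/,//d would have removed it too.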


    Update 200112212151: I forgot to copy the string before removing the X'es, so after the first iteration there were no X'es left to remove. For your entertainment, I present the previous version of my post:
    Hi,

    I was going to answer that tr/// was faster, and s/// can be used for more complex matches.
    I was going to support my answer with a Benchmarked example, which would show that tr/// was a lot faster.

    BUT my benchmark told me s/// is the winner. On 1 MB of random data, s/X//g is faster than tr/X//d. If anyone can tell me why this is, or what I'm doing wrong, I'd really appreciate that.

    #!/usr/bin/perl -w
    use strict;
    use Benchmark;

    # Generate 1 MB of random data
    my $data;
    for (my $i = 0; $i < 1048576; $i++) {
        $data .= chr int rand 256;
    }
    my $copy = $data;

    # Remove X'es
    Benchmark::cmpthese(-10, {
        's///'  => sub { $data =~ s/X//g;  },
        'tr///' => sub { $copy =~ tr/X//d; },
    });
    print $copy eq $data ? "OK\n" : "NOT OK\n";
    This script's output:
    Benchmark: running s///, tr///, each for at least 10 CPU seconds...
          s///: 11 wallclock secs (10.52 usr +  0.01 sys = 10.53 CPU) @ 72.08/s (n=759)
         tr///: 10 wallclock secs (10.53 usr +  0.01 sys = 10.54 CPU) @ 61.29/s (n=646)
              Rate tr///  s///
    tr/// 61.3/s    --  -15%
    s///  72.1/s   18%    --
    OK
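    The mistake described in the update can be checked in isolation: s///g returns the number of substitutions it made, so a second pass over the same variable reports none. A minimal sketch, using a tiny made-up string in place of the 1 MB buffer:

```perl
use strict;
use warnings;

my $data = 'aXbXcX';

# s///g returns the number of substitutions made
# (false, i.e. the empty string, when there were none)
my $first  = ($data =~ s/X//g);    # 3
my $second = ($data =~ s/X//g);    # '' : nothing left to remove

# Copying into a fresh variable first, as the updated benchmark
# does, gives every call the same amount of work
my $orig   = 'aXbXcX';
my @counts = map { my $tmp = $orig; $tmp =~ s/X//g } 1 .. 3;   # (3, 3, 3)

print "first=$first second='$second' counts=@counts\n";
```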

    2;0 juerd@ouranos:~$ perl -e'undef christmas'
    Segmentation fault
    2;139 juerd@ouranos:~$

substitution speed vs transliteration speed
by boo_radley (Parson) on Dec 22, 2001 at 01:43 UTC
    Yes, tr/// is not a regex, whereas s/// is. This means that tr/// should be faster. Let's find out! I'm using 3 sets of randomly generated lowercase letters (256, 1024 and 5120 characters long) and removing every occurrence of the letter 'e' from each string. The results? tr/// wins, running a bit more than twice as fast as s/// at every length.

    use Benchmark;

    $reps = 500000;

    $x .= ("a".."z")[rand 26] for (1..256);
    Benchmark::cmpthese($reps, {
        'sub256' => '$_=$x;s/e//g;',
        'trn256' => '$_=$x;tr/e//d;',
    });
    print "-"x40, "\n";

    $x = "";
    $x .= ("a".."z")[rand 26] for (1..1024);
    Benchmark::cmpthese($reps, {
        'sub1024' => '$_=$x;s/e//g;',
        'trn1024' => '$_=$x;tr/e//d;',
    });
    print "-"x40, "\n";

    $x = "";
    $x .= ("a".."z")[rand 26] for (1..5120);
    Benchmark::cmpthese($reps, {
        'sub5k' => '$_=$x;s/e//g;',
        'trn5k' => '$_=$x;tr/e//d;',
    });
    print "-"x40, "\n";
    Benchmark: timing 500000 iterations of sub256, trn256...
        sub256:  3 wallclock secs ( 3.84 usr +  0.00 sys =  3.84 CPU) @ 130208.33/s (n=500000)
        trn256:  2 wallclock secs ( 1.81 usr +  0.00 sys =  1.81 CPU) @ 276243.09/s (n=500000)
               Rate sub256 trn256
    sub256 130208/s     --   -53%
    trn256 276243/s   112%     --
    ----------------------------------------
    Benchmark: timing 500000 iterations of sub1024, trn1024...
       sub1024: 15 wallclock secs (14.56 usr +  0.00 sys = 14.56 CPU) @ 34340.66/s (n=500000)
       trn1024:  6 wallclock secs ( 6.81 usr +  0.00 sys =  6.81 CPU) @ 73421.44/s (n=500000)
               Rate sub1024 trn1024
    sub1024 34341/s      --    -53%
    trn1024 73421/s    114%      --
    ----------------------------------------
    Benchmark: timing 500000 iterations of sub5k, trn5k...
         sub5k: 66 wallclock secs (65.36 usr +  0.00 sys = 65.36 CPU) @ 7649.94/s (n=500000)
         trn5k: 31 wallclock secs (30.64 usr +  0.00 sys = 30.64 CPU) @ 16318.54/s (n=500000)
             Rate sub5k trn5k
    sub5k  7650/s    --  -53%
    trn5k 16319/s  113%    --
    ----------------------------------------
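    The fact that tr/// is not a regex matters for correctness as well as speed: characters that are metacharacters in s/// are taken literally by tr///. A small sketch with a made-up string:

```perl
use strict;
use warnings;

my $s = 'a.b.c';

# In tr///, '.' is just a literal dot character
(my $by_tr = $s) =~ tr/.//d;     # 'abc'

# In s///, '.' is a regex metacharacter matching any character
(my $any = $s) =~ s/.//g;        # '' (everything removed)

# s/// needs the dot escaped to mean a literal dot
(my $by_s = $s) =~ s/\.//g;      # 'abc'

print "'$by_tr' '$any' '$by_s'\n";
```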
    
    
Re: next question...
by mrbbking (Hermit) on Dec 22, 2001 at 01:20 UTC
    tr/// does straight character-for-character substitution (transliteration).
    s/// uses the regex engine.
    Straight substitution is relatively faster and 'cheaper'.
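    To illustrate what "straight substitution" means here: tr/// maps each character of its first list to the character at the same position in its second list, in a single pass. A short sketch (the sample word is arbitrary):

```perl
use strict;
use warnings;

my $word = 'hello world';

# Each character in a-z maps to the one at the same position in A-Z
(my $upper = $word) =~ tr/a-z/A-Z/;                 # 'HELLO WORLD'

# ROT13 is a classic character-for-character mapping
(my $rot13 = $word) =~ tr/a-zA-Z/n-za-mN-ZA-M/;     # 'uryyb jbeyq'

# With an empty replacement list and no /d, tr/// leaves the
# string alone and just returns how many characters matched
my $vowels = ($word =~ tr/aeiou//);                 # 3

print "$upper $rot13 $vowels\n";
```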
Re: next question...
by archen (Pilgrim) on Dec 22, 2001 at 04:51 UTC
    tr/// is for characters, s/// is for strings. tr/// is generally the better choice where it works, because it's more efficient. s/// can do everything tr/// can, since it can work with strings of length 1 (i.e. one character); it's just slightly slower. I guess it's more about taste and speed than anything else.
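    The characters-versus-strings distinction shows up as soon as the search list has more than one character. A sketch with an illustrative label string: s/// removes the string "USD ", while tr/// treats the same letters as a set and deletes each one wherever it appears. The last lines show one way (with a hypothetical lookup hash %map) to emulate a tr/abc/xyz/-style mapping with s///.

```perl
use strict;
use warnings;

my $label = 'USD SUBTOTAL';

# s/// matches the four-character string "USD " once
(my $by_s = $label) =~ s/USD //;       # 'SUBTOTAL'

# tr/// treats 'U', 'S', 'D' and ' ' as single characters
# and deletes every occurrence of each one
(my $by_tr = $label) =~ tr/USD //d;    # 'BTOTAL'

# s/// can still emulate a tr/abc/xyz/ mapping, one character
# at a time, with a character class and a lookup hash
my %map = (a => 'x', b => 'y', c => 'z');
(my $mapped = 'cabbage') =~ s/([abc])/$map{$1}/g;   # 'zxyyxge'

print "$by_s | $by_tr | $mapped\n";
```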