Re: Re: Removing commas and dollar signs from a variable.

next question... by 2501 (Pilgrim) on Dec 22, 2001 at 00:48 UTC
I am missing the benefits/drawbacks of s/// vs. tr///. Is it something to truly consider, or is it just "more correct"? Thanks!
Re: next question... by Juerd (Abbot) on Dec 22, 2001 at 01:11 UTC
tr/// is faster, but s/// can be used for more complex matches. I'll support my answer with a Benchmarked example:

```perl
#!/usr/bin/perl -w
use strict;
use Benchmark;

# Generate 1 MB of random data
my $data;
for (my $i = 0; $i <= 1048576; $i++){
    $data .= chr int rand 256;
}
my $copy = $data;

study $data;
study $copy;

# Remove X'es
Benchmark::cmpthese(-10, {
    's///'  => sub { (my $dummy = $data) =~ s/X//g; },
    'tr///' => sub { (my $dummy = $copy) =~ tr/X//d; }
});

print $copy eq $data ? "OK\n" : "NOT OK\n";
```

This script's output:

```
Benchmark: running s///, tr///, each for at least 10 CPU seconds...
      s///: 12 wallclock secs (10.80 usr +  0.02 sys = 10.82 CPU) @ 37.89/s (n=410)
     tr///: 15 wallclock secs (10.08 usr +  0.08 sys = 10.16 CPU) @ 42.81/s (n=435)
        Rate  s/// tr///
s///  37.9/s    --  -11%
tr/// 42.8/s   13%    --
OK
```

Update 200112212151: I forgot to copy the string before removing the X'es. After the first iteration, there'd be no X'es left. For your entertainment, I present the previous version of my post:

Hi, I was going to answer that tr/// was faster, and s/// can be used for more complex matches. I was going to support my answer with a Benchmarked example, which would show that tr/// was a lot faster. BUT my benchmark told me s/// is the winner. On 1 MB of random data, s/X//g is faster than tr/X//d. If anyone can tell me why this is, or what I'm doing wrong, I'd really appreciate that.

```perl
#!/usr/bin/perl -w
use strict;
use Benchmark;

# Generate 1 MB of random data
my $data;
for (my $i = 0; $i < 1048576; $i++){
    $data .= chr int rand 256;
}
my $copy = $data;

# Remove X'es
Benchmark::cmpthese(-10, {
    's///'  => sub { $data =~ s/X//g; },
    'tr///' => sub { $copy =~ tr/X//d; }
});

print $copy eq $data ? "OK\n" : "NOT OK\n";
```

This script's output:

```
Benchmark: running s///, tr///, each for at least 10 CPU seconds...
      s///: 11 wallclock secs (10.52 usr +  0.01 sys = 10.53 CPU) @ 72.08/s (n=759)
     tr///: 10 wallclock secs (10.53 usr +  0.01 sys = 10.54 CPU) @ 61.29/s (n=646)
        Rate tr///  s///
tr/// 61.3/s    --  -15%
s///  72.1/s   18%    --
OK
```

```
2;0 juerd@ouranos:~$ perl -e'undef christmas'
Segmentation fault
2;139 juerd@ouranos:~$
```
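To make the point of the update concrete, here is a minimal sketch of why the first benchmark was misleading. The two sub names are hypothetical and not from the post; the sample string is made up. Deleting in place leaves nothing for later passes to do, while copying first gives every pass the same work.

```perl
use strict;
use warnings;

# Hypothetical helper: delete X'es in place, twice, and report how many
# characters each pass removed (tr///d returns that count).
sub removed_in_place {
    my ($s) = @_;
    my $first  = ($s =~ tr/X//d);
    my $second = ($s =~ tr/X//d);   # nothing left to delete
    return ($first, $second);
}

# Hypothetical helper: work on a fresh copy each pass, as the updated
# benchmark does with (my $dummy = $data).
sub removed_with_copy {
    my ($s) = @_;
    my @removed;
    for (1 .. 2) {
        (my $scratch = $s) =~ tr/X//d;
        push @removed, length($s) - length($scratch);
    }
    return @removed;
}

print join(' ', removed_in_place('aXbXcXd')), "\n";
print join(' ', removed_with_copy('aXbXcXd')), "\n";
```

With the in-place version, every iteration after the first only scans the string and deletes nothing, which is presumably why the timings in the previous version did not compare the actual removal work.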
substitution speed vs transliteration speed by boo_radley (Parson) on Dec 22, 2001 at 01:43 UTC
Yes, tr/// is not a regex, whereas s/// is. This means that tr/// should move faster. Let's find out! I'm using 3 sets of randomly generated data, and removing any occurrence of the letter 'e' in the string. The results? A winner is tr///!

```perl
use Benchmark;
$reps=500000;

$x.=("a".."z")[rand 26] for (1..256);
Benchmark::cmpthese($reps, {
    'sub256' => '$_=$x;s/e//g;',
    'trn256' => '$_=$x;tr/e//d;',
});
print "-"x40,"\n";

$x="";
$x.=("a".."z")[rand 26] for (1..1024);
Benchmark::cmpthese($reps, {
    'sub1024' => '$_=$x;s/e//g;',
    'trn1024' => '$_=$x;tr/e//d;',
});
print "-"x40,"\n";

$x="";
$x.=("a".."z")[rand 26] for (1..5120);
Benchmark::cmpthese($reps, {
    'sub5k' => '$_=$x;s/e//g;',
    'trn5k' => '$_=$x;tr/e//d;',
});
print "-"x40,"\n";
```

```
Benchmark: timing 500000 iterations of sub256, trn256...
    sub256:  3 wallclock secs ( 3.84 usr +  0.00 sys =  3.84 CPU) @ 130208.33/s (n=500000)
    trn256:  2 wallclock secs ( 1.81 usr +  0.00 sys =  1.81 CPU) @ 276243.09/s (n=500000)
           Rate sub256 trn256
sub256 130208/s     --   -53%
trn256 276243/s   112%     --
----------------------------------------
Benchmark: timing 500000 iterations of sub1024, trn1024...
   sub1024: 15 wallclock secs (14.56 usr +  0.00 sys = 14.56 CPU) @ 34340.66/s (n=500000)
   trn1024:  6 wallclock secs ( 6.81 usr +  0.00 sys =  6.81 CPU) @ 73421.44/s (n=500000)
           Rate sub1024 trn1024
sub1024 34341/s      --    -53%
trn1024 73421/s    114%      --
----------------------------------------
Benchmark: timing 500000 iterations of sub5k, trn5k...
     sub5k: 66 wallclock secs (65.36 usr +  0.00 sys = 65.36 CPU) @ 7649.94/s (n=500000)
     trn5k: 31 wallclock secs (30.64 usr +  0.00 sys = 30.64 CPU) @ 16318.54/s (n=500000)
         Rate sub5k trn5k
sub5k  7650/s    --  -53%
trn5k 16319/s  113%    --
----------------------------------------
```
Re: next question... by mrbbking (Hermit) on Dec 22, 2001 at 01:20 UTC
`tr///` does straight character-for-character substitution (transliteration). `s///` uses the regex engine. Straight substitution is relatively faster and 'cheaper'.
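Applied to the thread's original problem (stripping commas and dollar signs), the two approaches might look like the sketch below. The sub names and the sample price are made up for illustration; they are not from any post here.

```perl
use strict;
use warnings;

# Hypothetical helper: tr/// with /d deletes each listed character.
# The search list is not interpolated, so '$' needs no escaping here.
sub strip_with_tr {
    my ($s) = @_;
    $s =~ tr/$,//d;
    return $s;
}

# Hypothetical helper: s/// goes through the regex engine.
# The '$' is escaped so it is not taken as the start of a variable.
sub strip_with_s {
    my ($s) = @_;
    $s =~ s/[\$,]//g;
    return $s;
}

my $price = '$1,234,567.89';
print strip_with_tr($price), "\n";   # 1234567.89
print strip_with_s($price), "\n";    # 1234567.89
```

Both give the same result for this input; the difference is only in how the work is done.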
Re: next question... by archen (Pilgrim) on Dec 22, 2001 at 04:51 UTC
tr/// is for characters, s/// is for strings. tr is generally a better choice where it works because it's more efficient. s/// can do everything tr can because it can work with strings of length 1 (i.e., one character); it's just slightly slower. I guess it's more about taste and speed than anything else.
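A small sketch of the characters-versus-strings distinction, using made-up sub names and inputs (none of them come from the thread). The point it tries to show: tr/// treats its list as a set of individual characters, while s/// can match a multi-character pattern.

```perl
use strict;
use warnings;

# Hypothetical helper: tr/// without /d and with an empty replacement
# list just counts matching characters; the string is left unchanged.
sub count_commas {
    my ($s) = @_;
    return ($s =~ tr/,//);
}

# Hypothetical helper: s/// removes the three-character token "USD"
# plus any whitespace after it.
sub strip_currency_code {
    my ($s) = @_;
    $s =~ s/\bUSD\s*//g;
    return $s;
}

# Hypothetical helper: tr/USD//d removes every U, S and D individually,
# which is rarely what you want for a token.
sub strip_usd_chars {
    my ($s) = @_;
    $s =~ tr/USD//d;
    return $s;
}

print count_commas('1,234,567'), "\n";          # 2
print strip_currency_code('USD 1,234'), "\n";   # 1,234
print strip_usd_chars('USD 5 DUST'), "\n";      # " 5 T"
```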
Re: Re: Re: Removing commas and dollar signs from a variable. by Rich36 (Chaplain) on Dec 22, 2001 at 00:35 UTC
Good point. tr's not something I use much, but probably should. Thanks for the reminder.

Rich36
There's more than one way to screw it up...