PerlMonks  

Difference between tr/// and s///?

by kalamiti (Friar)
on Feb 06, 2004 at 09:02 UTC ( [id://327021]=perlquestion )

kalamiti has asked for the wisdom of the Perl Monks concerning the following question:

I want to know why tr/// is used instead of s/// in an example like:
$toto = 'this+is+my+text';
$toto =~ tr/+/ /;
Why not $toto =~ s/\+/ /g; ? That was my assumption. I know both work, but I get lost on which to choose. Thanks for any explanations.

20040206 Edit by BazB: Changed title for all of thread, added code tags

Replies are listed 'Best First'.
Re: Difference between tr// and s///?
by Abigail-II (Bishop) on Feb 06, 2004 at 09:15 UTC
    They are not equivalent. $toto =~ s/\+//g; removes all plus signs. $toto =~ tr/+// only counts the plus signs. If you want to delete them, use tr/+//d. But the latter is still preferred over s/\+//g for performance reasons: tr is faster. Of course, in many practical cases, the speed difference is too small to notice.
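
To make the three behaviours above concrete, here is a minimal sketch. The sample string and the variable names are only illustrative assumptions, not code from the original post.

```perl
use strict;
use warnings;

my $text = 'this+is+my+text';

# With an empty replacement list and no /d, tr/// changes nothing;
# it only returns how many characters matched the search list.
my $count = ($text =~ tr/+//);

# With /d, search-list characters that have no replacement are deleted.
(my $deleted = $text) =~ tr/+//d;

# s///g with an empty replacement also removes every plus sign.
(my $substituted = $text) =~ s/\+//g;

print "count=$count deleted=$deleted substituted=$substituted\n";
```

After this runs, $text itself is unchanged, because the counting form never modifies its target.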

    Abigail

      Notice there's a space between the second and third slash in the OP's code example. With proper code tags you can see more clearly:

      $toto = 'this+is+my+text'; $toto =~tr/+/ /; why not $toto =~s/\+/ /g; ?
        Ah, yes, I didn't realize that originally the OP didn't use <code> tags, so I didn't notice the space.

        My second point still stands though: tr/// is usually faster than s///. It is faster because tr/// is a much simpler operation. It's a character-by-character replacement, so no copy of the string needs to be made, nor do large parts of the string need to be moved. An s/// operation cannot assume such a thing, because the replacement can be shorter or longer than the text it replaces.
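
A small sketch of the length argument, assuming neither /d nor /s is used with tr///. The '%20' replacement is just a hypothetical example of a replacement that is longer than the match.

```perl
use strict;
use warnings;

my $original = 'this+is+my+text';

# tr/// maps one character to one character, so the length stays the same.
(my $translated = $original) =~ tr/+/ /;

# s/// may replace a match with something longer (or shorter),
# so the string can grow or shrink.
(my $expanded = $original) =~ s/\+/%20/g;

printf "%d %d %d\n", length($original), length($translated), length($expanded);
# prints "15 15 21"
```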

        Abigail

Re: Difference between tr// and s///?
by kvale (Monsignor) on Feb 06, 2004 at 09:18 UTC
    As you say, both work. One might use tr/// because character translation is probably a faster process than regexp match/substitution.

    Update: Abigail is correct. Pay no attention here.

    -Mark

      Update: Abigail is correct. Pay no attention here.

      Au contraire, you're dead on correct. See my reply above.

Re: Difference between tr// and s///?
by Sol-Invictus (Scribe) on Feb 06, 2004 at 13:42 UTC
    my day for benchmarking it seems

    #!perl -w
    use Benchmark qw(:all);

    $toto  = 'this+is+my+text';
    $count = -5;

    $results = timethese($count,
        {
            'Translating'  => sub { my($copy)=$toto; $copy =~ tr/+/ /; },
            'Substituting' => sub { my($copy)=$toto; $copy =~ s/\+/ /g; },
        },
        'none'
    );
    cmpthese( $results );
    __END__

    Yielded:

                     Rate Substituting Translating
    Substituting 140156/s           --        -68%
    Translating  436379/s         211%          --

    Another Perl function which outperforms a regex is index(), for checking for exact matches in textual data:

    #!perl -w
    use Benchmark qw(:all);

    $toto  = 'this+is+my+text';
    $count = -5;

    $results = timethese($count,
        {
            'Indexing' => sub { my($copy)=$toto; $index = index($copy, 'is'); },
            'matching' => sub { my($copy)=$toto; $match if $copy =~ /is/; },
        },
        'none'
    );
    cmpthese( $results );
    __END__

    Gave:

                 Rate matching Indexing
    matching 552818/s       --     -14%
    Indexing 641382/s      16%       --

    But obviously there are limitations: tr/// can't be applied to only part of a string, and index() only returns the position of the first match of an exact term or character in text data. The extra control and options offered by regexes are what make them slower than both of these functions in given situations.
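
On the index() limitation: a single call only reports the first match, but index() does accept an optional start offset, so a loop can collect every literal occurrence. A rough sketch, with illustrative variable names:

```perl
use strict;
use warnings;

my $text = 'this+is+my+text';
my @positions;

# index() returns -1 when the substring is not found at or after the offset.
my $pos = index($text, 'is');
while ($pos != -1) {
    push @positions, $pos;
    $pos = index($text, 'is', $pos + 1);
}

print "@positions\n";   # prints "2 5"
```

This still only finds exact substrings; anything pattern-like needs a regex.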

    Update

    As davido pointed out, the original benchmark tests were flawed and have since been adjusted.

    You spend twenty years learning the spell that makes nude virgins appear in your bedroom, and then you're so poisoned by quicksilver fumes and half-blind from reading old grimoires that you can't remember what happens next.

      Your benchmark is flawed. On the first iteration of the testing, your $toto variable has all of its '+' characters changed to ' ' (space) characters. After that, all subsequent iterations and tests are acting upon the "fixed" string, and thus, they basically have no more work to do other than their own internal overhead.

      For your benchmark to gain validity, you'll need to make a copy of $toto, to act upon, inside each test, so that the original $toto is always in its original condition. Or declare and define $toto inside of each individual sub being tested.
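
The flaw is easy to see outside of Benchmark, because tr/// returns the number of characters it translated. This is only a sketch with made-up variable names; the second loop shows the copy-per-iteration fix suggested above.

```perl
use strict;
use warnings;

my $toto = 'this+is+my+text';

# Operating in place: the first pass does the work, the second finds nothing.
my $first_pass  = ($toto =~ tr/+/ /);
my $second_pass = ($toto =~ tr/+/ /);

# Copying inside each run keeps the source string intact.
my $source = 'this+is+my+text';
my @counts;
for (1 .. 2) {
    my $copy = $source;
    push @counts, ($copy =~ tr/+/ /);
}

print "in place: $first_pass, $second_pass; with copy: @counts\n";
# prints "in place: 3, 0; with copy: 3 3"
```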


      Dave

        davido,
        You are correct as the following modified code shows:
        #!/usr/bin/perl
        use strict;
        use warnings;
        use Benchmark qw(:all);

        my $length = (int rand 100) + 100;
        my @char = ( 'a' .. 'c' , '+' , 'd' .. 'f' );
        my $string;
        $string .= $char[ int rand 7 ] for 0 .. $length;
        my $count = -5;

        my $results = timethese (
            $count,
            {
                'transliterate' => sub { my $foo = $string; $foo =~ tr/+/ /; },
                'substitution'  => sub { my $bar = $string; $bar =~ s/\+/ /g; },
            },
            'none'
        );
        cmpthese( $results );
        __END__
                          Rate substitution transliterate
        substitution  124624/s           --          -84%
        transliterate 784349/s         529%            --
        Cheers - L~R
      Hmmm ... it seems to me that your benchmarks are actually indicating that tr/// is slower than s///. Not what I would have expected, and apparently no one else did either, since despite the evidence, everyone is claiming that this test shows that tr/// is faster.

      Update: I just ran the benchmark on my ActiveState build and got similar results ... that, strangely, s appears to be nearly twice as fast as tr in this trivial example. Results showing only about 1 million tr's per second vs. almost 2 million s's per second:

                         Rate transliterate substitution
      transliterate 1011345/s            --         -49%
      substitution  1969797/s           95%           --

      Just for background, this is ActivePerl 5.8.0 build 806.

      UPDATE Update: Thanks to davido for pointing us in the right direction on this in the CB before he posted his node.

      use Benchmark qw(:all);

      $toto = 'this+is+my+text+and+here+is+more+and+more+this+is+my+text+and+here+is+more+and+more+this+is+my+text+and+here+is+more+and+more';
      $count = -5;

      $results = timethese($count,
          {
              'transliterate' => sub {
                  $toto = 'this+is+my+text+and+here+is+more+and+more+this+is+my+text+and+here+is+more+and+more+this+is+my+text+and+here+is+more+and+more';
                  $toto =~ tr/+/ /;
              },
              'substitution' => sub {
                  $toto = 'this+is+my+text+and+here+is+more+and+more+this+is+my+text+and+here+is+more+and+more+this+is+my+text+and+here+is+more+and+more';
                  $toto =~ s/\+/ /g;
              },
          },
          'none'
      );
      cmpthese( $results );
      exit 0;
      Now we have the same string for the benchmark tests! And the results are as expected:
                        Rate substitution transliterate
      substitution   73807/s           --          -88%
      transliterate 625256/s         747%            --

      Boy, do I feel stupid for not seeing that!


      -HZ
                      Rate matching Indexing
        matching  695587/s       --     -41%
        Indexing 1187633/s      71%       --
        The table compares each row against each column heading, so it reads:
        matching is 41% slower than indexing
        Indexing is 71% faster than matching

        Sol-Invictus


      Your tr/// vs. s/// comparison is very broken. You're using the SAME variable in each iteration. You need to do:
      transliterate => sub { (my $x = $toto) =~ tr/+/ / },
      substitution  => sub { (my $x = $toto) =~ s/\+/ /g },
      just_copy     => sub { (my $x = $toto) },
      _____________________________________________________
      Jeff[japhy]Pinyan: Perl, regex, and perl hacker, who'd like a job (NYC-area)
      s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;
      That benchmark looks awry. I think you are testing as if there are no + in the string (since they will get removed on the first iteration).
Re: Difference between tr// and s///?
by hardburn (Abbot) on Feb 06, 2004 at 14:12 UTC

    In addition to what others have noted, here is a simple rule on when to use tr/// or s///: if you can get away with using tr///, do it. tr/// doesn't even start up the regex engine, so it's pretty fast (see above). The price you pay is flexibility, so don't spend too much time trying to nail a square tr/// solution into a round s/// problem.

    ----
    I wanted to explore how Perl's closures can be
    manipulated, and ended up creating an object
    system by accident.
    -- Schemer

    : () { :|:& };:

    Note: All code is untested, unless otherwise stated

Re: Difference between tr// and s///? (function not speed)
by tye (Sage) on Feb 06, 2004 at 17:26 UTC

    The difference is that one translates single characters and one substitutes patterns. Which to use should depend mostly on what you are doing. If you are replacing pluses with spaces, then pick whichever you are more comfortable with or based on what direction the code is likely to evolve in.

    If you foresee the code changing to s/[\s+]+/ /g one day, then starting with s/\+/ /g is probably a good idea. If you foresee the code changing to tr/+\-<>/ _()/, then starting with tr/+/ / is probably a good idea.

    For these reasons, I usually prefer s/\+/ /g over tr/+/ /.
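
As a rough illustration of the two directions the code might evolve in, here are both operators from the paragraph above applied to hypothetical inputs (the sample strings are assumptions chosen only to exercise each case):

```perl
use strict;
use warnings;

# Pattern direction: runs of whitespace and/or plus signs collapse to one space.
my $query = "this++is \t+my+text";
(my $collapsed = $query) =~ s/[\s+]+/ /g;

# Character-map direction: '+' -> ' ', '-' -> '_', '<' -> '(', '>' -> ')'.
my $markup = 'a+b-c<d>';
(my $mapped = $markup) =~ tr/+\-<>/ _()/;

print "$collapsed\n$mapped\n";
# prints "this is my text" and "a b_c(d)"
```

The first job cannot be done with tr/// alone, and the second would take several s/// calls or a lookup table.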

    Some will make a big deal about tr/// usually being faster than s///. I make a big deal about the speed difference between tr/// and s/// usually being imperceptible. Benchmark has to do a lot of tricky things to make the difference measurable. tr/// is usually faster than s///, but not always and it almost always doesn't matter in the slightest.

    Here is a benchmark to demonstrate both points:

    #!/usr/bin/perl
    use Benchmark qw( cmpthese );

    my( $count, $len )= @ARGV;
    $count ||= 10;
    $len ||= 5000;
    my $string= join '+', map {
        join '', map {('a'..'z')[rand(26)]} 0..rand($len)
    } 1..$count;

    sub subst {
        $string =~ s/\+/-/g;
        $string =~ s/\-/+/g;
        0;
    }
    sub trans {
        $string =~ tr/\+/-/;
        $string =~ tr/\-/+/;
        0;
    }

    cmpthese( -3, {
        a_s => \&subst,
        b_s => \&subst,
        a_t => \&trans,
        b_t => \&trans,
    } );
    and the output:
           Rate  b_t  a_t  b_s  a_s
    b_t  8190/s   --  -1% -40% -40%
    a_t  8258/s   1%   -- -39% -40%
    b_s 13615/s  66%  65%   --  -1%
    a_s 13720/s  68%  66%   1%   --

    So here I've got an unusually long string (50,000 bytes) and the difference in speed between the two is about 0.00005 seconds. Most of the time your strings aren't that long so the difference is even less.
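
The 0.00005-second figure follows from the rates in the output above: time per call is the reciprocal of the rate. A quick check, using the a_t and a_s rows (the hash and variable names are just for illustration):

```perl
use strict;
use warnings;

# Rates copied from the benchmark output above (calls per second).
my %rate = ( a_t => 8258, a_s => 13720 );

# Seconds per call is the reciprocal of the rate.
my $per_call_tr = 1 / $rate{a_t};
my $per_call_s  = 1 / $rate{a_s};
my $difference  = $per_call_tr - $per_call_s;

printf "tr: %.6f s, s: %.6f s, difference: %.6f s\n",
    $per_call_tr, $per_call_s, $difference;
```

Note that each call in that benchmark performs two operations, so the per-operation difference is about half of this.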

    Sure, there are rare cases where you are doing hundreds of thousands of these operations on really long strings and these tiny differences add up. But even in such cases, if you manage to get them to add up to a whole second or two, then all of the other overhead (which Benchmark has to work hard to subtract from the above comparisons) usually adds up to several minutes and the difference is still imperceptible.

    So, in those cases you should probably be studying the algorithm you are using or profiling the code rather than running benchmarks trying to prematurely optimize nano-operations such as these. (:

    Update: BTW, the reason (or at least my educated guess at the reason) that this benchmark shows s/// being faster is because tr/// looks up every character in the map that it builds while, in this case, s/// can (more quickly) skip to the next character that it cares about.

    - tye        

      Thank you all. In one day I've learned much more than ever regarding all these Perl things.
        Indeed. It happens very often (like in this post) that I look at a post and say to myself "Oh, I know the answer to that!" Then I look at the answers and I'm absolutely blown away by the depth of others' knowledge.

        I find that PerlMonks makes me feel inadequate. I also find I always play a game of go after reading PerlMonks, to make me feel better about me ^_^.

        Fantastic responses everybody! ^_^
Re: Difference between tr// and s///?
by bradcathey (Prior) on Feb 06, 2004 at 14:33 UTC
    The difference was confusing for me as well when I first started learning Perl. The discussion of speed notwithstanding, it should be noted that tr/// isn't as 'flexible' as s///, because tr/// maps characters one-for-one: each character in the search list is replaced by the character at the same position in the replacement list, and any extra replacement characters are ignored. So, beyond our replacement of '+' signs:
    $toto = 'this+is+my+text';
    $toto =~ tr/my/your/d;
    $toto will now equal 'this+is+yo+text'. Whereas:
    $toto = 'this+is+my+text';
    $toto =~ s/my/your/;
    $toto will now equal 'this+is+your+text'. A helpful distinction that is not blatantly obvious from reading the Perl docs.
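
The same comparison as a runnable sketch. The /d modifier from the post is left out here, since every search character has a replacement and /d would have nothing to delete; the 'misc' warning category is switched off only because perl otherwise warns that the replacement list is longer than the search list.

```perl
use strict;
use warnings;
no warnings 'misc';   # silence "Replacement list is longer than search list"

my $toto = 'this+is+my+text';

# tr/// pairs characters by position: 'm' -> 'y', 'y' -> 'o'.
# The extra 'u' and 'r' in the replacement list are simply unused.
(my $translated = $toto) =~ tr/my/your/;

# s/// replaces the matched substring as a whole.
(my $substituted = $toto) =~ s/my/your/;

print "$translated\n$substituted\n";
# prints "this+is+yo+text" and "this+is+your+text"
```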

    —Brad
    "A little yeast leavens the whole dough."
      Also do not forget that tr/// has no concept of "words". So if you do a tr/this/that/, it means that all 't' get replaced by 't'; 'h' by 'h', 'i' by 'a' and 's' by 't', which is probably not what you wanted if you expected all words "this" to be replaced by the word "that" (which of course will happen, but at the same time "is" will be changed into "at"). So tr/// will work on individual characters only. If you need to do some more sophisticated work, use s///.
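
A short sketch of that effect on a hypothetical sentence, next to a word-level substitution (the \b anchors are an assumption about what "replace the word" should mean):

```perl
use strict;
use warnings;

my $text = 'this is a test';

# Character map: t->t, h->h, i->a, s->t, applied to every character.
(my $translated = $text) =~ tr/this/that/;

# Word-level replacement needs a pattern; \b anchors at word boundaries.
(my $substituted = $text) =~ s/\bthis\b/that/g;

print "$translated\n$substituted\n";
# prints "that at a tett" and "that is a test"
```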

      CountZero

      "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

Re: Difference between tr// and s///?
by allolex (Curate) on Feb 06, 2004 at 09:44 UTC

    From perlop, which you can access via the command perldoc perlop from the command line. (Try perldoc perldoc first)

    tr/SEARCHLIST/REPLACEMENTLIST/cds
    y/SEARCHLIST/REPLACEMENTLIST/cds
    Transliterates all occurrences of the characters found in the search list with the corresponding character in the replacement list. It returns the number of characters replaced or deleted. If no string is specified via the =~ or !~ operator, the $_ string is transliterated. (The string specified with =~ must be a scalar variable, an array element, a hash element, or an assignment to one of those, i.e., an lvalue.) A character range may be specified with a hyphen, so "tr/A-J/0-9/" does the same replacement as "tr/ACEG-IBDFHJ/0246813579/". For sed devotees, "y" is provided as a synonym for "tr". If the SEARCHLIST is delimited by bracketing quotes, the REPLACEMENTLIST has its own pair of quotes, which may or may not be bracketing quotes, e.g., "tr[A-Z][a-z]" or "tr(+\-*/)/ABCD/". Note that "tr" does not do regular expression character classes such as "\d" or "[:lower:]". The "tr" operator is not equivalent to the tr(1) utility. If you want to map strings between lower/upper cases, see "lc" in perlfunc and "uc" in perlfunc, and in general consider using the "s" operator if you need regular expressions.
    s/PATTERN/REPLACEMENT/egimosx
    Searches a string for a pattern, and if found, replaces that pattern with the replacement text and returns the number of substitutions made. Otherwise it returns false (specifically, the empty string). If no string is specified via the "=~" or "!~" operator, the $_ variable is searched and modified. (The string specified with "=~" must be scalar variable, an array element, a hash element, or an assignment to one of those, i.e., an lvalue.) If the delimiter chosen is a single quote, no interpolation is done on either the PATTERN or the REPLACEMENT. Otherwise, if the PATTERN contains a $ that looks like a variable rather than an end-of-string test, the variable will be interpolated into the pattern at run-time. If you want the pattern compiled only once the first time the variable is interpolated, use the "/o" option. If the pattern evaluates to the empty string, the last successfully executed regular expression is used instead. See perlre for further explanation on these. See perllocale for discussion of additional considerations that apply when "use locale" is in effect
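
The return values described in the two excerpts above can be checked with a small sketch. The sample string and variable names are assumptions made for illustration.

```perl
use strict;
use warnings;

my $text = 'Hello, World';

# tr/// returns the number of characters matched; with an empty
# replacement list it counts without changing the string.
my $uppercase_count = ($text =~ tr/A-Z//);

# Ranges map position by position: 'A' -> 'a', 'B' -> 'b', and so on.
(my $lower = $text) =~ tr/A-Z/a-z/;

# s/// returns the number of substitutions made,
# or the empty string (false) when nothing matched.
my $copy     = $text;
my $made     = ($copy =~ s/o/0/g);
my $no_match = ($copy =~ s/xyz//g);

print "upper=$uppercase_count lower=$lower made=$made copy=$copy\n";
```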

    --
    Allolex

      Uhm, how does that explain why tr/+/ / is used instead of s/\+/ /g? The post of the OP clearly suggests to me that the OP knows what tr/// and s/// do.

      Abigail

        It was this sentence:

        kalamiti: I know both work, but I get lost on which to choose

        I felt that kalamiti was not really sure about what s/// and tr/// do, since they both were not doing the same thing in the code, and decided that adding some more documentation to your post (which was already the highest-voted node at the time) would be a good idea.

        Abigail-II: The post of the OP clearly suggests to me that the OP knows what tr/// and s/// do.

        I didn't get the same impression.

        --
        Allolex

        It's a frightening trend. Whenever someone asks a question, you'll usually get someone replying by copying & pasting perldoc info. In my view, they are trying to look smart by doing so and perhaps get some XP. (Hey, look at me, I know how to use perldoc!)
