PerlMonks  

Re: Difference between tr// and s///?

by Sol-Invictus (Scribe)
on Feb 06, 2004 at 13:42 UTC ( [id://327062]=note )


in reply to Difference between tr/// and s///?

My day for benchmarking, it seems.

    #!perl -w
    use Benchmark qw(:all);

    $toto  = 'this+is+my+text';
    $count = -5;

    $results = timethese($count,
        {
            'Translating'  => sub { my($copy) = $toto; $copy =~ tr/+/ /; },
            'Substituting' => sub { my($copy) = $toto; $copy =~ s/\+/ /g; },
        },
        'none'
    );
    cmpthese( $results );

    __END__
    Yielded:
                     Rate Substituting  Translating
    Substituting 140156/s           --         -68%
    Translating  436379/s         211%           --

Another Perl function that outperforms a regex is index(), when all you need is to check for an exact substring in text:

    #!perl -w
    use Benchmark qw(:all);

    $toto  = 'this+is+my+text';
    $count = -5;

    $results = timethese($count,
        {
            'Indexing' => sub { my($copy) = $toto; $index = index($copy, 'is'); },
            'matching' => sub { my($copy) = $toto; $match if $copy =~ /is/; },
        },
        'none'
    );
    cmpthese( $results );

    __END__
    Gave:
                 Rate matching Indexing
    matching 552818/s       --     -14%
    Indexing 641382/s      16%       --

But obviously there are limitations: tr/// always works on the whole string, one character at a time, so you can't restrict it to part of a string, and index() only returns the position of the first occurrence of an exact substring or character. The extra control and options offered by regexes are what make them slower than both of these functions in situations like the ones above.
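To make those limitations concrete, here is a minimal sketch. The variable names are mine and not part of the benchmarks above. It shows tr/// touching every matching character in the string, index() being called in a loop to reach later occurrences, and s/// limiting a change by context, which tr/// cannot do.

```perl
use strict;
use warnings;

my $text = 'this+is+my+text';

# tr/// maps single characters across the whole string and returns
# the number of characters it matched.
(my $spaced = $text) =~ tr/+/ /;
my $plus_count = ($text =~ tr/+//);   # empty replacement list: count only

# index() finds one literal substring at a time; pass the previous
# position + 1 as the third argument to find later occurrences.
my @positions;
my $pos = index($text, 'is');
while ($pos != -1) {
    push @positions, $pos;
    $pos = index($text, 'is', $pos + 1);
}

# s/// can limit the change by context: here only the '+' that
# directly follows 'my' is replaced.
(my $partial = $text) =~ s/(?<=my)\+/ /;

print "$spaced\n";        # this is my text
print "$plus_count\n";    # 3
print "@positions\n";     # 2 5
print "$partial\n";       # this+is+my text
```

The trade-off is the one described above: the specialised functions do less, so there is less for them to set up on each call.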

Update

As davido pointed out, the original benchmark tests were flawed; they have since been adjusted.

You spend twenty years learning the spell that makes nude virgins appear in your bedroom, and then you're so poisoned by quicksilver fumes and half-blind from reading old grimoires that you can't remember what happens next.

Replies are listed 'Best First'.
Re: Re: Difference between tr// and s///?
by davido (Cardinal) on Feb 06, 2004 at 16:43 UTC
    Your benchmark is flawed. On the first iteration of the testing, your $toto variable has all of its '+' characters changed to ' ' (space) characters. After that, all subsequent iterations and tests are acting upon the "fixed" string, and thus, they basically have no more work to do other than their own internal overhead.

    For your benchmark to gain validity, you'll need to make a copy of $toto, to act upon, inside each test, so that the original $toto is always in its original condition. Or declare and define $toto inside of each individual sub being tested.
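    The effect is easy to see outside of Benchmark. This is only a sketch with made-up sub names, relying on the fact that tr/// returns the number of characters it replaced:

```perl
use strict;
use warnings;

my $toto = 'this+is+my+text';

# Flawed: modifies the shared variable, so only the first call
# finds any '+' to replace.
my $in_place = sub { $toto =~ tr/+/ / };

my $first  = $in_place->();
my $second = $in_place->();

print "first call replaced:  $first\n";    # 3
print "second call replaced: $second\n";   # 0

# Fixed: work on a fresh copy so every call sees the original string.
$toto = 'this+is+my+text';
my $on_copy = sub { my $copy = $toto; $copy =~ tr/+/ / };

my @counts = map { $on_copy->() } 1 .. 3;
print "copy calls replaced:  @counts\n";   # 3 3 3
```

    In the flawed version every iteration after the first is timing a scan of a string that no longer contains any '+'.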


    Dave

      davido,
      You are correct as the following modified code shows:
          #!/usr/bin/perl
          use strict;
          use warnings;
          use Benchmark qw(:all);

          my $length = (int rand 100) + 100;
          my @char   = ( 'a' .. 'c', '+', 'd' .. 'f' );
          my $string;
          $string .= $char[ int rand 7 ] for 0 .. $length;

          my $count   = -5;
          my $results = timethese($count,
              {
                  'transliterate' => sub { my $foo = $string; $foo =~ tr/+/ /; },
                  'substitution'  => sub { my $bar = $string; $bar =~ s/\+/ /g; },
              },
              'none'
          );
          cmpthese( $results );

          __END__
                            Rate substitution transliterate
          substitution  124624/s           --          -84%
          transliterate 784349/s         529%            --
      Cheers - L~R
Re: Re: Difference between tr// and s///?
by HyperZonk (Friar) on Feb 06, 2004 at 16:19 UTC
    Hmmm ... it seems to me that your benchmarks are actually indicating that tr/// is slower than s///. Not what I would have expected, and apparently no one else did either, since despite the evidence, everyone is claiming that this test shows that tr/// is faster.

    Update: I just ran the benchmark on my ActiveState build and got similar results ... that, strangely, s appears to be nearly twice as fast as tr in this trivial example. Results showing only about 1 million tr's per second vs. almost 2 million s's per second:

                       Rate transliterate substitution
    transliterate 1011345/s            --         -49%
    substitution  1969797/s           95%           --

    Just for background, this is ActivePerl 5.8.0 build 806.

    UPDATE Update: Thanks to davido for pointing us in the right direction on this in the CB before he posted his node.

        use Benchmark qw(:all);

        $toto = 'this+is+my+text+and+here+is+more+and+more+this+is+my+text+and+here+is+more+and+more+this+is+my+text+and+here+is+more+and+more';
        $count = -5;

        $results = timethese($count,
            {
                'transliterate' => sub {
                    $toto = 'this+is+my+text+and+here+is+more+and+more+this+is+my+text+and+here+is+more+and+more+this+is+my+text+and+here+is+more+and+more';
                    $toto =~ tr/+/ /;
                },
                'substitution' => sub {
                    $toto = 'this+is+my+text+and+here+is+more+and+more+this+is+my+text+and+here+is+more+and+more+this+is+my+text+and+here+is+more+and+more';
                    $toto =~ s/\+/ /g;
                },
            },
            'none'
        );
        cmpthese( $results );
        exit 0;
    Now we have the same string for the benchmark tests! And the results are as expected:
                      Rate substitution transliterate
    substitution   73807/s           --          -88%
    transliterate 625256/s         747%            --

    Boy, do I feel stupid for not seeing that!


    -HZ
                     Rate matching Indexing
      matching   695587/s       --     -41%
      Indexing  1187633/s      71%       --

      The chart is read row by row, comparing each row label against the column heading:
      matching is 41% slower than Indexing
      Indexing is 71% faster than matching
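      The two percentages can be reproduced from the rates. This is a sketch that assumes each cell is the row's rate divided by the column's rate, minus one, rounded to a whole percent; that assumption matches the figures in the chart. The sub name compare is mine.

```perl
use strict;
use warnings;

# Rates copied from the chart above (iterations per second).
my %rate = ( matching => 695587, Indexing => 1187633 );

# Percentage by which the row entry differs from the column entry.
sub compare {
    my ($row, $col) = @_;
    return sprintf '%.0f%%', 100 * $rate{$row} / $rate{$col} - 100;
}

my $slower = compare('matching', 'Indexing');
my $faster = compare('Indexing', 'matching');

print "matching vs Indexing: $slower\n";   # -41%
print "Indexing vs matching: $faster\n";   # 71%
```

      The two numbers are not mirror images because each one uses a different rate as its base.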

      Sol-Invictus


        Sol-Invictus:

        I was speaking to the tr vs. s, not the s vs. index.


        -HZ
Re: Re: Difference between tr// and s///?
by japhy (Canon) on Feb 06, 2004 at 16:49 UTC
    Your tr/// vs. s/// comparison is very broken. You're using the SAME variable in each iteration. You need to do:
        transliterate => sub { (my $x = $toto) =~ tr/+/ / },
        substitution  => sub { (my $x = $toto) =~ s/\+/ /g },
        just_copy     => sub { (my $x = $toto) },
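    As a quick check of that idiom, here is a small sketch using the $toto string from the original post. The parenthesised assignment runs first, so the binding operator acts on the new variable and the source string is left alone:

```perl
use strict;
use warnings;

my $toto = 'this+is+my+text';

# Copy first, then modify the copy; $toto itself is never changed.
(my $x = $toto) =~ tr/+/ /;
(my $y = $toto) =~ s/\+/ /g;

print "$x\n";      # this is my text
print "$y\n";      # this is my text
print "$toto\n";   # this+is+my+text
```

    The just_copy entry times the assignment on its own, which gives a rough idea of how much of each result is copying rather than tr/// or s/// work.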
    _____________________________________________________
    Jeff[japhy]Pinyan: Perl, regex, and perl hacker, who'd like a job (NYC-area)
    s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;
Re: Re: Difference between tr// and s///?
by ysth (Canon) on Feb 06, 2004 at 16:51 UTC
    That benchmark looks awry. I think you are testing as if there are no + in the string (since they will get removed on the first iteration).
