The indisputable speed of tr///

by radiantmatrix (Parson) on Jun 26, 2006 at 20:29 UTC

As for the my -> our change, thanks; I ran into and fixed that bug, must've pasted an old version. Fixed now.

As for "first line", you'll note the commment above the DATA block that I added newlines to the data for posting on PM, but that the code I ran had *no* newlines in the DATA block. All the data was on the first line, in other words. Even if that were not the case, it wouldn't be buggy, it would just be testing less text.

As for the "300x faster than", you're technically correct, it's "300x as fast as". However, you're also a pedant. ;-) Fixed, in any case.

Re: The indisputable speed of tr///
by Zaxo (Archbishop) on Jun 26, 2006 at 20:06 UTC

Just one more question,

Did it help?

In my experience, I/O bottlenecks or pathological algorithms are usually the culprits when apps become too slow.

That said, tr/// is certainly the right way to do it. Teardown time alone on that hash is probably longer than tr/// takes - not to mention all the memory management and copying involved in the split.

The operation is only done once, though, and 300 times nothing can still be pretty small.

After Compline,
Zaxo

by radiantmatrix (Parson) on Jun 26, 2006 at 20:46 UTC

Did it help? Quite a bit, actually. This was one component in an otherwise quite decent system, written by a coder who was essentially a temp. The file-IO stuff was handled as efficiently as reasonable in Perl and with appropriate error-checking. It's entirely a File-IO type of system.

The operation is only done once, though, and 300 times nothing can still be pretty small.

See, that's just it. We're talking about large chunks of text, handled a line at a time for memory's sake -- so this operation is repeated tens of thousands of times per run, and many thousands of runs per day (many of which are in parallel -- this doesn't slam the processor by any stretch, since we wait mostly on I/O).

Now, the actual performance difference in the app at large was no where near the 30_000% in the demonstration but it was significant, and certainly worth doing. This change alone made roughly a 350% difference, and that was after improving the I/O routine to read and process up to 500 lines of text at a time, instead of one.

Re^3: The indisputable speed of tr///

by Jenda (Abbot) on Jun 26, 2006 at 23:08 UTC

I wonder ... if all you were doing was some transliteration why would you bother with the lines at all? Read and process say 64 KB at once. As I said, not sure this is doable in your case.

Second ... why would you spend time "figuring out reasonable character classes? If you have the mapping hash, just build the tr/// out of it. I just tried to benchmark the difference between tr/a-z/A-Z/ and tr/abcdef...xyz/ABCDEF...ZYX/ and the later actually came out a tiny little bit faster. So you could either have the tr/// generated from the hash and copy the result into the script or use string eval to define a subroutine to do the transliteration. my $sub = eval 'sub { my $x = shift(); $x=~ tr/' . join('', keys %h) . '/' . join('', values %h) . '/; return $x}'; or something. If the loop is simple enough you could eval the whole loop so that you do not waste time by the subroutine call for each line. Or am I missing something?

Jenda
Support Denmark!
Defend the free world!

[reply]
[d/l]

Re^4: The indisputable speed of tr///

by radiantmatrix (Parson) on Jun 27, 2006 at 14:32 UTC

by ikegami (Patriarch) on Jun 27, 2006 at 16:59 UTC

by runrig (Abbot) on Jun 27, 2006 at 17:27 UTC

Some notes below your chosen depth have not been shown here

by Jenda (Abbot) on Jun 27, 2006 at 17:30 UTC

Some notes below your chosen depth have not been shown here

by duff (Parson) on Jun 27, 2006 at 18:32 UTC

Re: The indisputable speed of tr///
by ambrus (Abbot) on Jun 27, 2006 at 09:05 UTC

I agree with you in that the tr operator gives very good speed, so you should probably use it in this case if this takes significanf runtime in your script.

However, let me investigate how much speed I can gain without using the tr operator. I will show that the rot13 solution you show isn't nearly optimal even in that case. (I'll assume that you're translating byte strings.)

Read more... (2 kB)

The results on my home computer is:

                Rate   orig hashslice-m gsub-dot hashslice-s gsub-clas
+s hashslice-u arrayslice   tr
orig           962/s     --        -20%     -30%        -40%       -43
+%        -58%       -82% -99%
hashslice-m   1206/s    25%          --     -12%        -25%       -29
+%        -47%       -78% -99%
gsub-dot      1372/s    43%         14%       --        -15%       -19
+%        -40%       -75% -99%
hashslice-s   1605/s    67%         33%      17%          --        -5
+%        -30%       -70% -99%
gsub-class    1687/s    75%         40%      23%          5%         -
+-        -26%       -69% -99%
hashslice-u   2278/s   137%         89%      66%         42%        35
+%          --       -58% -98%
arrayslice    5414/s   463%        349%     295%        237%       221
+%        138%         -- -96%
tr          148174/s 15308%      12184%   10701%       9135%      8681
+%       6403%      2637%   --
[download]

For the way I test here, see also my Benchmarking perfect shuffles. For another case when I use a lookup table instead of transliterations because it's not easy to calculate the correct arguments for tr, see Re (Obfu generator for): Ode for getprotobyname.

Update: added two solutions using s as demerphq has suggested.

[reply]
[d/l]
[select]

by radiantmatrix (Parson) on Jun 27, 2006 at 14:23 UTC

Hey, neat!

I know that the solution using split is tremendously sub-optimal, but that's almost exactly what the code I was given did (just in two steps with an array assignment in between). Thanks for the work on so many ways to do it -- I'm pointing my boss toward your post as further evidence that we now have the optimal solution for that particular chunk of code.

Re: The indisputable speed of tr///
by demerphq (Chancellor) on Jun 27, 2006 at 08:15 UTC

tr/// is only going to help you if all of your transformations are 1:1 in terms of chars. If there are any examples of 1:N then its not going to help you. IE, if you need to convert chars with umlouts to their non-umlaut equivelents then tr/// wont help you. As for it being faster im surprised there was any question. Especially with the code you are using. join/map/split is a pretty horrible way to handle this. If i had to use a hash id do something like s/([...])/$map{$1}/g which is isnt going to be faster than tr/// but will certainly be faster than join/map/split.

---
$world=~s/war/peace/g

[reply]
[d/l]

Re: The indisputable speed of tr///
by shmem (Chancellor) on Jun 27, 2006 at 08:52 UTC

I wouldn't use a 40 ton truck to go fetch a package of cigarettes, if I have a bike ;-)

--shmem

_($_=" "x(1<<5)."?\n".q�/)Oo.  G�\        /
                              /\_�/(q    /
----------------------------  \__(m.====�.(_("always off the crowd"))."�
");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}

Re: The indisputable speed of tr///
by Ieronim (Friar) on Jun 28, 2006 at 16:32 UTC

our $data = <DATA>;
[download]

our $data = join "", <DATA>;
[download]

The worst thing about tr/// is that it can produce only character-to-character mapping. The best solution aviding this problem I found uses eval at compilation time:

BEGIN {
our %rot = (
  'A' => 'N','B' => 'O','C' => 'P','D' => 'Q','E' => 'R','F' => 'S',
  'G' => 'T','H' => 'U','I' => 'V','J' => 'W','K' => 'X','L' => 'Y',
  'M' => 'Z','N' => 'A','O' => 'B','P' => 'C','Q' => 'D','R' => 'E',
  'S' => 'F','T' => 'G','U' => 'H','V' => 'I','W' => 'J','X' => 'K',
  'Y' => 'L','Z' => 'M','a' => 'n','b' => 'o','c' => 'p','d' => 'q',
  'e' => 'r','f' => 's','g' => 't','h' => 'u','i' => 'v','j' => 'w',
  'k' => 'x','l' => 'y','m' => 'z','n' => 'a','o' => 'b','p' => 'c',
  'q' => 'd','r' => 'e','s' => 'f','t' => 'g','u' => 'h','v' => 'i',
  'w' => 'j','x' => 'k','y' => 'l','z' => 'm',
);

my $substs = join "\n", map { "    s/$_/$rot{$_}/g;" } keys %rot;

eval <<SUB;
sub rot13s {
    local \$_ = shift;
    $substs;
    return \$_;
}
SUB

}
[download]

          Rate  table  s///g     tr
table    448/s     --   -78%  -100%
s///g   2026/s   352%     --   -98%
tr    106383/s 23654%  5152%     --
[download]

[reply]
[d/l]
[select]

Re^3: The indisputable speed of tr///

by ikegami (Patriarch) on Jun 28, 2006 at 17:09 UTC

The problem about only reading the first line of the string was raised and fixed in the first post of this thread.

Your solution doesn't work. For example,
print(rot13s(rot13s('AN')), "\n");
prints 'AA' or 'NN' (depending on the hash ordering), but not the expected 'AN'.

You also don't escape the keys and value of %rot as you should. The OP said this is a simplification of his problem, so the presense of punctuation in %rot is likely.

Refer to an earlier post of mine for a working solution using eval to build a substitution and a working solution that uses substitution without using eval.

"[rot13s] is only 52 times slower than [rot13tr]" makes no sense. I think you meant "rot13tr is only 52 times faster than rot13s".

*Only* 52 times faster? So if rot13tr takes 30 seconds — The OP said he had lots of data to process — rot13s takes *only* 26 minutes? I wish my 30 minutes drive home every night only took 30 seconds.

[reply]
[d/l]

by Ieronim (Friar) on Jun 28, 2006 at 18:16 UTC

yes, i was wrong :) i forgot that i at first substitute all A to N and then all N to A, INCLUDING those A's which were already mapped to N :))