Re: Re: transliterate

Replies are listed 'Best First'.
Re: Re: Re: transliterate by dondelelcaro (Monk) on Jul 27, 2001 at 22:17 UTC
Which version of perl are you using? 5.005_03 seems to have a really slow `tr///`, that actually is slower than `s///` (singe celeron 466) `Benchmark: timing 250000 iterations of one, two... one: 1 wallclock secs ( 0.83 usr + 0.00 sys = 0.83 CPU) two: 1 wallclock secs ( 0.74 usr + 0.00 sys = 0.74 CPU) Benchmark: timing 2000000 iterations of one, two... one: 7 wallclock secs ( 6.71 usr + 0.00 sys = 6.71 CPU) two: 6 wallclock secs ( 5.93 usr + 0.00 sys = 5.93 CPU)` [download] However, 5.6.1 (on a dual p3 800) tr seems to be almost twice as fast. `Benchmark: timing 250000 iterations of one, two... one: 1 wallclock secs ( 0.26 usr + 0.00 sys = 0.26 CPU) @ 96 +1538.46/s (n=250000) (warning: too few iterations for a reliable count) two: 1 wallclock secs ( 0.46 usr + 0.00 sys = 0.46 CPU) @ 54 +3478.26/s (n=250000) Benchmark: timing 2000000 iterations of one, two... one: 2 wallclock secs ( 2.06 usr + 0.00 sys = 2.06 CPU) @ 97 +0873.79/s (n=2000000) two: 4 wallclock secs ( 3.68 usr + 0.00 sys = 3.68 CPU) @ 54 +3478.26/s (n=2000000)` [download] I don't know too much about the `tr///` and `s///` code, but it looks like someone went and optimized `tr///` between 5.005 and 5.6.1. (Would someone who knows (japhy?) please comment on this?)	[reply] [d/l] [select]
Re: Re: Re: Re: transliterate by trantor (Chaplain) on Jul 27, 2001 at 22:39 UTC
It would be interesting to compare the results using Unicode strings. In this case, the difference between a character by character scan and a Finite State Machine could be greater, although the FSM generated by <ODE>s/ /+/g</CODE> is so simple that it is probably a character by character scan. -- TMTOWTDI	[reply]
Re: Re: Re: transliterate by japhy (Canon) on Jul 27, 2001 at 22:01 UTC
I'm pressed for time, but I'd like you to do two things: Try varied lengths of strings, with varied numbers of plus signs. Use a negative count (that is, -5), which tells Perl to run them for that many (positive) seconds. Please report your findings. I'll be back whenever I can. (I'm talking at TPC today, and I should really be at the talks right now!) _____________________________________________________ Jeff `japhy` Pinyan: Perl, regex, and perl hacker. `s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;`	[reply]
(tye)Re: transliterate by tye (Sage) on Jul 27, 2001 at 22:31 UTC
Another thing to try is to interleave duplicate tests so that start-up "settling in" doesn't scew the results: `use Benchmark qw( cmpthese ); cmpthese( @ARGV ? $ARGV[0] : -3, { Atr => sub { $string = "blah+blah+blah"; $string =~ tr/+/ /; }, As => sub { $string =~ "blah+blah+blah"; $string =~ s/\+/ /g; }, Btr => sub { $string = "blah+blah+blah"; $string =~ tr/+/ /; }, Bs => sub { $string =~ "blah+blah+blah"; $string =~ s/\+/ /g; }, });` [download] which gave me these results: `Rate Atr Btr Bs As Atr 696801/s -- -2% -13% -45% Btr 711990/s 2% -- -11% -44% Bs 800289/s 15% 12% -- -37% As 1260797/s 81% 77% 58% --` [download] Which (update doesn't) show tr to be faster than s but only slightly so. The fact is that the speed difference between s and tr is trivial for such cases. If you have really long strings, then tr becomes more attractive. But when tr really becomes attractive is when you want to replace lots of different characters with lots of different characters. For example, write a ROT13 encoder with s and you'll see what I mean. Update: Sorry, I read the numbers backward. Not that it matters much to me as I don't really care whether I can do 0.7 million per second or 1.2 million per second. (: The point is that they are both fast and fairly close to each other in speed. Thanks to petral for pointing my mistake out to me. Swapping out s for tr for this case is really another case of premature optimization. But making the choice of using tr in the beginning for this case is probably good defensive programming. (: - tye (but my friends call me "Tye")	[reply] [d/l] [select]
Re: Re: Re: Re: transliterate by dsb (Chaplain) on Jul 27, 2001 at 23:54 UTC
Okay, I tried a bunch of different things. Two sets of the same tests, one for 3 seconds, the other 6. use Benchmark; timethese(-3, { aa => sub { $string = "blah+blah+blah"; $string =~ tr/+/ /; }, bb => sub { $string = "blah+blah+blah"; $string =~ s/\+/ /g; }, cc => sub { $string = "blah+blah+blah+blah+blah+blah"; $string =~ tr/+/ /; }, dd => sub { $string = "blah+blah+blah+blah+blah+blah"; $string =~ s/\+/ /g; }, ee => sub { $string = "blah"; $string =~ tr/+/ /; }, ff => sub { $string = "blah"; $string =~ s/\+/ /g; }, }); print "\n"; timethese(-6, { aa => sub { $string = "blah+blah+blah"; $string =~ tr/+/ /; }, bb => sub { $string = "blah+blah+blah"; $string =~ s/\+/ /g; }, cc => sub { $string = "blah+blah+blah+blah+blah+blah"; $string =~ tr/+/ /; }, dd => sub { $string = "blah+blah+blah+blah+blah+blah"; $string =~ s/\+/ /g; }, ee => sub { $string = "blah"; $string =~ tr/+/ /; }, ff => sub { $string = "blah"; $string =~ s/\+/ /g; }, }); [download] The preceding output the following: Benchmark: running aa, bb, cc, dd, ee, ff, each for at least 3 CPU sec +onds... aa: 6 wallclock secs ( 2.91 usr + 0.10 sys = 3.01 CPU) @ 42 +9021.93/s (n=1291356) bb: 8 wallclock secs ( 2.95 usr + 0.05 sys = 3.00 CPU) @ 20 +8165.00/s (n=624495) cc: 8 wallclock secs ( 2.90 usr + 0.10 sys = 3.00 CPU) @ 40 +4326.00/s (n=1212978) dd: 7 wallclock secs ( 2.88 usr + 0.12 sys = 3.00 CPU) @ 13 +4898.33/s (n=404695) ee: 8 wallclock secs ( 3.21 usr + 0.02 sys = 3.23 CPU) @ 50 +7243.96/s (n=1638398) ff: 6 wallclock secs ( 3.05 usr + -0.04 sys = 3.01 CPU) @ 60 +3128.24/s (n=1815416) Benchmark: running aa, bb, cc, dd, ee, ff, each for at least 6 CPU sec +onds... aa: 11 wallclock secs ( 5.87 usr + 0.14 sys = 6.01 CPU) @ 45 +1358.24/s (n=2712663) bb: 11 wallclock secs ( 5.99 usr + 0.01 sys = 6.00 CPU) @ 22 +2497.00/s (n=1334982) cc: 12 wallclock secs ( 6.12 usr + 0.01 sys = 6.13 CPU) @ 45 +0148.29/s (n=2759409) dd: 11 wallclock secs ( 6.39 usr + 0.02 sys = 6.41 CPU) @ 14 +9619.66/s (n=959062) ee: 12 wallclock secs ( 5.97 usr + 0.03 sys = 6.00 CPU) @ 52 +0242.00/s (n=3121452) ff: 7 wallclock secs ( 6.01 usr + 0.00 sys = 6.01 CPU) @ 60 +7347.25/s (n=3650157) [download] So by the looks of the runs per second statistic that `tr` is indeed faster than `s///`. What dondelelcaro mentioned about v5.005003 having a slower version of `tr` could be relevant though, as that is what I'm running. Amel - f.k.a. - kel	[reply] [d/l] [select]