http://qs1969.pair.com?node_id=17050

Russ has asked for the wisdom of the Perl Monks concerning the following question:

Hopefully, this is an interesting question about the Schwartzian Transform and Perl internal performance:

I thought I was going to demonstrate the performance advantage of the Schwartzian Transform by applying it to the three-way sort code posted by BBQ. It is slower, however, with my quick implementation (which looks right...)

#!/usr/bin/perl -w
use Benchmark;

timethese(100, {
    a => q{
        my %hash = (key1 => 3, key2 => 2, key3 => "test");
        # I used 10000 more for a large test set (omitted for brevity)
        @keys = sort {
            $hash{$b} cmp $hash{$a}
            || length($b) cmp length($a)
            || $b <=> $a
        } keys %hash
    },
    b => q{
        my %hash = (key1 => 3, key2 => 2, key3 => "test");
        # I used the same 10000 more as above
        @keys = map { $_->[0] }
                sort {
                    $b->[1] cmp $a->[1]
                    || $b->[2] <=> $a->[2]
                    || $b->[0] cmp $a->[0]
                }
                map { [$_, $hash{$_}, length $_] }
                keys %hash;
    }
});
Output:
Benchmark: timing 100 iterations of a, b...
         a: 274 wallclock secs (272.26 usr +  0.22 sys = 272.48 CPU)
         b: 366 wallclock secs (359.06 usr +  0.52 sys = 359.58 CPU)
So, now it's once again my turn to learn (happens all the time around here -- I love Perl Monks)...

The Schwartzian Transform is all about reducing the computational cost of the comparison code in complicated sorts. In this case, I assumed that two hash lookups, two string length calculations, and finally a plain comparison would easily qualify as a "complicated" sort block.
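
For reference, the general pattern caches each element's expensive derived value once, so the O(n log n) comparisons only touch precomputed data. A minimal sketch, where expensive_func is a placeholder for the costly computation (not code from BBQ's post):

my @sorted = map  { $_->[0] }                      # undecorate: recover the elements
             sort { $a->[1] <=> $b->[1] }          # compare the cached values cheaply
             map  { [ $_, expensive_func($_) ] }   # decorate: compute once per element
             @unsorted;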

What's happening here? Why does it take longer to dereference and index into an array than to do hash lookups and length() calls?

Russ

P.S. My omitted data had all of the values equal (to force the comparison to always do the most work). It is just generated with map(($_, 1), (1..10000)).

Replies are listed 'Best First'.
RE: Schwartzian Transform vs. plain Perl
by mikfire (Deacon) on Jun 08, 2000 at 16:38 UTC
    The savings generated by the Transform (too early in the morning to type Schwartzian - sorry, merlyn) come from caching the result of the expensive_func part. In the archetypal case, that expensive function is hitting the disk for file sizes.

    In your benchmarks, there is no "expensive_func". Yes, the hash lookup is a bit more expensive than the array lookup, but not that expensive. The expense of the extra map() is greater than the savings realized by doing the array lookup.

    To reap the benefits of the Transform, do something real - like stat'ing some rapidly changing file on your filesystem.
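
    For instance, sorting files by size (a minimal sketch of the classic case, assuming the files come from glob - not code from this thread):

        # Plain sort: -s (a stat call) runs twice for every comparison.
        my @plain = sort { -s $a <=> -s $b } glob('*');

        # Transformed: each file is stat'ed exactly once, up front.
        my @transformed = map  { $_->[0] }
                          sort { $a->[1] <=> $b->[1] }
                          map  { [ $_, -s $_ ] }
                          glob('*');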

    HTH,
    mikfire

RE: Schwartzian Transform vs. plain Perl
by jjhorner (Hermit) on Jun 08, 2000 at 16:44 UTC

    I usually take my variable assignment out of the subroutines.

    I took the hash generation:

    map(($_,1), (1..10000));

    out of the subroutines and here is what I got (I had to up the iterations to 100000!):

    [08:35:03 jhorner@gateway scripts]$ ./20000608-2.pl
    Benchmark: timing 100000 iterations of a, b...
             a:  2 wallclock secs ( 1.49 usr +  0.00 sys =  1.49 CPU) @ 67114.09/s (n=100000)
             b:  1 wallclock secs ( 1.64 usr +  0.00 sys =  1.64 CPU) @ 60975.61/s (n=100000)
    [08:35:16 jhorner@gateway scripts]$
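
    A minimal sketch of that restructuring (not the exact script that produced the numbers above; since a lexical %hash isn't visible inside q{} strings eval'd by Benchmark, the timed code becomes coderefs that close over it):

        #!/usr/bin/perl -w
        use Benchmark;

        my %hash = map { ($_, 1) } (1 .. 10000);   # built once, not per iteration

        timethese(100_000, {
            a => sub {
                my @keys = sort {
                    $hash{$b} cmp $hash{$a}
                    || length($b) cmp length($a)
                    || $b <=> $a
                } keys %hash;
            },
            b => sub {
                my @keys = map  { $_->[0] }
                           sort { $b->[1] cmp $a->[1]
                                  || $b->[2] <=> $a->[2]
                                  || $b->[0] cmp $a->[0] }
                           map  { [ $_, $hash{$_}, length $_ ] }
                           keys %hash;
            },
        });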

    Does anyone see any benefit to forcing the hash generation each iteration, or does one declaration work just as well?

    So, to support your theory, I got roughly the same times. To make sure, I upped it to 1000000 iterations:

    Benchmark: timing 1000000 iterations of a, b...
             a: 15 wallclock secs (14.67 usr +  0.01 sys = 14.68 CPU) @ 68119.89/s (n=1000000)
             b: 16 wallclock secs (16.41 usr +  0.01 sys = 16.42 CPU) @ 60901.34/s (n=1000000)

    I believe, and correct me if I'm wrong, that with keys and values this short the hash lookups and length() calls are cheap, so the cost of building and dereferencing the anonymous arrays dominates. Perhaps if we tried it on larger key->value pairs (see the sketch below), we would see the expected win for the Transform.
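
    For example, one way to build such a larger test set (purely hypothetical - not data from this thread) would be:

        # 10,000 unique ~25-character keys sharing one long value, so the
        # value comparison always ties and falls through to the later tests.
        my %hash = map { ("key_$_" x 5 => 'x' x 100) } (1 .. 10000);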

    The _EPP_ book (Effective Perl Programming), where I saw the Schwartzian Transform, also notes that the best part of it "is that it tends to be the fastest way to perform complicated sorts".

    J. J. Horner
    Linux, Perl, Apache, Stronghold, Unix
    jhorner@knoxlug.org http://www.knoxlug.org/
    
      As long as both functions are doing the hash generation, my first estimation is that it won't change the relation between the methods. I could go through lots of mathematical gyrations, but I won't.

      Thinking a bit more about it, the first function to run may get more penalized than the second, depending on how aggressive perl is in reusing memory. The first call to initialize the hash will have to allocate memory from the system. After the my'd hash goes out of scope, the memory is marked as free but perl doesn't give it back to the system. The next time the hash is allocated, perl may ( again, depending on how aggressive perl is ) just give it the same space. This should be faster than trying to malloc the same amount of space.

      Given 100,000 iterations of each loop I still do not think the penalty is going to be large enough to introduce any skew.

      Personally, I also try to do that stuff outside of the timing loop - I want to remove as many distractions as possible and make sure I am timing how fast the sort is, not how fast my machine can allocate memory or how effectively perl is reusing it.

      mikfire

(jcwren) Re: Schwartzian Transform vs. plain Perl
by jcwren (Prior) on Jun 08, 2000 at 17:14 UTC
    "I see your Schwartz is bigger than mine..."