    my @sorted = sort mysort @unsorted;

    sub mysort {
        $a =~ m/(.*?)\s*(\S+)$/;
        $aa = "$2 $1";
        $b =~ m/(.*?)\s*(\S+)$/;
        $bb = "$2 $1";
        return $aa cmp $bb;
    }

Now this would be bad for large data sets, because each element in @unsorted will get sent to mysort several times, so we end up performing the match several times where we don't really need to.
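To make that redundancy concrete, here is a small sketch (the $match_count variable and the inlined comparator are my additions, not from the original) that tallies how many times the regex actually runs during the sort:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @unsorted = ("Larry Wall", "Jane Sally Doe", "John Doe",
                "Morphius", "Jane Alice Doe", "Arthur C. Clarke");

my $match_count = 0;    # how many times the regex is applied

my @sorted = sort {
    $match_count += 2;  # the regex runs once for $a and once for $b
    $a =~ m/(.*?)\s*(\S+)$/;
    my $aa = "$2 $1";
    $b =~ m/(.*?)\s*(\S+)$/;
    my $bb = "$2 $1";
    $aa cmp $bb;
} @unsorted;

# 6 elements, but the regex runs far more than 6 times
print "elements: ", scalar(@unsorted), ", regex matches: $match_count\n";
```

Each comparison recomputes both keys from scratch, so the match count grows roughly like O(n log n) comparisons times two, instead of the n matches that would suffice if each key were computed once.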
    @unsorted = ("Larry Wall", "Arthur C. Clarke");
    @a = map { m/(.*?)\s*(\S+)$/; [ $_, "$2 $1" ] } @unsorted;

    # now @a is (["Larry Wall", "Wall Larry"],
    #            ["Arthur C. Clarke", "Clarke Arthur C."])

Now we do a simple sort on the second element in @a:

    @b = sort { $a->[1] cmp $b->[1] } @a;

And finally we turn the 2D array back into a sorted 1D array, restoring the original values:

    @sorted = map { $_->[0] } @b;

If you cascade the function calls, then you remove the intermediate arrays @a and @b, and you get the Schwartzian Transform:
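Assembled into a self-contained script (using the two names from the example above), the three steps look like this:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @unsorted = ("Larry Wall", "Arthur C. Clarke");

# Step 1: pair each name with its "Lastname Firstname" sort key
my @a = map { m/(.*?)\s*(\S+)$/; [ $_, "$2 $1" ] } @unsorted;

# Step 2: sort on the key (element 1 of each pair)
my @b = sort { $a->[1] cmp $b->[1] } @a;

# Step 3: strip the keys, keeping only the original strings
my @sorted = map { $_->[0] } @b;

print "$_\n" for @sorted;
# Arthur C. Clarke
# Larry Wall
```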
    my @sorted = map  { $_->[0] }
                 sort { $a->[1] cmp $b->[1] }
                 map  { m/(.*?)\s*(\S+)$/; [ $_, "$2 $1" ] } @unsorted;

Now that is pretty cool stuff, but how much of a time savings is this extra work? Well, I did a small benchmarking test to mimic what the time savings would be under a large data set:
    #!/usr/bin/perl
    use Benchmark;

    @unsorted = ("Larry Wall", "Jane Sally Doe", "John Doe",
                 "Morphius", "Jane Alice Doe", "Arthur C. Clarke");

    timethese(100000, {
        "schwartzian" => sub {
            my @sorted = map  { $_->[0] }
                         sort { $a->[1] cmp $b->[1] }
                         map  { m/(.*?)\s*(\S+)$/; [ $_, "$2 $1" ] } @unsorted;
        },
        "sort routine" => sub { my @sorted = sort mysort @unsorted },
    });

    sub mysort {
        $a =~ m/(.*?)\s*(\S+)$/;
        $aa = "$2 $1";
        $b =~ m/(.*?)\s*(\S+)$/;
        $bb = "$2 $1";
        return $aa cmp $bb;
    }

And the results:
    Benchmark: timing 100000 iterations of schwartzian, sort routine...
    schwartzian: 57 wallclock secs (53.66 usr + 0.30 sys = 53.96 CPU)
    sort routine: 124 wallclock secs (122.48 usr + 0.41 sys = 122.89 CPU)
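The pattern generalizes beyond name sorting. As a sketch of my own (the sort_by helper name is not from the original post), any key-extraction function can be plugged into the same map/sort/map pipeline so that each key is computed exactly once:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Generic Schwartzian Transform: sort @items by the string key
# that $keyfunc computes for each element.
sub sort_by {
    my ($keyfunc, @items) = @_;
    return map  { $_->[0] }
           sort { $a->[1] cmp $b->[1] }
           map  { [ $_, $keyfunc->($_) ] } @items;
}

# Example: sort filenames by extension
my @sorted = sort_by(sub { my ($ext) = $_[0] =~ /(\w+)$/; $ext },
                     qw(notes.txt script.pl data.csv));
print "@sorted\n";   # data.csv script.pl notes.txt
```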
In reply to RE: Schwartzian Transform
by perlmonkey
in thread Schwartzian Transform
by merlyn