As I read over Complex hash sorting and considered options for possible solutions I began thinking about how to implement a sort routine that not only can sort on multiple criteria, but that can adapt to the criteria at runtime , thus allowing for dynamic selection of sort criteria.

It seems there are several issues here. First, assume it's unknown at script development time how many search criteria will be used. Second, assume that at script development time it is unknown whether a particular sort criteria will be numeric or string-based. And third, assume that it is unknown at script development time what the order of precedence will be for the various sort criteria.

It struck me that it might be clever to build up a string representation of the multi-criteria sort routine at runtime, when the desired criteria are known. Then one can eval the routine, sort and all, to gain the desired result. Here is what I came up with:

use strict; use warnings; # Name Sex Age IQ Hair Eyes my %hash = ( 'Lisa' => [ 'F', 6, 150, 'Blonde', 'Blue' ], 'Homer' => [ 'M', 40, 105, 'Bald', 'Blue' ], 'Bart' => [ 'M', 9, 120, 'Blonde', 'Brown' ], 'Marge' => [ 'F', 36, 135, 'Blue', 'Blue' ], 'Maggie' => [ 'F', 1, 130, 'Blonde', 'Blue' ] ); # Assume that @criteria is actually obtained at runtime. my @criteria = ( 0, 2, 3 ); my @comparisons; my ( $sample_key, $sample_value ) = each %hash; foreach ( @criteria ) { push @comparisons, "\$hash{\$a}[$_]" . ( $sample_value =~ /^\d+$/ ? " <=> " : " cmp " ) . " \$hash{\$b}[$_]"; } my $routine = join " || ", @comparisons; my @sorted; eval "\@sorted = sort { $routine } keys \%hash" or die "Ick!\n$@\n"; print $_, "\n" foreach @sorted; __OUTPUT__ Maggie Marge Lisa Homer Bart

This method just builds up a sort routine by creating an array of comparison criteria, and joining them together with " || " logical short circuits. The logical short circuit causes the secondary and terciary (etc) criteria to be evaluated if the primary (and so on) criteria evaluate to equality. Using a logical short circuit inside a sort routine is a pretty common idiom. I just worked out one way of allowing the criteria list to be built at runtime.

The resulting sort routine string is interpolated into an eval expression that already contains the necessary skelleton to implement a sort of a hash's keys. What I have intentionally avoided (to reduce complexity) is the necessary logic to allow for dynamic selection of $b<=>$a versus $a<=>$b ordering. It would be possible to allow for that, but to keep this illustration more to the point I glossed over that extra logic.

If anyone else has come up with a more flexible and robust way of creating a sort routine with runtime selected criteria I'd be interested in reading about it here too.


Dave

Replies are listed 'Best First'.
Re: Fun with complex sorting on arbitrary criteria list.
by tilly (Archbishop) on Feb 01, 2004 at 10:44 UTC
    First a debugging tip with eval and dynamically created code. It is really helpful if you have the generated code displayed in error messages. Otherwise bad input can crash your code, but it can be hard to figure out why. Similarly I like using line directives as described at the bottom of perlsyn so that warnings come with nice descriptive messages about where the eval is in my code, rather than knowing that the error was at line 2 of eval 14. (merlyn showed me this trick.)

    Also I don't like using autodetection of datatypes like you have above. My experience is that real-world data often throws you surprises like a field with mixed numbers and strings in it. I've run int that too often to trust code which will break on that case. (Although tye has a cute sprintf trick at Re: Sorting on Section Numbers which is acceptable with many datasets.)

    That said, I have tackled this exact problem with closures. With something like this (untested):

    # Take a list of sort functions, return a sort function # that tests them in order, with short circuit logic. sub ret_combined_sort_func { if (1 == @_) { return shift; } else { my @subs = @_; return sub { my $ret = 0; foreach (@subs) { $ret = $_->(); return $ret if $ret; } # I guess we're "out of sorts" :-) return 0; } } }
    And now you can process the conditions, for each condition that you see, adding another subroutine to a list of known sort conditions. And then combine them as above to get a single sort function suitable for sticking in a sort. This part is straightforward if a bit tedious to write. (The main problem is settling on the format of the data structure that says which fields to sort in which directions.)

    A note to people who like to generalize. I implicitly assume above that the sort functions are defined in the same package that sort will be called in. Or at least know the package. That is because the $a and $b that is set depends where sort is called, so the sort function had better look there. If you don't want to avoid that then you need to modify the closures to all use a ($$) prototype. sort will in recent (5.6 on up) versions of Perl then pass in the two scalars to compare rather than alias them in $a and $b. Now your sort functions can be defined in a different package than sort is eventually called in. (I thought that this was pointed out to me by dws. He doesn't remember that, and I can't find where it was pointed out. I've had several occasions to use it since though.)

    Update: dws pointed out a piece of mangled grammar, and told me that he didn't remember telling me about using prototypes in a sort.

    Update 2: "lease" should have been "least". Fixed.

Re: Fun with complex sorting on arbitrary criteria list.
by stvn (Monsignor) on Feb 01, 2004 at 22:01 UTC
    Dave,

    One suggestion for your code, use eval's return value to create an sub rather than evaling the whole sort routine. Here is a quick example of the technique.

    my $test = eval "sub { print 'hello world' }"; $test->(); # prints the ever-classic "hello world"
    This way you can re-use your sort routine without the eval penalty again.

    Regarding similar sorting techniques. I usually, like tilly, have used closures (although using closures somewhat differently than tilly).

    my %hash = ( 'Lisa' => [ 'F', 6, 150, 'Blonde', 'Blue' ], 'Homer' => [ 'M', 40, 105, 'Bald', 'Blue' ], 'Bart' => [ 'M', 9, 120, 'Blonde', 'Brown' ], 'Marge' => [ 'F', 36, 135, 'Blue', 'Blue' ], 'Maggie' => [ 'F', 1, 130, 'Blonde', 'Blue' ] ); sub createSortRoutine { my (@ordered_values) = @_; return sub { my ($left, $right) = @_; my $result; foreach (@ordered_values) { $result = ( ($hash{$left}->[$_] =~ /^\d+$/) ? $hash{$left}->[$_] <=> $hash{$right}->[$_] : $hash{$left}->[$_] cmp $hash{$right}->[$_] ) unless $result; } return $result; }; } my $sorter = createSortRoutine(0, 2, 3); print join "\n" => sort { $sorter->($a, $b) } keys %hash; # NOTE: this code not thoroughly tested, i lifted it from # and old module of mine and hacked it for this example. __DATA__ Maggie Marge Lisa Homer Bart
    I have not tried to eval a code string for this problem though, personally i was never a huge fan of that technique, although your example is a quite an interesting use of it. Cool stuff.

    -stvn
Re: Fun with complex sorting on arbitrary criteria list.
by simonm (Vicar) on Feb 01, 2004 at 17:05 UTC
    I've got a CPAN module that does some of this: Data::Sorting. It handles a fairly wide range of sorting requests, but doesn't use your trick of eval'ing a dedicated sub.

    There was also talk about a Sort::Records module by Guttman and Rossler that used the eval-a-dedicated-sub technique, but was even more full-featured; unfortunately I've never actually found the code for it...

Re: Fun with complex sorting on arbitrary criteria list.
by parv (Parson) on Feb 01, 2004 at 09:48 UTC
    I think i just did that, after reading your post. What say you (other than being a bit cleaner:)?