in reply to Complex hash sorting

This isn't really as hard as it seems. I'll present two versions that return a list of hash keys in sorted order, sorted by whichever of the sub-values you wish. The first method is going to be a straight sort. If efficiency is an issue, it would probably be best to turn this one into a Schwartzian Transform. But that's not absolutely necessary.

Keep in mind that as others have mentioned, turning your hash values into array-refs is probably a better solution, but I'm going to just work with the assumption that you really do want to keep that '||' delimited list as the value to each hash element.

First, the straight sort:

my $criteria = 3; my @sorted = sort { (split /\|\|/, $hash{$a} )[$criteria] cmp (split /\|\|/, $hash{$b} )[$criteria] } keys %hash +;

The method listed above indexes into the split list, using the specified sub-element as the sort criteria.

This next method will allow any arbitrary number of sort criteria, in the order you choose.

my @criteria = (0, 2); my @sorted = map { $_->[0] } sort { $a->[1] cmp $b->[1] } map { [ $_, join( "", ( split /\|\|/, $hash{$_} )[@sorted +] ] } keys %hash;

This second method uses a Schwartzian Transform in conjunction with join to create a single sort criteria comprised of the slice created by taking (possibly multiple) indexes into the split results.

The shortcoming here is that we're doing all string comparison, which sort of wreks havok on strings that have numeric components. But the advantage here is that you can actually specify multiple sort criteria in one search easily.

There are an infinate number of other ways to do this. Here are some thoughts:

You might use eval and build your sort routine dymanically (once) so that it can be of arbitrary complexity, chained together with the logical short-circuit 'or' operator. This will allow the user to specify that he/she needs to sort by the 0th, 2nd, and 1st criteria today, but the 1st and 4th later. Also, by building it up dynamically, you can at runtime decide whether to do a 'cmp' or an '<=>' comparison. I give a simple example of this technique in Fun with complex sorting on arbitrary criteria list.. My example implementation has a few shortcomings too, but most could be overcome without too much extra work.


Dave

Replies are listed 'Best First'.
Re: Re: Complex hash sorting
by Anonymous Monk on Feb 01, 2004 at 09:55 UTC
    The problem with some of the other solutions is, this isn't a static hash I'm trying to sort. This is a tied hash to an SDBM database, so just using an array isn't an option as the program has been up and running with data stored for quite some time already.

    Your first method seems to be easier to understand/follow, though I've never seen things done like this before...so I'll try going with that. Can you explain what/why $criteria is and why it's 3? And after it's in an array, how would you split everything back into separate variables for printing?

    This is an example of my actual coding

    tie %db, "DB_File", "$db", O_CREAT | O_RDWR, 0644, $DB_BTREE or die "Cannot open file 'db': $!\n"; print "<table>"; foreach (keys %db) { my ($count, $ip, $time) = split(/\|\|/, $db{$_}); print "<td>$count</td><td>$ip</td><td>$time</td>"; } </table>
    There are four values (including the key itself), I want to make three of these sortable: the key, $count and $time). My method of doing this is calling a url_param defining the choice of sort. Ie. script.cgi?sort_by=count or script.cgi?sort_by=key and depending on what they chose, a different hash sort method will be used.

    Thanks for your time.

      Just an observation, but if tied hashes are slow, and sorting hashes based on complex operations are slow, and regexes are kind of slow... this code must really break down with a large number of records.

      This seems to be a classic case where taking the time to load the records into a (SQL) database and then query the database would produce cleaner code with (potentially) a speed savings of (who knows) 10x? Use temp tables or temp views if you must.