Spenser has asked for the wisdom of the Perl Monks concerning the following question:

Is there a way to sort a hash by values and leave it in a hash?  I found postings regarding sorting by keys:

sort keys %hash;

I also spotted a posting on a way to sort a hash by values to put it in an array, but not leaving it in a hash:

@keys = sort{$hash{$b} <=> $hash{$a}} keys %hash;

And I found cwest's postings regarding his Tie::SortHash module.  From that I produced this:

tie %clients, 'Tie::SortHash', \%clients, $sortblock; my $sortblock = q(my $c = (split /\s+/, $a)[1]; my $d = (split /\s+/, $b)[1]; $c cmp $d || $clients{$a} <=> $clients{$b});

But this module still only seems to sort by keys and not values.  Maybe there's another way to use this module to get a sort by values.  If anyone knows, please let me know how to do it.  However, maybe I'm trying to make a hash act like an array--that certainly seems to be where I'm going wrong--when an array will do me just fine. Let me explain what I'm trying to do and why I'm trying to stay with a hash.

I'm trying to pull information out of a mySQL database which includes two fields:  client id's and client names.  I've pulled that off without much trouble and fed the data into a hash. From there, I want to display the data in a scrolling list in a web page using the CGI module. This is the code I used for that:

print $q->scrolling_list(-name=>'acct_num', -values=> [keys %clients], -labels=> \%clients);

Again, if I put a "sort" in front of "keys" on the second line, it will sort by the client identification number.  But I need to sort by the values since the users don't know our client id's.  I thought of swapping keys with values when creating the hash, but I need to pass the key to the next web page and the values are fraught with spaces and other undesirable characters (much like our clients).  A simple solution to my problem would be to be able to sort the hash into an array (like the second bit of code above) and then plop that into the labels field of the scrolling_list's properties.  But, I can't seem to get the scrolling_list command to take an array.  Is there a way for one to give it an array instead of a hash?

Any ideas will be appreciated.


Replies are listed 'Best First'.
Re: More Sorted Business with Hashes
by dws (Chancellor) on Dec 22, 2001 at 05:43 UTC
    Is there a way to sort a hash by values and leave it in a hash?

    No. Hashes are inherently non-ordered. However, for what you're trying to do, something like

    @keysByValue = sort { $clients{$a} cmp $clients {$b} } keys %clients;
    will give you keys ordered by value.

        Sure, a tied hash can be kept ordered, by using an array, or some heavier-weight data structure, underneath it.

        The FAQ notwithstanding, the original statement is correct. Hash lookup algorithms produce fundamentally unordered results. Perl's hashes implement a concept properly called associative arrays. They are called hashes because that is how they work under the hood.

        Therefore native Perl hashes are not obviously ordered, and anything you run across in any language called a hash should be assumed to likewise be not obviously ordered.

      Having not used the various modules mentioned in the thread, I'm not sure about relative overhead, but... Using the method above to get an array of sorted keys, how about passing a hash slice, rather than your original hash to wherever you're trying to use the "ordered hash"? I.e. @hash{@keysByValue} rather than just %hash ? Slices are fun! :-)
      And, just in case it's not clear where to go from there, use a hash slice, rather than rather than your original hash: @hash{@keysByValue}
      Slices are fun!
Re: More Sorted Business with Hashes
by I0 (Priest) on Dec 22, 2001 at 05:46 UTC
    -values=> [sort{$clients{$a} <=> $clients{$b}}keys %clients],
      -values=> [sort{$clients{$a} cmp $clients{$b}}keys %clients],
      Note the use of cmp (which sorts alphanumerically) rather than <=> (which sorts numerically). Because Spenser has non-numerals as hash values, <=> will produce seemingly unsorted results.
Re: More Sorted Business with Hashes
by impossiblerobot (Deacon) on Dec 23, 2001 at 01:40 UTC
    I made several attempts at returning a sorted hash:
    #!/usr/bin/perl use strict; use Data::Dumper; my %unsorted = ( key3 => 'bread', key2 => 'apples', key1 => 'steak', key4 => 'butter', key5 => 'oranges', key6 => 'cake', ); my @keys_by_value = sort { $unsorted{$a} cmp $unsorted{$b} } keys %uns +orted; print "Keys by value: @keys_by_value\n"; print "Sorted values: @unsorted{ @keys_by_value }\n"; my @sorted_keys_values = map { $_, $unsorted{$_} } sort { $unsorted{$ +a} cmp $unsorted{$b} } keys %unsorted; print "Keys/values by value: @sorted_keys_values\n"; #my $foo = { @sorted_keys_values }; my $as_hash = { map { $_, $unsorted{$_} } sort { $unsorted{$a} cmp $u +nsorted{$b} } keys %unsorted }; print "Data Dumper:\n"; print Data::Dumper->Dump( [$as_hash],['as_hash'] );

    and though it's fairly easy to get the keys and values sorted by value, as soon as perl converts the data back into a hash structure, it goes back to its natural (unordered) state:
    Keys by value: key2 key3 key4 key6 key5 key1 Sorted values: apples bread butter cake oranges steak Keys/values by value: key2 apples key3 bread key4 butter key6 cake key +5 oranges key1 steak Data Dumper: $as_hash = { 'key1' => 'steak', 'key2' => 'apples', 'key3' => 'bread', 'key4' => 'butter', 'key5' => 'oranges', 'key6' => 'cake' };
    So I was unable to create a sorted hash without resorting (as IlyaM and dws mentioned) to 'tie'.

    Rather than roll my own tie-based solution, I went to CPAN: Tie::SortHash or Tie::IxHash may do what you need.

    Impossible Robot
Silly Rabbit. Sorting is for databases
by PhiRatE (Monk) on Dec 23, 2001 at 08:08 UTC

    select key, value from table order by value

    From there, you want to use arrays as you have noted. A hash is fine so long as you require no particular order, it is essentially a lookup table. As soon as you are iterating across it, it becomes (generally) more sensible to use an array.

Re: More Sorted Business with Hashes
by Spenser (Friar) on Dec 22, 2001 at 23:09 UTC

    Thanks for the feedback and ideas, but I couldn't get any of these ideas to work for me.  They do work if all I wanted to do is sort a hash by values.  But trying to drop such a sorted hash into the CGI directive, scrolling_list() seems to be too much for the module--it either fails or it still sorts according to key or it doesn't sort at all.

    Anyway, I've worked around it. The content of my values already read, "$client_name - $clientid" with the key reading, "$clientid."  So I changed my scrolling_list to read simply like this:

    print $q->scrolling_list(-name=>'acct_num', -values=> [sort values %clients] );

    This gives me the alphabetical sort based on the client names (e.g., values) that I need.  It does make the HTML option select value the same as the label (e.g., client name - client id)--granted, very messy.  But I put the following line in the next CGI script that parses out the data entered in the first script:

    $acct_num = (split/ \- /, $acct_num)[-1];

    This has solved my immediate problem.  However, if anyone can contribute more comments to my original question, I would glad to have them for my education and for those who might read this posting in the future.  Thanks.

      The CGI man page covers this:
      print $q->scolling_list( '-name' => 'acct_num', '-values' => [ sort values %clients ], '-labels' => \%clients '-default' => $default_value );
      It's under popup_menu(), but since they make the same HTML data structure with different attributes, it all works.
        Actually, to amend this, now that I've read it a few days later, you typically use a hash mapped from 'value' to 'text', so if your structure is like the one above, you may have something like:
        print $q->scrolling_list( '-name' => 'acct_num', '-values' => [ sort { $clients{$a} cmp $clients{$b} } keys %clients + ], '-labels' => \%clients, '-default' => $default_value );
        Or, if for some wacky reason, your values really are the values for your menu and the keys are the text labels:
        print $q->scrolling_list( '-name' => 'acct_nu', '-values' => [ sort values %clients ], '-labels' => { reverse %clients }, '-default' => $default_value );
        Typically, when using CGI, the first way is the way most commonly used. Most people don't keep around sorted copies of hashes. It defeats the purpose of a hash. Hashes inherently are unsorted because the algorithm they use to make their lookups fast stores them (internally) in an optimal configuration (which is almost never sorted).

        My fault for not reading the original scrolling_list() call correctly.

Re: More Sorted Business with Hashes
by mstone (Deacon) on Dec 24, 2001 at 05:03 UTC

    The short answer is that you can't. The long answer deals with why you can't:

    A 'hash function' reduces any input to a number within a given range. In most cases, this involves some kind of bit-twiddling:

    $HASH_LIMIT = 1024; ## myhash (key:string) : int # # calculate a checksum # sub myhash { my $key = shift; my $h = 0; for $c (unpack ('c*', $key)) { ## break the key into chars $h *= 2; ## shift the hash value left one bit $h ^= $c; ## and XOR it with the next char } $h %= $HASH_LIMIT; ## trim to fit the defined range return ($h); ## and return the result } for $k (qw( testing the hash function )) { printf ("%10s - %s\n", $k, &myhash($k)); }

    A 'hash' is a data structure that stores values in an array, using the value returned by a hash function as an array index:

    @MY_HASH = (0) x $HASH_LIMIT; ## put (key:string, val:string) : nil # # store a value in a hash # sub put { my $key = shift; my $val = shift; $MY_HASH[ &myhash($key) ] = $val; return; } ## get (key:string) : string # # get a value from the hash # sub get { my $key = shift; return ($MY_HASH[ &myhash($key) ]); }

    and this code is admittedly trivial because it doesn't check for collisions.. cases where two keys return the same hash value. Collisions are inevitable when you deal with hash functions, because hash functions map an infinite domain of input strings to a finite range of output values. It's just hard (very hard) to calculate which strings will map to a given hash value. Most of the interesting work involving hashes revolves around what to do with collisions, but there are too many alternatives to discuss here.

    Hashes are great when you want to work with non-numeric information. Calculating the index value directly from the key is (usually) faster than searching for a specific string in a sorted array. It's also (usually) easier to store a new value in a hash than to shoehorn one into a sorted array, though there are data structures (heaps) which do just that.

    The down side is that you can't assume your hash function will preserve attributes like alphabetical order. It's just a tradeoff you just have to accept if you're going to store information in a hash.