kaweh has asked for the wisdom of the Perl Monks concerning the following question:

I have sequences, their ids, and the scores of the sequences. I want to sort these sequences by score. After sorting, every sequence should still be together with its id and score. I want to use a hash, but I will dynamically get more and more sequences with their ids and scores. How can I add those to the hash dynamically? The book says not to use push or pop with a hash, and also that the order in a hash is not guaranteed. So I don't know a good way to solve this problem. Thanks a lot!

Replies are listed 'Best First'.
Re: sort and hash
by jeffa (Bishop) on Sep 22, 2003 at 22:58 UTC

    Let's say you have a hash like so:

    my %sequence = ( 1345 => 10, 123 => 20, 500 => 30, );
    You can add more key/value pairs like so:
    $sequence{901} = 40;
    Let's say that your id is stored in the variable $id and the score is stored in the variable $score. You can add this information to the hash like so:
    $sequence{$id} = $score;
    If you want to sort the data by, say, the id, you can get a list of the keys with the keys built-in function and sort it with the sort built-in function, specifying a numeric sort with the "spaceship" operator <=> like so:
    print "$_ => $sequence{$_}\n" for sort {$a<=>$b} keys %sequence;
    Perl's motto is 'There Is More Than One Way To Do It'. Give us some more context, like what these ids and scores really look like, and we can help even more.

    jeffa

    L-LL-L--L-LL-L--L-LL-L--
    -R--R-RR-R--R-RR-R--R-RR
    B--B--B--B--B--B--B--B--
    H---H---H---H---H---H---
    (the triplet paradiddle with high-hat)
    
      I need to create a hash first. At first I wanted to use push, but then I found that the book says not to use push or pop with a hash. Yes, your suggestion is a good way to do it. Then I will sort by scores :) Thank you very much! kaweh
        You are most welcome. :) And welcome to the Monastery.

        Limbic~Region mentioned to me that you might want to preserve the sort, that is, have the keys stay sorted even after you add new ones. You can use the CPAN module Tie::Hash::Sorted for this, but first see A Guide to Installing Modules if you are not familiar with the CPAN. Ties are, in my opinion, a bit hard to grasp if you are new to Perl (see perltie), but here goes anyway. :)

        use strict;
        use warnings;
        use Data::Dumper;
        use Tie::Hash::Sorted;

        my %sequence = (
            1345 => 10,
            123  => 20,
            500  => 30,
        );
        print Dumper \%sequence;

        # and here's the part that makes ears bleed ...
        tie my %sorted_sequence, 'Tie::Hash::Sorted',
            Hash         => \%sequence,
            Sort_Routine => sub { [ sort { $a <=> $b } keys %{$_[0]} ] },
        ;
        print Dumper \%sorted_sequence;

        $sorted_sequence{901} = 40;
        $sorted_sequence{201} = 50;
        print Dumper \%sorted_sequence;
        Data::Dumper is another CPAN module, but it comes with your Perl installation, so you don't have to install it like you do Tie::Hash::Sorted. Once you have installed Tie::Hash::Sorted, run the above example and notice the output. Note that even though your hash stays 'magically' sorted, there are a lot of CPU cycles burned to achieve it. In other words, if you really don't need to keep the hash sorted, then don't keep it sorted.

        Also, don't rule out using an array. Hashes are really good for quick lookups. If you are more concerned with sorting than with looking up items, use an array.
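
        As a rough illustration of the array approach, here is a minimal sketch. The field names (id, seq, score) and the sample values are made up for the example, since the original question does not show real data. Each record is a hash reference, so a sequence stays together with its id and score when the array is sorted.

```perl
use strict;
use warnings;

# Hypothetical records: each one keeps a sequence, its id and its score together.
my @records = (
    { id => 1345, seq => 'ACGT', score => 10 },
    { id => 123,  seq => 'TTGA', score => 30 },
);

# push is fine here because @records is an array, not a hash.
push @records, { id => 500, seq => 'GGCA', score => 20 };

# Sort by score, lowest first; every record stays intact.
my @by_score = sort { $a->{score} <=> $b->{score} } @records;

print "$_->{id}\t$_->{seq}\t$_->{score}\n" for @by_score;
```

        Swap $a and $b in the sort block to get the highest score first.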

        jeffa

        
Re: sort and hash
by Roger (Parson) on Sep 23, 2003 at 01:06 UTC
    Ok, perhaps you need a tutorial on how to dynamically create a hash of data, and how to sort the data. I have constructed a little sample program below which will give you an idea on how to do this.
    use strict;
    use Data::Dumper; # for investigating data structures

    my %data;

    # load data into hash of arrays
    while (<DATA>) {
        chomp;
        my @fields = split/,/;
        $data{$fields[0]} = \@fields; # first column = hash key
    }
    print "\%data - \n", Dumper(\%data);

    # We will transform the hash into a sorted array
    # sorted on the 3rd column (score)
    my @array = sort { $a->[2] <=> $b->[2] } map { $data{$_} } keys %data;
    print "\@array - \n", Dumper(\@array);

    # Output sorted data lines
    print "Sorted data - \n";
    foreach (@array) {
        printf "%s\n", join ',', @{$_};
    }

    __DATA__
    Sequence1,0001,80
    Sequence2,0002,20
    Sequence3,0003,40
    Sequence4,0004,100
    Sequence5,0005,70
    The code above will output the following results:
    %data -
    $VAR1 = {
        'Sequence1' => [ 'Sequence1', '0001', '80' ],
        'Sequence2' => [ 'Sequence2', '0002', '20' ],
        'Sequence3' => [ 'Sequence3', '0003', '40' ],
        'Sequence4' => [ 'Sequence4', '0004', '100' ],
        'Sequence5' => [ 'Sequence5', '0005', '70' ]
    };
    @array -
    $VAR1 = [
        [ 'Sequence2', '0002', '20' ],
        [ 'Sequence3', '0003', '40' ],
        [ 'Sequence5', '0005', '70' ],
        [ 'Sequence1', '0001', '80' ],
        [ 'Sequence4', '0004', '100' ]
    ];
    Sorted data -
    Sequence2,0002,20
    Sequence3,0003,40
    Sequence5,0005,70
    Sequence1,0001,80
    Sequence4,0004,100
    Here %data is a hash of arrays of data, and @array is a sorted array of arrays of data. join is used to put each array of data back into its original comma-separated form.

    To insert a key/value pair into a hash, use the syntax below:
    my %data;
    $data{key} = $value;
    or, if you are holding a reference to the hash (for example my $data_ref = \%data;):
    $data_ref->{key} = $value;
    The line of code my @array = sort { $a->[2] <=> $b->[2] } map { $data{$_} } keys %data; translates to English as: retrieve the data array for each key in the %data hash, sort the data arrays on the 3rd data column, and put them in the new array @array. This gives you a sorted array of arrays of data.

    If you only want to sort the data based on a certain column, why not try out the Schwartzian Transform invented by Randal Schwartz (merlyn):
    @sorted_array =
        map  { $_->[0] }
        sort { $a->[1] <=> $b->[1] }
        map  { [ $_, (split/,/)[2] ] } # retrieve the third field (the score)
        @original_array;
    Where @original_array is just an array of data strings.
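
    To make that concrete, here is a self-contained sketch of the same transform. The three input strings are copied from the __DATA__ lines in the sample program; everything else follows the snippet above. Read the pipeline from the bottom up.

```perl
use strict;
use warnings;

# Input lines in the form name,id,score (copied from the sample data).
my @original_array = (
    'Sequence1,0001,80',
    'Sequence2,0002,20',
    'Sequence3,0003,40',
);

my @sorted_array =
    map  { $_->[0] }                  # 3. keep only the original string
    sort { $a->[1] <=> $b->[1] }      # 2. sort the pairs by the extracted score
    map  { [ $_, (split /,/)[2] ] }   # 1. pair each string with its third field
    @original_array;

print "$_\n" for @sorted_array;
```

    The point of the transform is that split runs once per line instead of once per comparison, which matters more as the list grows.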
Re: sort and hash
by stajich (Chaplain) on Sep 23, 2003 at 01:11 UTC
    Ids are the unique part, right? So in your hash the keys would be the sequence ids and the values would be the scores, and you just sort on the values in the hash rather than on the ids.
    my %sequences = (
        'geneA' => 100,
        'geneB' => 105,
        'geneC' => 65,
    );

    # sort by score, highest to lowest
    for my $id ( sort { $sequences{$b} <=> $sequences{$a} } keys %sequences ) {
        print "$id $sequences{$id}\n";
    }

    # of course you can always add in more sequences
    $sequences{'catA'} = 75;

    # if you just want the sorted list of IDs
    my @ids = sort { $sequences{$b} <=> $sequences{$a} } keys %sequences;
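
    If the sequence itself also has to travel with the id and score, one possible extension is a hash of hashes keyed by id. This is only a sketch: the seq and score field names and the sequence strings are invented for illustration and are not from the original question.

```perl
use strict;
use warnings;

# Hypothetical layout: id => { seq => ..., score => ... }
my %sequences = (
    geneA => { seq => 'ATGGCC', score => 100 },
    geneB => { seq => 'ATGTTT', score => 105 },
);

# Adding a new entry later is a plain assignment, no push needed.
$sequences{geneC} = { seq => 'ATGAAA', score => 65 };

# Sort the ids by their score, highest to lowest.
my @ids = sort { $sequences{$b}{score} <=> $sequences{$a}{score} } keys %sequences;

for my $id (@ids) {
    printf "%s %s %d\n", $id, $sequences{$id}{seq}, $sequences{$id}{score};
}
```

    The hash itself stays unordered; only the @ids list carries the sorted order, so re-run the sort after adding entries.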
Re: sort and hash
by dmitri (Priest) on Sep 22, 2003 at 22:56 UTC
    If I understand you correctly (for instance, what does the third sentence mean?):

    Maybe 'I want to use a hash' is not a good approach here. It seems you'd be better off using a sorted array for your nodes and using binary search to insert new items.
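
    A minimal sketch of that idea, assuming each item is an [id, score] pair and the array is kept in ascending score order. The helper name insert_sorted is made up for this example, and the ids and scores are sample values.

```perl
use strict;
use warnings;

# Array of [id, score] pairs, always kept sorted by score (ascending).
my @sorted;

# Hypothetical helper: find the insertion point by binary search, then splice.
sub insert_sorted {
    my ($id, $score) = @_;
    my ($lo, $hi) = (0, scalar @sorted);
    while ($lo < $hi) {
        my $mid = int(($lo + $hi) / 2);
        if ($sorted[$mid][1] <= $score) { $lo = $mid + 1 }   # look in the upper half
        else                            { $hi = $mid }       # look in the lower half
    }
    splice @sorted, $lo, 0, [ $id, $score ];
    return;
}

insert_sorted(@$_) for [ 1345, 10 ], [ 500, 30 ], [ 123, 20 ], [ 901, 40 ];

print "$_->[0] => $_->[1]\n" for @sorted;
```

    The search takes O(log n) comparisons, but splice still has to shift elements, so this mainly pays off when comparisons are the expensive part or when the array must be sorted at all times.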