Beefy Boxes and Bandwidth Generously Provided by pair Networks
more useful options
 
PerlMonks  

Choosing "best" - sorting hashes issue

by Ineffectual (Scribe)
on Jan 14, 2005 at 19:30 UTC ( [id://422357]=perlquestion: print w/replies, xml ) Need Help??

Ineffectual has asked for the wisdom of the Perl Monks concerning the following question:

Greetings monks, I have a bunch of data that refers to a single thing. I need to choose the best representation of that single thing by using the qualities of the multiple different measurements. Due to the fact that I have to do about 3k of these things, the data structure has become a hash of a hash of a hash. ie: $hash{$individual thing}{multiple measurements of individual thing}{qualities of current measurement}. for example:
$data{$name}{$measurement}{'SIZE'} = $size
I have ~3k of $name, and around 5 of measurement for each $name and there are 10 qualities such as size. In order to select the "best" one of these measurements for each name, I need to do some sort of selection on several of the different qualities. Thus, it seems I need to sort the inner hash. However, I'm not sure how to sort the qualities into descending order and keep the mapping of which measurement the quality applies to. For example: Name 1 has 5 measurements. The 5 measurements have SIZEs of 100, 200, 300, 400, and 500. If I want to order on the SIZE quality, how do I do this? It'd be great if this would work on qualities that are numbers and letters. Esentially I need to order the 10 different qualities in a variety of ways to find the "best" representation by those qualities. I'm more than happy to change the data structure if someone can recommend something better (I heard mention of AoA in the chatbox, but I'm not sure how to keep the measurement qualities separate in an AoA).
Thanks.
Ineff

Replies are listed 'Best First'.
Re: Choosing "best" - sorting hashes issue
by dragonchild (Archbishop) on Jan 14, 2005 at 19:35 UTC
    You want a relational database. SQLite will do the trick, just fine. A table that looks something like:
    create table measurements ( name varchar ,measurement int ,quality1 int ,quality2 int ,quality3 int );

    Obviously, you'd substitute quality1/quality2/etc. with the names of the qualities.

    Then, if you want the five measurements for Name1 ordered by size, you use the following SQL statement:

    SELECT measurement, size FROM measurements WHERE name = 'Name1' ORDER BY size

    Easy as pie.

    Being right, does not endow the right to be rude; politeness costs nothing.
    Being unknowing, is not the same as being stupid.
    Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence.
    Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.

Re: Choosing "best" - sorting hashes issue
by samizdat (Vicar) on Jan 14, 2005 at 20:03 UTC
    The previous answer about an embedded database was appropriate, however, a HoA structure might also be appropriate if you don't need persistence. I just built one, so here it is: Hash has UNIX commands as keys. For each hash key, I save an anonymous array of two cells (recent history, and popularity) as value. I then need to sort that hash based on one or the other.
    read a file line into $line, then: my @arry = split(/==/, $line); #recent==pop==commandname $myhash{$arry[$#arry]} = [$arry[0], $arry[1]]; loop until done to sort: my @recentkeys = sort by_HoA_0_asc keys(%myhash); my @popkeys = sort by_HoA_1_desc keys(%myhash); # sort Hash of Arrays by numeric value of first # array value(history), ascending sub by_HoA_0_asc { if ($myhash{$a}[0] < $myhash{$b}[0]) { return -1; } elsif ($myhash{$a}[0] > $myhash{$b}[0]) { return 1; } return 0; } # sort Hash of Arrays by numeric value of second # array value (popularity), descending. If equal, # sort by first array value (history), ascending. sub by_HoA_1_desc { if ($myhash{$a}[1] > $myhash{$b}[1]) { return -1; } elsif ($myhash{$a}[1] < $myhash{$b}[1]) { return 1; } if ($myhash{$a}[0] < $myhash{$b}[0]) { return -1; } elsif ($myhash{$a}[0] < $myhash{$b}[0]) { return 1; } return 0; }
      by_HoA_1_desc can be written more succinctly as
      $myhash{$a}[1] <=> $myhash{$b}[1] || $myhash{$a}[0] <=> $myhash{$b}[0]

      Dave.

        True. I'm still getting used to da spaceship. :)

        One question, how does it compare efficiency-wise to if-elseif chains?
        Ummm... in my case, since popularity is sorted in descending order and history needs to be sorted in ascending order, shouldn't it be:
        $myhash{$b}[1] <=> $myhash{$a}[1] || $myhash{$a}[0] <=> $myhash{$b}[0]
        ?

        In any event, thanks for suggesting the improvement. In my long-winded if-elsif chained version, I had one of the >'s reversed, leading to a very dirty bug.
Re: Choosing "best" - sorting hashes issue
by BrowserUk (Patriarch) on Jan 14, 2005 at 20:41 UTC

    I'd create a hash of indexes. The key is the property and the value is an array of primary keys in your main hash, ordered by the property in question. It's very fast and reasonable straight forward.

    Most of the following code is creating the test data and then displaying the results. It's just the middle of the 3 blocks that does all the work setting up the indexes.

    #! perl -slw use strict; use Data::Dumper; use List::Util qw[ shuffle ]; ## An array of properties my @props = map{ "prop$_" } 1 .. 10; ## Gerate some test data my %things = map { +"thing$_" => { map{ $_ => int rand 1000 } ( shuffle @props )[ 0 .. 4 ] } } '0001' .. '3000'; ## Create a hash of indexes sorted by the the values of the properties my %orderings = map{ my $prop = $_; $prop => [ sort { $things{ $a }{ $_ } <=> $things{ $b }{ $_ } } map{ exists $things{ $_ }{ $prop } ? $_ : () } keys %things ] } @props; ## Display everything, in every order for my $prop ( @props ) { print "\nThings ordered by $prop\n"; for my $thing ( @{ $orderings{ $prop } } ) { print $thing, ' => { ', join( '', map{ sprintf "%6.6s: %3.3s, ", exists $things{ $thing }{ $_ } ? ( $_, $things{ $thing }{ $_ } ) : ( '', '' ) } @props ) . ' }'; } }

    Demo run with to 50 items and 2/4 properties to keep the volume of output down but it handles 10,000 items and 10 properties near instantly.


    Examine what is said, not who speaks.
    Silence betokens consent.
    Love the truth but pardon error.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://422357]
Approved by sschneid
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others admiring the Monastery: (7)
As of 2024-04-24 09:50 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found