in reply to Hash sorting

Untested bad code.
# Pull out a list of records
my @data;
foreach my $section (keys %hash) {
    foreach my $item (keys %{$hash{$section}}) {
        push @data, [$hash{$section}{$item}, $section, $item];
    }
}

# Sort it
@data = sort {
    $b->[0] <=> $a->[0]
        or $a->[1] cmp $b->[1]
        or $a->[2] cmp $b->[2]
} @data;

# Print it.
print "Count Section Item\n";
foreach my $record (@data) {
    printf("%5d %7s %7s\n", @$record);
}

Re: Re: Hash sorting
by jdporter (Paladin) on May 12, 2003 at 19:17 UTC
    Very good. But how about
    my @data;
    foreach my $section (keys %hash) {
        foreach my $item (keys %{$hash{$section}}) {
            push @data, printf "%5d %7s %7s\n", $hash{$section}{$item}, $section, $item;
        }
    }
    print sort @data;

    jdporter
    The 6th Rule of Perl Club is -- There is no Rule #6.

      Make the printf a sprintf and it works.

      But the point of my post was to demonstrate how building up the array of structures allows you to always find your way through the logic. Which is why I didn't use any tricks to find my way through the logic, didn't nest maps, etc.

      I often use this technique in small data-munging scripts. It is a fast variant of the Guttman Rosler Transform, and shares these limitations with GRT:
      • You must know the maximum size of each data element. For example, if the count is more than 5 digits, then the keys will not align, and the sort will be wrong.
      • You cannot use it to mix ascending and descending sorts. For example, tilly sorted the count descending, but section and item ascending. GRT can't do that.
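A minimal sketch of the sprintf-and-sort technique and its first limitation. The sample %hash is hypothetical, not data from the thread; the second half shows how a six-digit count breaks a "%5d" key under plain string sorting.

```perl
use strict;
use warnings;

# Hypothetical sample data, not from the original question.
my %hash = (
    fruit => { apple => 3, pear => 12 },
    veg   => { leek  => 7 },
);

# Build one fixed-width line per record; the line doubles as the sort key.
my @lines;
foreach my $section (keys %hash) {
    foreach my $item (keys %{ $hash{$section} }) {
        push @lines, sprintf "%5d %7s %7s\n", $hash{$section}{$item}, $section, $item;
    }
}

# Plain string sort: ascending on every field, no comparison block needed.
my @sorted = sort @lines;
print @sorted;

# The width limit: 100000 does not fit in "%5d", so it is no longer padded
# to the same width as 99999, and the string comparison puts it first.
my @wide = sort map { sprintf "%5d", $_ } (99999, 100000);
print "@wide\n";
```

Widening the field only moves the limit; the width still has to be chosen in advance.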

        Bah, those are easy to deal with in a GRT.

        For the first, instead of using sprintf "%05d", $int, you can use pack "N", $int (for any positive integers that fit in a long) or even sprintf "%02d-%d", length(0+$int), $int (for positive integers of 99 digits or fewer!).

        For the latter, ~ is very handy.
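Both suggestions can be combined in one rough sketch. The sample data, the "\0" separator, and the decoding step are assumptions made for illustration, not code from the thread. It also assumes counts are non-negative and fit in 32 bits, that no section or item contains a NUL byte, and that ~ acts as string negation on its string operand (the default unless the bitwise feature is enabled).

```perl
use strict;
use warnings;

# Hypothetical sample data, not from the original question.
my %hash = (
    fruit => { apple => 3, pear => 100000 },
    veg   => { leek  => 7, kale => 7 },
);

my @keys;
foreach my $section (keys %hash) {
    foreach my $item (keys %{ $hash{$section} }) {
        # pack "N" gives a fixed 4-byte big-endian count, so no digit limit
        # within 32 bits. ~ flips every bit of that string, so ascending
        # string order of the prefix means descending count.
        push @keys, ~pack("N", $hash{$section}{$item}) . "$section\0$item";
    }
}

# One plain string sort, then decode each key back into a record.
my @rows = map {
    my $count = unpack "N", ~substr($_, 0, 4);
    my ($section, $item) = split /\0/, substr($_, 4), 2;
    [ $count, $section, $item ];
} sort @keys;

print "Count Section Item\n";
printf "%6d %7s %7s\n", @$_ for @rows;
```

The result is count descending, then section and item ascending, which is the same ordering as the explicit comparison block at the top of the thread.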

                        - tye
Re: Re: Hash sorting
by Util (Priest) on May 13, 2003 at 05:09 UTC
    I wrote a very similar solution, except for the iterations in the @data-building block.
    while ( my ($section, $v1) = each %$hash ) {
        while ( my ($item, $count) = each %$v1 ) {
            push @data, [$count, $section, $item];
        }
    }
    This is not a knee-jerk premature optimization; it is how I think of the loop. I believe that, as an idiom, while/each should be favored over foreach/keys when the key and value are both needed but the order of access does not matter.
    And I am evangelizing. :)
      And I believe that whether or not it should be favoured is a matter of who you are dealing with.

      I believe that while/each should not be used unless you understand the subtlety of context coercion that keeps you from exiting the loop early (quick, why when you needed just the section should you *not* just grab $section in scalar context?), understand that you have only one iterator for the hash (what bug can that lead to?), and are aware of what manipulations you cannot do to the hash while you are iterating over it.
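A small sketch of the single-iterator pitfall, using a hypothetical %count hash. It covers only that one point; the scalar-context question depends on the Perl version and is left out.

```perl
use strict;
use warnings;

# Hypothetical sample data.
my %count = (apple => 1, banana => 2, cherry => 3);

# Exit a while/each loop early: the hash's single iterator stays where it stopped.
my $first_pass = 0;
while ( my ($item, $n) = each %count ) {
    $first_pass++;
    last;
}

# The next each loop on the same hash resumes from there and misses one pair.
my $second_pass = 0;
while ( my ($item, $n) = each %count ) {
    $second_pass++;
}

# foreach/keys always starts from the beginning; calling keys also resets
# the iterator that each uses.
my $third_pass = 0;
foreach my $item (keys %count) {
    $third_pass++;
}

print "$first_pass $second_pass $third_pass\n";   # prints "1 2 3"
```

If an each loop might exit early, calling keys on the hash before the next loop is one way to reset the iterator.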

      If you don't understand that clearly, or you do not wish to make sure that whoever works with the code understands this highly Perl-specific knowledge clearly, then it is much, much better to just use the foreach/keys method of iteration. (Honestly I have been avoiding having to explain certain aspects of context-coercion by careful selection of idioms, and I needed to double-check that your code as presented was always going to do the right thing...)

      Given that, I could work in a shop which used either idiom and be happy. But if you have people who use Perl only sometimes, and do lots with other languages, then I would suggest sticking with the foreach/keys method since there are fewer Perl-specific things that they have to remember to avoid getting burned in confusing ways.

      I agree entirely. I've been baffled by how many people either never think of each or even say they actively avoid it. Especially when you're doing several lines of work with each pair, I find the each form moderately to significantly less noisy.

      Although I'd've named $v1 something like $itemcount. :)

      Makeshifts last the longest.

Re: Re: Hash sorting
by jdporter (Paladin) on May 13, 2003 at 14:19 UTC
    Perhaps we should code up a generic solution.
    (UNTESTED)
    sub tree_paths {
        my $tree = shift; # assumes hashref
        map {
            my $k = $_;
            my $v = $tree->{$k};
            ref $v
                ? map( [ $k, @$_ ], tree_paths( $v ) )
                : [ $k, $v ]
        } keys %$tree
    }

    # now use it with the OP's $hash hashref
    print "Count Section Item\n";
    for (
        sort {
            $b->[2] <=> $a->[2]        # count
                or $a->[0] cmp $b->[0] # section
                or $a->[1] cmp $b->[1] # item
        } tree_paths($hash)
    ) {
        my( $section, $item, $count ) = @$_;
        printf "%5d %7s %7s\n", $count, $section, $item;
    }
    Or perhaps we'd like records with named members:
    (UNTESTED)
    sub tree_tuples {
        my( $tree, @names ) = @_;
        my $name = shift @names;
        map {
            my $k = $_;
            my $v = $tree->{$k};
            ref $v
                ? map( +{ $name => $k, %$_ }, tree_tuples( $v, @names ) )
                : { $name => $k, $names[0] => $v }
        } keys %$tree
    }

    # now use it with the OP's $hash hashref
    print "Count Section Item\n";
    for (
        sort {
            $b->{'count'} <=> $a->{'count'}
                or $a->{'section'} cmp $b->{'section'}
                or $a->{'item'} cmp $b->{'item'}
        } tree_tuples( $hash, qw( section item count ) )
    ) {
        printf "%5d %7s %7s\n", @{$_}{qw( count section item )};
    }

    jdporter
    The 6th Rule of Perl Club is -- There is no Rule #6.