Tanktalus has asked for the wisdom of the Perl Monks concerning the following question:

I have a hash of arrays that I want to loop through. That's easy enough. However, I want to loop through them in a specific order - or only partly a specific order. If a certain key exists (no guarantee here), I want that one to be done first. Otherwise order doesn't matter.

What is the smallest, most readable, or preferably both, way to approach this? If I could guarantee that the key exists, I would do something like:

for my $key ('Foo', grep { $_ ne 'Foo' } keys %HoA) { ... }
Checking for Foo's existance seems a bit cumbersome:
for my $key ( exists $HoA{Foo} ? ('Foo') : (), grep { $_ ne 'Foo' } keys %HoA ) { ... }
Is there a simpler way to look at this that I'm missing?

PS: For simplicity, imagine the %HoA has the keys qw(D E F Foo G H). The order I deal with D E F G and H should not be touched from the hash ordering if I can help it.

Update: FYI: The only reason I want to avoid touching the hash ordering is because I don't want to rely on their ordering. It's not because I'm relying on it, but precisely the opposite - keys is supposed to return a differing order from run to run. I want to take advantage of that precisely by being able to show that my algorithm doesn't depend on order except for 'Foo'.

I'm also not looking for optimisation from the perspective of CPU cycles. I'm looking for something that is short, sweet, to the point, and, most importantly, blatantly obvious as to what I'm trying to do.

And, yes, some of the responses have given me other ways to think about it (including tye's anony-sub WTDI). Thanks!

Replies are listed 'Best First'.
Re: Pulling an item to the front of the list (sub)
by tye (Sage) on Oct 04, 2006 at 05:14 UTC

    This is the type of thing where a subroutine can be a very useful abstraction:

    frobnitz( $HoA{Foo} ) if $HoA{Foo}; for my $key ( keys %HoA ) { frobnitz( $HoA{$key} ) if $key ne "Foo"; }

    If you can't factor it out nicely as a subroutine, you can consider factoring it out as an anonymous code ref so that you can use your lexical variables in it.

    Another possibility:

    my %HoA= ( D=>[4], E=>[5], F=>[6], Foo=>[3], G=>[7], H=>[8], ); { local( $HoA{Foo} )= $HoA{Foo}; for my $av ( delete $HoA{Foo}, values %HoA ) { print $av->[0], $/ if $av; } } print $HoA{Foo}[0], $/;

    But I'd not use that in real® code (and it likely depends on things that at least should be "undefined").

    (update) Or use sub to abstract this out differently:

    sub FirstIfKeys { my $hv= pop @_; my %k; @k{ keys %$hv }= (1) x @k; my @k; for my $key ( @_ ) { push @k, $key if delete $k{$key}; } return @k, keys %k; } for my $key ( FirstIfKeys( "Foo", \%HoA ) ) { ... }

    - tye        

Re: Pulling an item to the front of the list
by davido (Cardinal) on Oct 04, 2006 at 04:28 UTC

    Checking for 'Foo's existance is not cumbersome in the "amount of work being done" sense, but grepping is somewhat.

    Hash lookups occur in O(1) time, so checking the existance is not time consuming. However, keys in list context takes O(n) time (essentially the time it takes to generate the list), and grep also takes O(n) time (the time it takes to thumb through the entire list). So you're looking at O( n + n ) time, or O(2n), which isn't the end of the world; it's not as bad as O( n log n ), or O( n**2 ), or worse. But just keep in mind that you're generating the list, and paging through it, in two separate steps.

    There's not much you can do about that. You can't ask for all the keys except 'foo' in a single operation. One solution, however, may be to simply keep track of the keys in the first place. Of course this means there's additional overhead of creating and maintaining an array of keys. So the benefit (if any) depends on how often you create and modify the list of keys, versus how many times you use it. If you create and/or modify infrequently, and use frequently, maintaining a separate list of keys makes sense. If you create and modify frequently, and use seldom, the separate list is probably not going to be much benefit.

    By the way, my initial reaction was to come up with something like this, using a slice:

    my @values = @hash{ exists( $hash{mystest} ) ? 'mytest' : (), grep{ $_ ne 'mytest' } keys %hash };

    As you can see, it's pretty much the same as one of your solutions, except that it simply returns a list of the values.

    Another solution could be to use splice, and unshift to reorganize an array of keys. Again, this is mostly useful if you use the list of keys a lot, but modify it infrequently.

    Update:
    You know, as I thought about this another moment or two, it occurred to me that maybe there would be some benefit to scrapping the whole hash solution, and going with a heap. Heaps excel at keeping one item sorted at the top, and the rest 'just there', in a semi-sorted order waiting for their turn at the top. There are several good heap implementations on CPAN. Maybe a heap is too much fiddling though, it all really depends on how critical it is to optimize this one particular exercise. If this piece of code is really time critical, and if your particular set of criteria would benefit from a heap, then maybe that's the right choice.

    Oh, and forget about any notion of hash ordering. Hashes aren't ordered, and you cannot depend on order being preserved. They don't have order. Assume that with the above piece of code, you're guaranteeing the order of one element, and not caring at all about the order of the rest.


    Dave

Re: Pulling an item to the front of the list
by Zaxo (Archbishop) on Oct 04, 2006 at 04:22 UTC

    You really can't rely on "hash order" for anything, including consistency from run to run. That means that you must ensure that the treatment of non-Foo keys is independent of order.

    Your list construction with exists, trinary, and grep looks good to me. That would be my first choice.

    Another way would be to construct a sort routine which returns -1 if $a eq 'Foo', 1 if $b eq 'Foo', and zero otherwise. That is likely to be less comprehensible than what you have.

    for my $key ( sort { $a eq 'Foo' and return -1; $b eq 'Foo' and return 1; return 0; } keys %hash ) { # do stuff }

    That will work fine for hash keys, but may go strange on you if there can be more than one "Foo" in the list.

    After Compline,
    Zaxo

Re: Pulling an item to the front of the list
by jdporter (Paladin) on Oct 04, 2006 at 05:17 UTC
    Custom comparator.
    sub prefer { my @preferred = @_; sub { for ( @preferred ) { $a eq $_ and return -1; $b eq $_ and return 1; } $a cmp $b } } my $prefer = prefer('foo'); print "$_\n" for sort $prefer keys %h;
    We're building the house of the future together.

      That has a problem if you generate a comparator over more than one preferred key. You're not providing for a stable comparison between two elements of the preferred list.

      At best, that will lead to the sorted order depending on the unsorted order.

      After Compline,
      Zaxo

        Maybe you're right, in theory, but my (meager) testing has shown that my algorithm works as intended, satisfying the OP.
        Can you concretely demonstrate the failure mode you describe?

        We're building the house of the future together.
Re: Pulling an item to the front of the list
by johngg (Canon) on Oct 04, 2006 at 08:38 UTC
    I don't think you need to test for the existence of the key 'Foo' but just sort on whether you have found a 'Foo' or not. I'm sure I've read that sort is stable so it shouldn't alter the key order other than putting 'Foo' at the front if it exists.

    foreach my $key ( map {$_->[0]} sort {$b-[1] <=> $a->[1]} map {[$_, $_ eq q{Foo}]} keys %hash; { # Do something here }

    Cheers,

    JohnGG

    Update: Wrote a test script to see if things worked as I expected.

    use strict; use warnings; # Set up hash with prominent marker key/value pairs # and a load of fillers. No 'Foo' yet. # my %hash = ( Bib => 387, Bob => 745, Baz => 106); $hash{$_} = int rand 10 for q{Bea} .. q{Bep}; my @keysOrder = (); my @prefOrder = (); my $key; # Push key and value strings onto array in keys order. # foreach $key (keys %hash) { push @keysOrder, qq{$key => $hash{$key}}; } # Push key and value strings onto another array in # preferred order. # foreach my $key (prefOrder(keys %hash)) { push @prefOrder, qq{$key => $hash{$key}}; } # Print results side by side for comparison. # print qq{\nKey Order Preferred Order\n}; for (0 .. $#keysOrder) { printf qq{%-14s%-14s\n}, $keysOrder[$_], $prefOrder[$_]; } # Add 'Foo' to the hash and repeat test. # $hash{Foo} = 999; @keysOrder = (); @prefOrder = (); foreach $key (keys %hash) { push @keysOrder, qq{$key => $hash{$key}}; } foreach my $key (prefOrder(keys %hash)) { push @prefOrder, qq{$key => $hash{$key}}; } print qq{\nKey Order Preferred Order\n}; for (0 .. $#keysOrder) { printf qq{%-14s%-14s\n}, $keysOrder[$_], $prefOrder[$_]; } sub prefOrder { return map {$_->[0]} sort {$b->[1] <=> $a->[1]} map {[$_, $_ eq q{Foo}]} @_; }

    Here is the output.

Re: Pulling an item to the front of the list
by davido (Cardinal) on Oct 05, 2006 at 04:53 UTC

    I just wanted to add another followup with a couple more possible solutions. I'll discuss the virtues and weaknesses as we go.

    First, the each method:

    use strict; use warnings; my %hash = ( one => 1, two => 2, three => 3, four => 4, five => 5 ); my $priority_key = 'three'; sub prioritize { my( $href, $find ) = @_; my @prioritized_keys; while( my( $key, $value ) = each %{$href} ) { if( $key ne $find ) { push @prioritized_keys, $key; } else { unshift @prioritized_keys, $key; } } return \@prioritized_keys; } my @keyset = @{ prioritize( \%hash, $priority_key ) }; print "@keyset\n";

    Ok, the good: This iterates through the key list one time, so it's an O(n) operation. Also, all the work being done on @prioritized_keys is done "at the ends" of the array via push and shift, which is faster than splicing the middle of an array. The bad: It's probably less efficient to while{} and each ones way through the key/value pairs, ignoring the values, just to build up a key list. On the other hand, you're only going through the list once, unlike the grep solution. To summarize, this is O(n), but each iteration may be more expensive than my O(2n) solution proposed earlier.

    Ok, now for another solution using splice.

    use strict; use warnings; my %hash = ( one => 1, two => 2, three => 3, four => 4, five => 5 ); my $priority_key = 'three'; sub prioritize { my( $href, $find ) = @_; my @keyring = keys %{$href}; foreach my $idx ( 0 .. $#keyring ) { next unless $keyring[ $idx ] eq $find; unshift( @keyring, splice( @keyring, $idx, 1 ) ); last; } return \@keyring; } my @keyset = @{ prioritize( \%hash, $priority_key ) }; print "@keyset\n";

    The good: Unlike the grep solution, as soon as you've found the priority key, the prioritization loop exits. That means there's a worst case of O(n) for the prioritization loop, and a best case of one iteration, with an average case of O(n/2), assuming the completely random nature of keys in list context yields an average condition. The bad; you are performing two O(n) (worst case) operations; one is creating the key list using keys, and the second is where you find the priority key. One of the loops is performed internally, probably very quickly (that's the keys loop), and the second has the potential of terminating pretty early on, if the priority key is found right away, but there's an equal potential that the second loop will terminate later on. But at very least, unlike grep, the second loop will terminate as soon as possible. On the other hand, there's still more bad news: this solution splices the middle of an array, which is more expensive than pushing or unshifting, possibly even resulting in another O(n) operation (I'm not clear on that, without full knowlege of how splice is implemented internally).

    Without benchmarking, I couldn't say whether the grep solution, the each solution, or the splice solution is the best.

    • The grep solution I posted earlier is O( n + n).
    • The each solution I posted in this node is O(n), but each iteration may be a little more costly per iteration than an iteration of the grep method.
    • The splice method is (average case) O( n + n/2 ) or worst case O( n + n )... unless splice turns out to be inefficient, in which case it may be (worst case) O( 3n ) or (average) O( n + n + n/2 )

    I guess we'll just have to test and see, though unless the hash is huge, it doesn't really matter anyway.


    Dave

Re: Pulling an item to the front of the list
by sfink (Deacon) on Oct 07, 2006 at 05:41 UTC
    You could use a temporary:
    my @order; for my $key (keys %HoA) { if ($key eq 'Foo') { unshift @order, $key; } else { push @order, $key; } } for my $key (@order) { ... }
    and, of course, there are many ways to pronounce that:
    my @order; for my $key (keys %HoA) { ($key eq 'Foo') ? unshift(@order, $key) : push(@order, $key); }
    or
    my @order; for my $key (keys %HoA) { splice(@order, ($key eq 'Foo') ? 0 : @order, (), $key); }

    Or you could take a totally different approach:

    my %early = my %late = %HoA; # or map { $_ => 1 } keys %HoA delete $late{Foo}; delete @early{keys %late}; for my $key (keys %early, keys %late) { ... }
Re: Pulling an item to the front of the list
by duff (Parson) on Oct 04, 2006 at 04:49 UTC

    Perhaps something like this is what you want.

    for my $k (grep { defined } delete $hash{Foo}, keys %hash) { # ... do stuff here }
    Though, the other ways mentioned so far may be more clear (i.e. not appear to work by accident ;-)
    update: I'll claim distraction by my wife for such messed up text :-)
      Ooooo what a lucky man he was!

      --
      tbone1, YAPS (Yet Another Perl Schlub)
      And remember, if he succeeds, so what.
      - Chick McGee