Crazy hash sorting issue.

rementis has asked for the wisdom of the Perl Monks concerning the following question:

Oh esteemed Monks, I come to you once again.

I have a file that has this output in it:

L01783 NETAPP-DAILY CLOSED-C5
L01456 WIN-DAILY CLOSED-C5
L00378 NETAPP-DAILY CLOSED-C5
L01799 NETAPP-DAILY CLOSED-C5
L00543 NETAPP-WEEKLY CLOSED-C5
L01458 WIN-DAILY CLOSED-C5
L01459 UNIX-DAILY CLOSED-C5
L01797 NETAPP-MONTHLY PROGRAM-P5
L01431 NETAPP-DAILY CLOSED-C5
L00425 WIN-WEEKLY CLOSED-C5
L00277 NETAPP-MONTHLY CLOSED-C5
L00588 UNIX-DAILY CLOSED-C5
L01468 NETAPP-DAILY CLOSED-C5
L01780 UNIX-DAILY CLOSED-C5
L01782 NETAPP-MONTHLY PROGRAM-P5
L01288 NETAPP-WEEKLY CLOSED-C5
L01293 NETAPP-MONTHLY PROGRAM-P5
L01784 NETAPP-MONTHLY PROGRAM-P5
L01121 NETAPP-DAILY CLOSED-C5

I have a hash called %tapes. The key in this hash is the tape label (for example, L01121) and the value is the rest of the line (NETAPP-DAILY CLOSED-C5). There is one key/value pair for each line in the file.
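For readers following along, here is a minimal sketch of how such a hash might be built. It assumes each line is whitespace-separated with the tape label first. The post does not name the input file, so three of the sample lines are embedded in a string (`$input` is a hypothetical name) rather than read from disk.

```perl
use strict;
use warnings;

# A few sample lines from the post; a real script would read its input file instead.
my $input = <<'END';
L01783 NETAPP-DAILY CLOSED-C5
L01797 NETAPP-MONTHLY PROGRAM-P5
L01121 NETAPP-DAILY CLOSED-C5
END

my %tapes;
for my $line (split /\n/, $input) {
    # Split into the label (key) and the rest of the line (value).
    my ($label, $rest) = split ' ', $line, 2;
    next unless defined $rest;    # skip blank or malformed lines
    $tapes{$label} = $rest;
}

print "$_ => $tapes{$_}\n" for sort keys %tapes;
```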

What I need to do is output a new file after removing some lines. I can do this just fine, but I have a sorting problem. I need to sort this hash based on the "C5" and "P5" parts. So the final data should look like this:

L01783 NETAPP-DAILY CLOSED-C5
L01456 WIN-DAILY CLOSED-C5
L00378 NETAPP-DAILY CLOSED-C5
L01799 NETAPP-DAILY CLOSED-C5
L00543 NETAPP-WEEKLY CLOSED-C5
L01458 WIN-DAILY CLOSED-C5
L01459 UNIX-DAILY CLOSED-C5
L01431 NETAPP-DAILY CLOSED-C5
L00425 WIN-WEEKLY CLOSED-C5
L00277 NETAPP-MONTHLY CLOSED-C5
L00588 UNIX-DAILY CLOSED-C5
L01468 NETAPP-DAILY CLOSED-C5
L01780 UNIX-DAILY CLOSED-C5
L01288 NETAPP-WEEKLY CLOSED-C5
L01121 NETAPP-DAILY CLOSED-C5
L01797 NETAPP-MONTHLY PROGRAM-P5
L01782 NETAPP-MONTHLY PROGRAM-P5
L01293 NETAPP-MONTHLY PROGRAM-P5
L01784 NETAPP-MONTHLY PROGRAM-P5

Thanks in advance for your help!
Thanks so much to everyone! I don't know why I didn't think of using substr to compare the values; I guess I just need more experience programming in Perl. Anyway, this was a huge help and now I've got it working perfectly. Thanks again!

Re: Crazy hash sorting issue. by dragonchild (Archbishop) on Nov 04, 2005 at 18:11 UTC
`my @keys = sort { my @a = split '-', $tapes{$a}; my @b = split '-', $tapes{$b}; $a[-1] cmp $b[-1] } keys %tapes; foreach my $k (@keys) { print "$k $tapes{$k}\n"; }`
However, I wouldn't put that code into production without understanding what it does. I urge you to ask questions, which I'll gladly answer.
Update: Fixed keys/values conflation.
My criteria for good software: Does it work? Can someone else come in, make a change, and be reasonably certain no bugs were introduced?
Re^2: Crazy hash sorting issue. by rementis (Beadle) on Nov 04, 2005 at 20:56 UTC
This sorts based on keys instead of values. There is no need to use split, as the keys do not contain any dashes.
Re: Crazy hash sorting issue. by polettix (Vicar) on Nov 04, 2005 at 18:26 UTC
So, you basically have to print out a hash, sorting lines using the value instead of the key, right? Keep in mind that using a hash won't allow you to keep the original relative order of the lines, but this does not affect what we're saying about the print.
If you had to sort only based upon the keys, you'd write something like this:
`for my $k (sort keys %hash) { print "$k $hash{$k}\n"; }`
Well, you're near. Remember that sort allows you to specify a bare block of code which is actually a sub that's called to establish the relative order of two elements `$a` and `$b`. So, you'd change things a bit like this:
`my @ordered_keys = sort { order($hash{$a}, $hash{$b}) } keys %hash; for my $k ( @ordered_keys ) { print "$k $hash{$k}\n" } sub order { my ($left, $right) = @_; # Perform comparison between $left and $right. Return # -1 if $left must come before $right, 0 if they are # equivalent, +1 if $left must come after $right }`
You can include the code in `order()` directly in sort's block, of course, and use `$a` and `$b` instead of `$left` and `$right`, but do this only if readability doesn't suffer from this inclusion.
The actual implementation of the `order()` subroutine is left as an exercise, but you could extract the last two characters from `$left` and `$right`, and use the cmp operator :)
Flavio perl -ple'$_=reverse' <<<ti.xittelop@oivalf Don't fool yourself.
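One possible way to fill in the `order()` exercise, following the hint about the last two characters and `cmp`. This is only a sketch: the four entries in `%hash` are sample data copied from the question, and the order of keys inside each group is not defined because it depends on the hash.

```perl
use strict;
use warnings;

# Sample data taken from the question.
my %hash = (
    L01783 => 'NETAPP-DAILY CLOSED-C5',
    L01797 => 'NETAPP-MONTHLY PROGRAM-P5',
    L00425 => 'WIN-WEEKLY CLOSED-C5',
    L01782 => 'NETAPP-MONTHLY PROGRAM-P5',
);

# Compare two values by their last two characters ("C5" or "P5").
# Returns -1, 0, or +1, as sort expects.
sub order {
    my ($left, $right) = @_;
    return substr($left, -2) cmp substr($right, -2);
}

my @ordered_keys = sort { order($hash{$a}, $hash{$b}) } keys %hash;
print "$_ $hash{$_}\n" for @ordered_keys;
```

All C5 lines print before all P5 lines. Which C5 line comes first may vary from run to run.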
Re: Crazy hash sorting issue. by japhy (Canon) on Nov 04, 2005 at 18:12 UTC
I can't really detect any "sorting" going on, except that C5 comes first and P5 comes later. I can't really determine if the C5 records have any particular order, though. You could just do
`my @ordered_keys = sort { substr($tapes{$a},-2) cmp substr($tapes{$b},-2) } keys %tapes;`
Jeff `japhy` Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and `perl` hacker How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart
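The expected output in the question appears to keep the lines in file order within each group. If that matters, one option is to remember each line's position while building the hash and use it as a tie-breaker. The sketch below assumes that is the intent. `%position` is a hypothetical helper hash that is not part of the original script, and `@lines` holds five sample lines from the question.

```perl
use strict;
use warnings;

# Five sample lines from the question, in file order.
my @lines = (
    'L01783 NETAPP-DAILY CLOSED-C5',
    'L01797 NETAPP-MONTHLY PROGRAM-P5',
    'L01431 NETAPP-DAILY CLOSED-C5',
    'L01782 NETAPP-MONTHLY PROGRAM-P5',
    'L00425 WIN-WEEKLY CLOSED-C5',
);

my (%tapes, %position);
my $n = 0;
for my $line (@lines) {
    my ($label, $rest) = split ' ', $line, 2;
    $tapes{$label}    = $rest;
    $position{$label} = $n++;    # remember where the line appeared
}

# Sort by suffix first, then by original position within each group.
my @ordered_keys = sort {
    substr($tapes{$a}, -2) cmp substr($tapes{$b}, -2)
        or $position{$a} <=> $position{$b}
} keys %tapes;

print "$_ $tapes{$_}\n" for @ordered_keys;
```

With this sample data the keys come out as L01783, L01431, L00425, L01797, L01782.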
Re: Crazy hash sorting issue. by kirbyk (Friar) on Nov 04, 2005 at 18:15 UTC
TMTOWTDI, of course, but how about:
`foreach my $key (sort { substr($hash{$a}, -2) cmp substr($hash{$b}, -2) } keys %hash) { print $key . ' ' . $hash{$key} . "\n"; }`
Tested, seems to work.
-- Kirby, WhitePages.com
Re: Crazy hash sorting issue. by japhy (Canon) on Nov 04, 2005 at 18:29 UTC
You can also use a sieve. It's potentially faster than a complicated sort() algorithm to place things matching certain patterns before others.
`sub sieve { my @pats = UNIVERSAL::isa($_[0], 'Regexp') ? @_ : map(qr/$_/, @_); sub { my %slots; $slots{$_} = [] for @pats, ""; my $cref = UNIVERSAL::isa($_[0], 'CODE') && shift; ITEM: for (@_) { my $k = $cref ? $cref->() : $_; for my $p (@pats) { push(@{ $slots{$p} }, $_), next ITEM if $k =~ /$p/; } push @{ $slots{""} }, $_; } return map @$_, @slots{@pats, ""}; } } my %tapes = (...); my $c5p5 = sieve(qr/-C5$/, qr/-P5$/); my @sorted_keys = $c5p5->(sub { $tapes{$_} }, keys %tapes);`
I hope it's clear what I'm doing. I create a sieve by passing a list of patterns in the order of importance. Then I use that code reference (returned by sieve()) to "sort" a list. I can pass a code block as the first argument to determine what I do to the strings being passed to the sieve. In this case, even though I'm sorting the keys of the hash, I want to do the regex comparison to `$tapes{$_}`.
P.S. this looks much prettier in Ruby.
Re^2: Crazy hash sorting issue. by Perl Mouse (Chaplain) on Nov 07, 2005 at 10:32 UTC
"You can also use a sieve. It's potentially faster than a complicated sort() algorithm to place things matching certain patterns before others."
The code you post may be faster than a 'sort', but compared to your "sieve", the sort needed is very simple. I fail to understand the reason for this awfully complicated code. For this case, I'd just iterate over the hash twice:
`my ($k, $v); $v =~ /C5$/ and print "$k $v\n" while ($k, $v) = each %tapes; $v =~ /P5$/ and print "$k $v\n" while ($k, $v) = each %tapes;`
If you want to iterate once, you need additional storage:
`my (@C5, @P5); while (my ($k, $v) = each %tapes) { $v =~ /C5$/ ? push @C5, "$k $v\n" : push @P5, "$k $v\n"; } print @C5, @P5;`
Although you can reduce storage costs by printing the first set while iterating:
`my @P5; while (my ($k, $v) = each %tapes) { $v =~ /C5$/ ? print "$k $v\n" : push @P5, "$k $v\n"; } print @P5;`
`Perl --((8:>*`
Re^3: Crazy hash sorting issue. by japhy (Canon) on Nov 07, 2005 at 13:08 UTC
The more cases you have, the less efficient it is to loop over the data set each time. That's where a sieve comes in handy. It short-circuits the loop over the filters, and loops through the data only once.
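The "loop through the data only once" idea can be illustrated with something simpler than the sieve above. What follows is a sketch, not japhy's implementation: it files each key into a bucket named after the last two characters of its value, then emits the buckets in a chosen order. It assumes every category can be recognised from that two-character suffix. `@suffixes` and `%bucket` are hypothetical names, and `%tapes` holds three sample entries from the question.

```perl
use strict;
use warnings;

# Sample data taken from the question.
my %tapes = (
    L01783 => 'NETAPP-DAILY CLOSED-C5',
    L01797 => 'NETAPP-MONTHLY PROGRAM-P5',
    L00425 => 'WIN-WEEKLY CLOSED-C5',
);

# Suffixes in the order their groups should be printed.
my @suffixes = ('C5', 'P5');

# Start with one empty bucket per suffix.
my %bucket;
$bucket{$_} = [] for @suffixes;

# One pass over the data: file each key under the suffix of its value.
for my $key (keys %tapes) {
    my $suffix = substr($tapes{$key}, -2);
    next unless exists $bucket{$suffix};    # ignore unknown suffixes
    push @{ $bucket{$suffix} }, $key;
}

# Emit the buckets in the chosen order.
my @ordered_keys = map { @{ $bucket{$_} } } @suffixes;
print "$_ $tapes{$_}\n" for @ordered_keys;
```

Adding another category means adding one entry to `@suffixes`. The data is still traversed only once, and each key is placed with a hash lookup instead of being tested against every pattern.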
Re^4: Crazy hash sorting issue. by Perl Mouse (Chaplain) on Nov 07, 2005 at 13:59 UTC