rementis has asked for the wisdom of the Perl Monks concerning the following question:

Oh esteemed Monks, I come to you once again.

I have a file that has this output in it:


L01783 NETAPP-DAILY CLOSED-C5
L01456 WIN-DAILY CLOSED-C5
L00378 NETAPP-DAILY CLOSED-C5
L01799 NETAPP-DAILY CLOSED-C5
L00543 NETAPP-WEEKLY CLOSED-C5
L01458 WIN-DAILY CLOSED-C5
L01459 UNIX-DAILY CLOSED-C5
L01797 NETAPP-MONTHLY PROGRAM-P5
L01431 NETAPP-DAILY CLOSED-C5
L00425 WIN-WEEKLY CLOSED-C5
L00277 NETAPP-MONTHLY CLOSED-C5
L00588 UNIX-DAILY CLOSED-C5
L01468 NETAPP-DAILY CLOSED-C5
L01780 UNIX-DAILY CLOSED-C5
L01782 NETAPP-MONTHLY PROGRAM-P5
L01288 NETAPP-WEEKLY CLOSED-C5
L01293 NETAPP-MONTHLY PROGRAM-P5
L01784 NETAPP-MONTHLY PROGRAM-P5
L01121 NETAPP-DAILY CLOSED-C5

I have a hash called %tapes. A key in this hash is, for example, L01121, and its value is NETAPP-DAILY CLOSED-C5. (One key/value pair for each line in the file.)

What I need to do is output a new file after removing some lines. I can do this just fine, but I have a sorting problem. I need to sort this hash based on the "C5" and "P5" parts. So the final data should look like this:

L01783 NETAPP-DAILY CLOSED-C5
L01456 WIN-DAILY CLOSED-C5
L00378 NETAPP-DAILY CLOSED-C5
L01799 NETAPP-DAILY CLOSED-C5
L00543 NETAPP-WEEKLY CLOSED-C5
L01458 WIN-DAILY CLOSED-C5
L01459 UNIX-DAILY CLOSED-C5
L01431 NETAPP-DAILY CLOSED-C5
L00425 WIN-WEEKLY CLOSED-C5
L00277 NETAPP-MONTHLY CLOSED-C5
L00588 UNIX-DAILY CLOSED-C5
L01468 NETAPP-DAILY CLOSED-C5
L01780 UNIX-DAILY CLOSED-C5
L01288 NETAPP-WEEKLY CLOSED-C5
L01121 NETAPP-DAILY CLOSED-C5
L01797 NETAPP-MONTHLY PROGRAM-P5
L01782 NETAPP-MONTHLY PROGRAM-P5
L01293 NETAPP-MONTHLY PROGRAM-P5
L01784 NETAPP-MONTHLY PROGRAM-P5

Thanks in advance for your help!
Thanks so much to everyone! I don't know why I didn't think of using substr to compare the values, I guess I just need more experience programming perl. Anyway, this was a huge help and now I've got it working perfectly. Thanks again!

Re: Crazy hash sorting issue.
by dragonchild (Archbishop) on Nov 04, 2005 at 18:11 UTC
    my @keys = sort {
        my @a = split '-', $tapes{$a};
        my @b = split '-', $tapes{$b};
        $a[-1] cmp $b[-1]
    } keys %tapes;

    foreach my $k (@keys) {
        print "$k $tapes{$k}\n";
    }

    However, I wouldn't put that code into production without understanding what it does. I urge you to ask questions, which I'll gladly answer.
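
    As a small illustration of what the sort block above relies on, here is a sketch of what split and the -1 index do to a single value. The sample value is taken from the question; the variable names are only for illustration. Note that in the sort block, $a[-1] is an element of the lexical array @a, which is a different variable from sort's scalar $a.

```perl
use strict;
use warnings;

# Sample value, copied from the data in the question.
my $value = 'NETAPP-DAILY CLOSED-C5';

# Splitting on '-' gives three fields; the space is not a separator here,
# so the middle field is 'DAILY CLOSED'.
my @fields = split '-', $value;

# Index -1 is the last element, i.e. the status code used for sorting.
my $status = $fields[-1];

print "fields: ", join(' | ', @fields), "\n";
print "status: $status\n";
```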

    Update: Fixed keys/values conflation.


    My criteria for good software:
    1. Does it work?
    2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?
      This sorts based on keys instead of values, no need to use split commands as the keys do not contain any dashes.
Re: Crazy hash sorting issue.
by polettix (Vicar) on Nov 04, 2005 at 18:26 UTC
    So, you basically have to print out a hash sorting lines using the value instead of the key, right? Keep in mind that using a hash won't allow you to keep the original relative order of the lines, but this does not affect what we're saying about the print.

    If you had to sort only based upon the keys, you'd write something like this:

    for my $k (sort keys %hash) { print "$k $hash{$k}\n"; }
    Well, you're near. Remember that sort allows you to specify a bare block of code, which is actually a sub that's called to establish the relative order of two elements, $a and $b. So you'd change things a bit, like this:

    my @ordered_keys = sort { order($hash{$a}, $hash{$b}) } keys %hash;
    for my $k (@ordered_keys) {
        print "$k $hash{$k}\n";
    }

    sub order {
        my ($left, $right) = @_;
        # Perform comparison between $left and $right. Return
        # -1 if $left must come *before* $right, 0 if they are
        # equivalent, +1 if $left must come *after* $right
    }
    You can include the code in order() directly in sort's block, of course, and use $a and $b instead of $left and $right, but do this only if readability doesn't suffer from this inclusion.

    The actual implementation of the order() subroutine is left as an exercise, but you could extract the last two characters from $left and $right, and use the cmp operator :)
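
    For readers who want to check their answer, one possible way to fill in order() along the lines of that hint is sketched here. It assumes the status code is always the last two characters of the value; the three-entry %hash is just a sample of the data from the question.

```perl
use strict;
use warnings;

# Small sample of the data from the question (assumed already loaded).
my %hash = (
    L01783 => 'NETAPP-DAILY CLOSED-C5',
    L01797 => 'NETAPP-MONTHLY PROGRAM-P5',
    L01456 => 'WIN-DAILY CLOSED-C5',
);

# Compare two values by their last two characters ("C5" or "P5").
# cmp already returns -1, 0 or +1, which is what sort expects.
sub order {
    my ($left, $right) = @_;
    return substr($left, -2) cmp substr($right, -2);
}

my @ordered_keys = sort { order($hash{$a}, $hash{$b}) } keys %hash;

for my $k (@ordered_keys) {
    print "$k $hash{$k}\n";
}
```

    The order of keys within the C5 group is not defined here, because it depends on the order in which keys returns them.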

    Flavio
    perl -ple'$_=reverse' <<<ti.xittelop@oivalf

    Don't fool yourself.
Re: Crazy hash sorting issue.
by japhy (Canon) on Nov 04, 2005 at 18:12 UTC
    I can't really detect any "sorting" going on, except that C5 comes first and P5 comes later. I can't really determine if the C5 records have any particular order, though.

    You could just do:

    my @ordered_keys = sort { substr($tapes{$a}, -2) cmp substr($tapes{$b}, -2) } keys %tapes;
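
    In the desired output from the question, the C5 records do appear in the same relative order as in the input file, and so do the P5 records. A hash cannot reproduce that, since keys returns entries in no particular order. If file order matters, a sketch like the following may help. It assumes the lines are kept in an array (@lines is a hypothetical name, not something from the original post) and breaks ties on the original position.

```perl
use strict;
use warnings;

# Lines in file order (a subset of the data from the question).
my @lines = (
    'L01783 NETAPP-DAILY CLOSED-C5',
    'L01797 NETAPP-MONTHLY PROGRAM-P5',
    'L01456 WIN-DAILY CLOSED-C5',
    'L01782 NETAPP-MONTHLY PROGRAM-P5',
    'L00378 NETAPP-DAILY CLOSED-C5',
);

# Sort line indices by suffix first, then by original position, so
# records with the same suffix keep their file order.
my @order = sort {
    substr($lines[$a], -2) cmp substr($lines[$b], -2)
        or $a <=> $b
} 0 .. $#lines;

my @sorted = @lines[@order];
print "$_\n" for @sorted;
```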


    Jeff japhy Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and perl hacker
    How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart
Re: Crazy hash sorting issue.
by kirbyk (Friar) on Nov 04, 2005 at 18:15 UTC
    TMTOWTDI, of course, but how about:
    foreach my $key (sort { substr($hash{$a}, -2) cmp substr($hash{$b}, -2) } keys %hash) {
        print $key . ' ' . $hash{$key} . "\n";
    }
    Tested, seems to work.

    -- Kirby, WhitePages.com

Re: Crazy hash sorting issue.
by japhy (Canon) on Nov 04, 2005 at 18:29 UTC
    You can also use a sieve. It's potentially faster than a complicated sort() algorithm to place things matching certain patterns before others.
    sub sieve {
        my @pats = UNIVERSAL::isa($_[0], 'Regexp') ? @_ : map(qr/$_/, @_);
        sub {
            my %slots;
            $slots{$_} = [] for @pats, "";
            my $cref = UNIVERSAL::isa($_[0], 'CODE') && shift;
            ITEM: for (@_) {
                my $k = $cref ? $cref->() : $_;
                for my $p (@pats) {
                    push(@{ $slots{$p} }, $_), next ITEM if $k =~ /$p/;
                }
                push @{ $slots{""} }, $_;
            }
            return map @$_, @slots{@pats, ""};
        }
    }

    my %tapes = (...);
    my $c5p5 = sieve(qr/-C5$/, qr/-P5$/);
    my @sorted_keys = $c5p5->(sub { $tapes{$_} }, keys %tapes);
    I hope it's clear what I'm doing. I create a sieve by passing a list of patterns in the order of importance. Then I use that code reference (returned by sieve()) to "sort" a list. I can pass a code block as the first argument to determine what I do to the strings being passed to the sieve. In this case, even though I'm sorting the keys of the hash, I want to do the regex comparison to $tapes{$_}.

    P.S. this looks much prettier in Ruby.


    Jeff japhy Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and perl hacker
    How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart
      You can also use a sieve. It's potentially faster than a complicated sort() algorithm to place things matching certain patterns before others.
      The code you posted may be faster than a 'sort', but compared to your "sieve", the sort needed is very simple.

      I fail to understand the reason for this awfully complicated code. For this case, I'd just iterate over the hash twice:

      my ($k, $v);
      $v =~ /C5$/ and print "$k $v\n" while ($k, $v) = each %tapes;
      $v =~ /P5$/ and print "$k $v\n" while ($k, $v) = each %tapes;

      If you want to iterate once, you need additional storage:

      my (@C5, @P5);
      while (my ($k, $v) = each %tapes) {
          $v =~ /C5$/ ? push @C5, "$k $v\n" : push @P5, "$k $v\n";
      }
      print @C5, @P5;

      Although you can reduce storage costs by printing the first set while iterating:

      my @P5;
      while (my ($k, $v) = each %tapes) {
          $v =~ /C5$/ ? print "$k $v\n" : push @P5, "$k $v\n";
      }
      print @P5;
      Perl --((8:>*
        The more cases you have, the less efficient it is to loop over the data set each time. That's where a sieve comes in handy. It short-circuits the loop over the filters, and loops through the data only once.
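
        To make the single-pass idea concrete without the full generality of sieve(), here is a simplified sketch that buckets keys by suffix with a plain hash lookup. This is not the sieve() code from earlier in the thread: it swaps the regex list for a fixed two-character suffix, and the names @priority, %bucket and @other are made up for the illustration.

```perl
use strict;
use warnings;

# Subset of the data from the question.
my %tapes = (
    L01783 => 'NETAPP-DAILY CLOSED-C5',
    L01797 => 'NETAPP-MONTHLY PROGRAM-P5',
    L01456 => 'WIN-DAILY CLOSED-C5',
    L01784 => 'NETAPP-MONTHLY PROGRAM-P5',
);

# Suffixes in output order; extend this list for more categories.
my @priority = ('C5', 'P5');

# One pass over the hash: drop each key into the bucket for its suffix.
my %bucket = map { $_ => [] } @priority;
my @other;    # keys whose suffix is not listed in @priority
for my $k (keys %tapes) {
    my $suffix = substr($tapes{$k}, -2);
    push @{ $bucket{$suffix} || \@other }, $k;
}

# Concatenate the buckets in priority order, unmatched keys last.
my @sorted_keys;
push @sorted_keys, @{ $bucket{$_} } for @priority;
push @sorted_keys, @other;

print "$_ $tapes{$_}\n" for @sorted_keys;
```

        Adding a category means adding one entry to @priority; the data is still walked only once.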

        Jeff japhy Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and perl hacker
        How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart