in reply to How to (get started on) sort AoA or AoH by frequency

Okay, so without using hashes, here is a sample showing a crude way of getting the sort I need (annotated for clarity):

#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;

my @results = (["chpt10_2", "sent. 2",  "alice", "nsubj", "animals", "protect"],
               ["chpt12_1", "sent. 54", "bob",   "nsubj", "cells",   "protect"],
               ["chpt25_4", "sent. 47", "carol", "nsubj", "plants",  "protect"],
               ["chpt34_1", "sent. 1",  "dave",  "nsubj", "cells",   "protect"],
               ["chpt35_1", "sent. 2",  "eli",   "nsubj", "cells",   "protect"],
               ["chpt38_1", "sent. 1",  "fred",  "nsubj", "animals", "protect"],
               ["chpt54_1", "sent. 1",  "greg",  "nsubj", "uticle",  "protect"]
              );

my @sort_results = sort {lc $a->[4] cmp lc $b->[4]} @results; ##By alphabet of arg1

my $last_word;
my $current_word;
my $word_count;

$sort_results[-1][6] = 1; ##This weird step is b/c last element didn't get 7th column appended

for my $j (0 .. $#sort_results) { ##[ROW][COLUMN]
    $current_word = $sort_results[$j][4]; ## current word is arg1 of whichever matchset is being looked at (alphabetical)
    if (lc $last_word eq lc $current_word) {
        $word_count++; ##If seen before, increment freq. count
    }
    else { ##new word
        if ($j != 0) ##unless it's the first row
        {
            for (my $k = 1; $k <= $word_count; $k++) {
                ##make a new column with freq. Each of the previous seen word will have to have
                ##the same freq. number so iterate back and make them all the same word count
                $sort_results[($j-$k)][6] = $word_count;
            }
        }
        ##Now set up for next iteration
        $last_word = $current_word;
        $word_count = 1;
    }
}

@sort_results = sort {$b->[6] <=> $a->[6]} @sort_results; ##Sort the results by the new 7th freq. column

for my $i (0 .. $#sort_results) {
    print "$sort_results[$i][0], $sort_results[$i][1]: "; ##chptnum, sent num
    print "$sort_results[$i][2]\n\n"; ##sentence
    print "gramatical relation: $sort_results[$i][3]; argument: $sort_results[$i][4]; freq: $sort_results[$i][6]\n\n\n"; ##dependency args
}

I would appreciate either a new, better way to do this (I think hashes are the way to get it done), or just an improvement on this crude code. Thanks again for all your help!

Re^2: How to (get started on) sort AoA or AoH by frequency
by Marshall (Canon) on Jun 13, 2011 at 20:32 UTC
    See attached code. I used the map trick again: foreach ( map { $_->[4] } @results ) iterates over the contents of column 4, and a frequency hash is built from them. What goes into the map is a list of references to rows. The map dereferences each one and transforms the list, so that the output is a list of every value in column 4.
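    As a minimal sketch of that counting step on its own: the three-row sample below is hypothetical (a cut-down version of the data in this thread, with one "Cells" capitalized to show why lc is applied), and @rows and @words are illustrative names only.

```perl
use strict;
use warnings;

# Hypothetical three-row sample; column 4 holds the word being counted.
my @rows = (
    [ "chpt10_2", "sent. 2",  "alice", "nsubj", "animals", "protect" ],
    [ "chpt12_1", "sent. 54", "bob",   "nsubj", "cells",   "protect" ],
    [ "chpt34_1", "sent. 1",  "dave",  "nsubj", "Cells",   "protect" ],
);

# map turns the list of row references into the list of column-4 values.
my @words = map { $_->[4] } @rows;

# Count each word, folding case so "Cells" and "cells" share one key.
my %freq;
$freq{ lc $_ }++ for @words;

# Show the counts, one word per line, in alphabetical order.
print "$_: $freq{$_}\n" for sort keys %freq;
```

    Because the hash keys are lowercased when the counts are built, every later lookup has to lowercase the word the same way.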

    The way sort works (output <--- sort {...} <--- input) is that what goes in is what comes out: sort only reorders its input list. What is coming in are references to rows of the @results array, so references to rows are what come out. What sort needs is a way to compare two rows: row A < row B, row A equal to row B, or row A > row B. The function that provides the comparison can be anything you want, as long as it produces a consistent result (the answer reverses if a and b are swapped).

    So I look up the value of col 4 for, say, row A, then I ask the frequency hash what the frequency is, and I compare that result with the same computation for row B. In the case of a tie, I use an alphabetic comparison of column 0. Note that I reversed a and b in the frequency comparison to get the highest frequency first, while column 0 is sorted lowest first.

    The way that the sort decider function is written may appear a bit strange, but it is just returning -1, 0 or 1 depending upon how row A and row B compare.
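    A small sketch of how that two-level comparison falls through on a tie. The rows and the %freq counts here are a hypothetical sample, not the full data set: <=> yields 0 when the two frequencies are equal, 0 is false, and so "or" hands the decision to the cmp on column 0.

```perl
use strict;
use warnings;

# Hypothetical sample: two "animals" rows (a frequency tie) and one "cells" row.
my @rows = (
    [ "chpt38_1", "sent. 1",  "fred",  "nsubj", "animals", "protect" ],
    [ "chpt12_1", "sent. 54", "bob",   "nsubj", "cells",   "protect" ],
    [ "chpt10_2", "sent. 2",  "alice", "nsubj", "animals", "protect" ],
);
my %freq = ( animals => 2, cells => 1 );   # counts assumed for this sample

my @sorted = sort {
    $freq{ lc $b->[4] } <=> $freq{ lc $a->[4] }   # b before a: highest frequency first
        or
    $a->[0] cmp $b->[0]                           # a before b: lowest column 0 first
} @rows;

# Print the chapter ids in their sorted order.
print join(", ", map { $_->[0] } @sorted), "\n";
```

    Swapping $a and $b in either line reverses the direction of that level only, which is how one block can sort descending by frequency and ascending by column 0.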

    It is completely legal to assign the sorted result set back to the input variable, and I did that. To get your printout, just look up column 4 in the freq hash to get the frequency. The order of my @results agrees with the order of your output.

    For printing, of course you can access each element as a 2-D coordinate, but it is usually better to iterate over the rows with a row reference, like this:

    foreach my $row (@results) { print "$row->[0] $row->[1]\n"; }
    I think the following code does what you want...
    #!/usr/bin/perl
    use warnings;
    use strict;
    use Data::Dumper;
    use Data::Dump qw(pp);

    my @results = (["chpt10_2", "sent. 2",  "alice", "nsubj", "animals", "protect"],
                   ["chpt12_1", "sent. 54", "bob",   "nsubj", "cells",   "protect"],
                   ["chpt25_4", "sent. 47", "carol", "nsubj", "plants",  "protect"],
                   ["chpt34_1", "sent. 1",  "dave",  "nsubj", "cells",   "protect"],
                   ["chpt35_1", "sent. 2",  "eli",   "nsubj", "cells",   "protect"],
                   ["chpt38_1", "sent. 1",  "fred",  "nsubj", "animals", "protect"],
                   ["chpt54_1", "sent. 1",  "greg",  "nsubj", "uticle",  "protect"]
                  );

    my %freq;
    foreach ( map{$_->[4]}@results) #feeds in list of animals, cells, uticle, etc.
    {
       $freq{lc $_}++;
    }

    @results = sort {$freq{lc $b->[4]} <=> $freq{lc $a->[4]}  #freq order
                     or
                     $a->[0] cmp $b->[0]                      #text col 0
                    } @results;

    print pp(\@results);

    __END__
    [
      ["chpt12_1", "sent. 54", "bob", "nsubj", "cells", "protect"],
      ["chpt34_1", "sent. 1", "dave", "nsubj", "cells", "protect"],
      ["chpt35_1", "sent. 2", "eli", "nsubj", "cells", "protect"],
      ["chpt10_2", "sent. 2", "alice", "nsubj", "animals", "protect"],
      ["chpt38_1", "sent. 1", "fred", "nsubj", "animals", "protect"],
      ["chpt25_4", "sent. 47", "carol", "nsubj", "plants", "protect"],
      ["chpt54_1", "sent. 1", "greg", "nsubj", "uticle", "protect"],
    ]
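    To get a printout in the style of the original post from the sorted rows, the frequency can be looked up per row rather than stored in a 7th column. This is only a sketch: format_row is a hypothetical helper name, and the two rows and the %freq counts are an assumed sample, not the full data.

```perl
use strict;
use warnings;

# Hypothetical two-row sample, assumed to be already sorted by frequency.
my @results = (
    [ "chpt12_1", "sent. 54", "bob",   "nsubj", "cells",   "protect" ],
    [ "chpt10_2", "sent. 2",  "alice", "nsubj", "animals", "protect" ],
);
my %freq = ( cells => 3, animals => 2 );   # counts assumed for this sample

# format_row (hypothetical helper): build the text for one row, looking the
# frequency up in the hash with the same lc folding used to build it.
sub format_row {
    my ($row, $freq) = @_;
    my $count = $freq->{ lc $row->[4] };
    return "$row->[0], $row->[1]: $row->[2]\n"
         . "grammatical relation: $row->[3]; argument: $row->[4]; freq: $count\n";
}

# Iterate over row references instead of 2-D indexes.
print format_row($_, \%freq) for @results;
```

    Keeping the counts in the hash leaves the rows themselves unchanged, so nothing has to be appended to each row before sorting or printing.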

      Marshall,

      You are awesome! This is more than I ever could have asked for, you really helped me understand sorting and the power of hashes. I will try and award you whatever I can, since you've helped me out so much (when I get this rumoured vote fairy). Thanks for your time!

      p.s. I don't think you need to lc the 1st line in sort, since it's numbers... the 2nd line would need it