Retrieving data with largest count

Angharad has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Retrieving data with largest count by ikegami (Patriarch) on Jun 02, 2009 at 15:08 UTC
`my %largest; while (<>) { my ($item, $group_id, $count) = split; next if $largest{$group_id} && $largest{$group_id}[2] >= $count; $largest{$group_id} = [ $item, $group_id, $count, $_ ]; } for (values(%largest)) { print($_->[-1]); }` [download] Keeps first on ties. Update: Added note at bottom.	[reply] [d/l]
Re^2: Retrieving data with largest count by Limbic~Region (Chancellor) on Jun 02, 2009 at 15:25 UTC
ikegami, I know that it is unlikely in this situation that there would be negative or even nil counts, but this code does the wrong thing in the following situation: `ASDF 1 0 ASDF 1 -3` [download] Specifically, it first sets the high water mark to 0 and then to -3. (Originally, I didn't have the group the same - corrected) Cheers - L~R	[reply] [d/l]
Re^3: Retrieving data with largest count by ikegami (Patriarch) on Jun 02, 2009 at 15:43 UTC
No it doesn't??? In: `1ct9B 3 1 1f7uA 2 3 1gaxA 2 1 1gpmA 3 5 1ihoA 6 4 1mopA 6 3 1ileA 2 5 1iq0A 2 3 1vlhB 5 3 1jhdA 5 4 ASDF 1 0 ASDF 1 -3` [download] Out: `1ihoA 6 4 ASDF 1 0 1gpmA 3 5 1ileA 2 5 1jhdA 5 4` [download] `0 >= -3`, so `next` is executed, so the record for `-3` is ignored. The item with the highest value for group 1 (`ASDF 1 0`) is printed.	[reply] [d/l] [select]
Re^4: Retrieving data with largest count by Limbic~Region (Chancellor) on Jun 02, 2009 at 17:23 UTC
Re^5: Retrieving data with largest count by ikegami (Patriarch) on Jun 02, 2009 at 17:29 UTC
Re: Retrieving data with largest count by jethro (Monsignor) on Jun 02, 2009 at 15:12 UTC
Please post your code the next time, often a few small changes suffice to make such a program work. `... my %best; while (my $line=<YOURFILE>) { chomp $line; my ($item,$group,count)= split(/\s+/,$line); if ((not exists $best{$group}) or $best{$group}{count}<$count) { $best{$group}{count}=$count; $best{$group}{item}= $item; } } #to print them out foreach my $group ( keys %best) { print "$best{$group}{item} $group $best{$group}{count}\n"; }` [download] PS: You didn't specify what should happen when two items in a group have the same count.	[reply] [d/l]
Re^2: Retrieving data with largest count by Angharad (Pilgrim) on Jun 02, 2009 at 15:15 UTC
Many thanks for the advice so far. Much appreciated. In answer to your question, if two items have the same count, choosing an item at random is fine.	[reply]
Re^3: Retrieving data with largest count by GrandFather (Saint) on Jun 04, 2009 at 20:27 UTC
"choosing an item at random is fine" and also the hardest option. All candidates in a group have to be retained until processing the input data is complete, then all groups have to be checked for multiple candidate items and a random item selected. If either 'first' or 'last' is also fine then either of those is easier to implement. True laziness is hard work	[reply]
Re: Retrieving data with largest count by Limbic~Region (Chancellor) on Jun 02, 2009 at 15:11 UTC
Angharad, What you want to use is a high water mark algorithm per item. It would look something like this: `#!/usr/bin/perl use strict; use warnings; my $input = $ARGV[0] or die "Usage: $0 <input>"; open(my $fh, '<', $input) or die "Unable to open '$input' for reading: + $!"; my %high; while (<$fh>) { chomp; my ($item, $group, $count) = split ' '; $high{$item} = $count if ! defined $high{$item} \|\| $count > $high{ +$item}; } for my $item (keys %high) { print "$item high count was $high{$item}\n"; }` [download] I have purposely left this code not working exactly as required (group is omitted) to give you something to work towards. If you need more help - remember that the entire line is stored in $_ Also, this assumes that counts will always be numeric and that they will be whitespace delimited. For something much more fancy, have a look at How A Function Becomes Higher Order Update: You didn't mention what should happen if an item has more than one group with the same highest count. The algorithm can be modified to handle the first/last/all/random values in that situation but you will need to determine what is right for you. Cheers - L~R	[reply] [d/l]
Re^2: Retrieving data with largest count by ikegami (Patriarch) on Jun 02, 2009 at 15:49 UTC
I have purposely left this code not working exactly as required (group is omitted) to give you something to work towards. Also, your code finds the highest count per item, not per group. Since each item probably only appears once, you're just printing out the input.	[reply]
Re^3: Retrieving data with largest count by Limbic~Region (Chancellor) on Jun 02, 2009 at 17:02 UTC
ikegami, Yes, (group is omitted) means I didn't consider group which is what the OP asked for. The was intentional as indicated in the same comment. I originally misread the question and when I discovered my output didn't match the OPs (before posting), I decided to leave it broken. Teach a man to fish so to speak. I guess I should have made that more clear. Cheers - L~R	[reply]
Re: Retrieving data with largest count by bichonfrise74 (Vicar) on Jun 02, 2009 at 20:22 UTC
A possible solution? `#!/usr/bin/perl use strict; my %HoH; while (<DATA>) { my ($item, $group, $count) = split; if ( defined( $HoH{$item}{$group} ) ) { $HoH{$item}{$group} = $count if ( $HoH{$item}{$group} < $count + ); } else { $HoH{$item}{$group} = $count; } } foreach my $item ( keys %HoH ) { foreach my $group ( keys %{ $HoH{$item} } ) { print "$item $group $HoH{$item}->{$group}\n"; } } __DATA__ 1ct9B 3 1 1f7uA 2 3 1gaxA 2 1 1gpmA 3 5 1ihoA 6 4 1mopA 6 3 1ileA 2 5 1iq0A 2 3 1vlhB 5 3 1jhdA 5 4` [download]	[reply] [d/l]


Do you know where your variables are?
	PerlMonks