Re: calculating the mode

This is how I would implement the approach suggested by Frankus and others in this thread (though I certainly wouldn't claim to be an experienced Perl programmer). Try this:

#! /usr/local/bin/perl -w

use strict;

# Set up a hash, where $freq{word} = number of occurrences of 'word'
# (where word is actually a number in this case)

my %freq;

# Read from filename given on command line, or stdin if no file
# name is given
while (<>)
{
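    # split ' ' (unlike /\s+/) ignores leading whitespace, so an indented
    # line does not add an empty string to the counts
    my @array = split ' ', $_;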

    # Update the hash
    foreach (@array) {$freq{$_}++}
}

# Sort the keys of the hash (the words or numbers in the file) into
# an array in ascending order of $freq{key} (the number of occurrences)

my @sorted_array = sort { $freq{$a} <=> $freq{$b} } keys %freq;

# The mode is the last value in the array (if several values tie for
# the highest count, an arbitrary one of them ends up last)
my $mode = pop(@sorted_array);

print "The mode is $mode\n";

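The script above reports a single value even when two or more values share the highest count. A minimal sketch of one way to report every tied value follows. The `modes` subroutine and its sample data are hypothetical, not something from this thread, and the sketch assumes the values are numeric.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns every value that
# shares the highest count, in ascending numeric order.
# Assumes the values are numbers; call it in list context.
sub modes {
    my @values = @_;
    my %freq;
    $freq{$_}++ for @values;
    return () unless %freq;

    # Highest count seen for any value
    my ($max) = sort { $b <=> $a } values %freq;

    # Keep only the keys whose count equals that maximum
    return sort { $a <=> $b } grep { $freq{$_} == $max } keys %freq;
}

my @result = modes(qw(1 2 2 3 3 4));
print "The modes are @result\n";    # prints "The modes are 2 3"
```

With a single clear winner the list simply has one element, so the same call covers both cases.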

Re^2: calculating the mode by particle (Vicar) on Jun 10, 2002 at 12:09 UTC
good. you can shorten your while loop a bit, though...

    while (<>) { $freq{$_}++ for split; }

your @array variable was merely a temp, used to glue two statements together. split defaults to a whitespace split on $_, which is what <> is filling. i've flipped around the for loop, as well.

also, i wouldn't be so destructive to the sorted array. instead of popping the value, how about selecting it, by

    my $mode = $sorted_array[-1]; ## aren't negative indexes neat?

~Particle accelerates
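For anyone wondering what the bare split changes in practice, here is a small sketch. The sample line and variable names are made up for illustration. It shows that the bare form (shorthand for `split ' ', $_`) skips leading whitespace, whereas an explicit `/\s+/` pattern yields an empty first field, and that a negative index leaves the array untouched.

```perl
use strict;
use warnings;

my $line = "  3 1 3\n";    # made-up input; note the leading whitespace

# Explicit pattern: the leading whitespace yields an empty first field
my @with_pattern = split /\s+/, $line;

# Bare split is shorthand for split ' ', $_ and skips leading whitespace
my @bare;
for ($line) { @bare = split; }

printf "pattern: %d fields, bare: %d fields\n",
    scalar @with_pattern, scalar @bare;    # pattern: 4 fields, bare: 3 fields

# A negative index reads from the end without modifying the array
my @sorted = (10, 20, 30);
my $last   = $sorted[-1];
print "last is $last, array still has ", scalar @sorted, " elements\n";
```

The empty field matters here because it would otherwise be counted in %freq as a value of its own.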
Re: Re: calculating the mode by demerphq (Chancellor) on Jun 13, 2002 at 12:56 UTC
Hi Bilbo, nice work. Particle made a couple of points that I agree with. Using modifiers when appropriate IMO produces more intuitive and straightforward code. However, in this case I would say that this is a minor improvement to a non-optimal solution. (People, no lectures on premature optimization please. I've heard them all before and I'm not interested in debating whether this is 'premature' or not.)

Keeping track of the frequencies, then sorting them and using only one element, is wasteful. A more efficient and scalable approach would be to simply add an if to the inner loop that keeps track of the mode key and mode count for the part of the list read so far. Once the whole list has been read, these hold the correct answer.

    use strict;

    # Set up a hash, where $freq{word} = number of occurrences of 'word'
    # (where word is actually a number in this case)

    my %freq;

    # Read from filename given on command line, or stdin if no file
    # name is given
    my ($mode_count, $mode_key) = (0, undef);
    while (<>)
    {
        chomp;  # lose newlines from the lines
        # Split the line by whitespace and iterate over the results
        foreach (split ' ', $_) {
            if (++$freq{$_} > $mode_count) {  # increment our frequency counter
                # And keep track of the most common element
                $mode_count = $freq{$_};
                $mode_key   = $_;
            }
        }
    }
    print "The mode is $mode_key with $mode_count hits\n";

Incidentally, for the record, I haven't read the thread this is in. I only read your node because you linked to it in another node... If I'm repeating something then apologies.

UPDATE: Sigh. I really should have read the thread first. Now I see why you were building a list and then sorting it. Apologies.

Hmm, on rereading I suppose it's possible that if there were very few types of item my approach would actually be slower than yours (I'd have to benchmark to be sure). But I think that in the average case the sort is overkill.

Yves / DeMerphq
---
Writing a good benchmark isn't as easy as it might look.
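A rough sketch of how the two approaches could be compared with the core Benchmark module is given below. Nothing here is a measured result: the data set (50,000 integers in the range 0..99), the subroutine names, and the one-second run time are all assumptions chosen for illustration, and the outcome will depend on how many distinct values the real input contains.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Made-up test data: 50_000 integers drawn from 0..99 (an assumption;
# real input may have a very different number of distinct values)
srand(42);
my @data = map { int rand 100 } 1 .. 50_000;

# Count everything, sort the keys by frequency, take the last one
sub mode_by_sort {
    my %freq;
    $freq{$_}++ for @_;
    my @sorted = sort { $freq{$a} <=> $freq{$b} } keys %freq;
    return $freq{ $sorted[-1] };
}

# Track the running maximum while counting, with no sort at the end
sub mode_by_tracking {
    my %freq;
    my $mode_count = 0;
    for (@_) {
        $mode_count = $freq{$_} if ++$freq{$_} > $mode_count;
    }
    return $mode_count;
}

# Both subs return the count of the mode, which is the same even when
# several values tie for first place. A negative first argument asks
# Benchmark to run each sub for at least that many CPU seconds.
cmpthese(-1, {
    sort_keys   => sub { mode_by_sort(@data) },
    track_while => sub { mode_by_tracking(@data) },
});
```

Changing the range passed to rand is the easy way to see how the number of distinct values shifts the balance between the two.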
Re: Re: calculating the mode by Anonymous Monk on Jun 10, 2002 at 12:06 UTC
thanks Bilbo - this is perfect!! you have ended a week of frustration. :-)