comment on

Are the numbers actually in the range 1 .. 100? What is the actual range? This could have some bearing on what constitutes a good solution.

Your hash technique could be written like this (please note that we're supplying a numeric comparison to sort. Otherwise you don't get numerical order.)

use strict;
use warnings;

my @numbers = (1..100);
my %count;

$count{$_}++ foreach @numbers;

print "$_ occurs $count{$_} time(s)\n" foreach sort { $a <=> $b } keys
+ %count;
[download]

The reason I asked about the actual range is because sometimes there are patterns to the data set that might allow you to employ a less general approach, with greater efficiency. For example, if you're dealing with positive integers in some reasonably narrow range, you can avoid the O(n log n) sort, and the expense of hash lookups if you just use an array to hold your counts. In the following demonstration I did away with the output (output would swamp a benchmark). Instead, I build an array of anonymous arrays holding an integer value and a count. It's a little contrived, but the goal was to provide a similar platform upon which the two methods could be similarly compared.

use strict;
use warnings;
use Benchmark qw(cmpthese);

@main::numbers = map {int(rand(10000))+1 } 1 .. 300000;

sub hash {
    my %count;
    $count{$_}++ foreach @main::numbers;
    my @results = map { [ $_ => $count{$_} ] } sort { $a <=> $b } keys
+ %count;
}

sub array {
    my @count;
    $count[$_]++ foreach @main::numbers;
    my @results = map { [ $_ => $count[$_] ] } grep { $count[$_] } 0 .
+. $#count;
}

cmpthese ( -5, {
    hash   => \&hash,
    array  => \&array,
} );
[download]

On my system...

        Rate  hash array
hash  19.4/s    --  -55%
array 43.2/s  122%    --
[download]

Going with the less general solution wouldn't make sense if your data doesn't consist only of positive integers in a fairly narrow range. But this is an example of where a more specialized algorithm may be a better approach if the data supports it. In fact, if you are dealing with positive integers, and they do span an even greater range, as long as it fits in memory the array may still be a better approach. It doesn't scale as well for sparse data sets, but for data sets where the counts from 0 to 'n' can fit in memory, the larger sets will see even more improvement over the hash/sort technique (since a sort grows log-linearly, and the array approach grows linearly).

Dave

In reply to Re: counting elements using a hash by davido
in thread counting elements using a hash by bk1388

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.