comment on

You raise a good point that sorting is not the optimal (i.e. does the least work for the most gain) method for solving this problem. If I were given a sufficiently large hash (and I wanted to extract every last ounce of perfomance) I might code the highest value algorithm using something like this:

my %hashName = ( "term1" => 83, "term2" => 20, "term3" => 193 );
my $max;
while ((undef, my $val) = each %hashName) {
    $max ||= $val;
    $max = $val if $val >= $max;
}
[download]

The each ensures that you don't read too much data into memory at once (one key/value pair), and ensures that we only enumerate the data once. This would also work well with tied data (e.g. DB_File) as it would only scan one record at a time.

However, during my testing, unless you are searching BIG sets of data for the maximum, the sort method is actually much more efficient. As always, selecting the right algorithm relies on both knowing the correct algorithms (their big-O notation and so forth) AND knowing your data. For this problem, if you know your hashes are always small, use sort. You are unlikely to be able to code a pure Perl algorithm which actually executes faster. If your hashes are huge (say over 20000 keys), use the while loop and each. The Benchmarks below illustrate my point:

#!/usr/bin/perl

use Benchmark qw(timethese cmpthese);

print "SMALL HASH SIZES:-\n\n";

run_timings(
    150_000,             # Iterations
    10, 100, 1000        # Hash Sizes
);

print "LARGE HASH SIZES:-\n\n";

run_timings(
    50,                  # Iterations
    10_000, 50_000       # Hash Sizes
);

print "HUGE HASH SIZES:-\n\n";

run_timings(
    5,                   # Iterations
    100_000, 1_000_000   # Hash Sizes
);

exit;

sub run_timings {
    my $iterations = shift;
    my @hash_sizes = @_;
    
    for my $n (@hash_sizes) {
    
        my %hashName = map { $_ => $_ } 1 .. $n;
    
        print "------------------------------------------\n";
        print "Hash size: ", scalar(keys(%hashName)), "\n";
        print "Max Value: ", (sort { $b <=> $a } values %hashName)[0],
+ "\n";

        print "Timings:\n\n";
        my $r = timethese($iterations, {
            'new' => sub {
                my ($max) = sort { $b <=> $a } values %hashName;
            },
            'nosort' => sub {
                my $max, $val;
                while ((undef, $val) = each %hashName) {
                    $max ||= $val;
                    $max = $val if $val >= $max;
                }
            },
        });

        print "\nComparasion:\n\n";
        cmpthese($r);

        print "\n\n";
    }
}

__END__

Timing results follow:-
[download]

SMALL HASH SIZES:-

------------------------------------------
Hash size: 10
Max Value: 10
Timings:

Benchmark: timing 150000 iterations of new, nosort...
       new:  1 wallclock secs ( 0.56 usr +  0.00 sys =  0.56 CPU) @ 26
+6429.84/s (n=150000)
    nosort:  2 wallclock secs ( 1.77 usr +  0.00 sys =  1.77 CPU) @ 84
+937.71/s (n=150000)

Comparasion:

           Rate nosort    new
nosort  84938/s     --   -68%
new    266430/s   214%     --


------------------------------------------
Hash size: 100
Max Value: 100
Timings:

Benchmark: timing 150000 iterations of new, nosort...
       new:  6 wallclock secs ( 5.67 usr +  0.00 sys =  5.67 CPU) @ 26
+450.36/s (n=150000)
    nosort: 15 wallclock secs (15.06 usr +  0.00 sys = 15.06 CPU) @ 99
+58.18/s (n=150000)

Comparasion:

          Rate nosort    new
nosort  9958/s     --   -62%
new    26450/s   166%     --


------------------------------------------
Hash size: 1000
Max Value: 1000
Timings:

Benchmark: timing 150000 iterations of new, nosort...
       new: 82 wallclock secs (80.11 usr +  0.00 sys = 80.11 CPU) @ 18
+72.45/s (n=150000)
    nosort: 157 wallclock secs (154.36 usr +  0.03 sys = 154.39 CPU) @
+ 971.57/s (n=150000)

Comparasion:

         Rate nosort    new
nosort  972/s     --   -48%
new    1872/s    93%     --


LARGE HASH SIZES:-

------------------------------------------
Hash size: 10000
Max Value: 10000
Timings:

Benchmark: timing 50 iterations of new, nosort...
       new:  1 wallclock secs ( 0.53 usr +  0.00 sys =  0.53 CPU) @ 94
+.16/s (n=50)
    nosort:  1 wallclock secs ( 0.95 usr +  0.00 sys =  0.95 CPU) @ 52
+.47/s (n=50)

Comparasion:

         Rate nosort    new
nosort 52.5/s     --   -44%
new    94.2/s    79%     --


------------------------------------------
Hash size: 50000
Max Value: 50000
Timings:

Benchmark: timing 50 iterations of new, nosort...
       new:  7 wallclock secs ( 6.41 usr +  0.00 sys =  6.41 CPU) @  7
+.81/s (n=50)
    nosort:  5 wallclock secs ( 5.00 usr +  0.02 sys =  5.02 CPU) @  9
+.97/s (n=50)

Comparasion:

         Rate    new nosort
new    7.81/s     --   -22%
nosort 9.97/s    28%     --


HUGE HASH SIZES:-

------------------------------------------
Hash size: 100000
Max Value: 100000
Timings:

Benchmark: timing 5 iterations of new, nosort...
       new:  1 wallclock secs ( 1.61 usr +  0.00 sys =  1.61 CPU) @  3
+.11/s (n=5)
    nosort:  2 wallclock secs ( 1.06 usr +  0.00 sys =  1.06 CPU) @  4
+.70/s (n=5)

Comparasion:

         Rate    new nosort
new    3.11/s     --   -34%
nosort 4.70/s    51%     --


------------------------------------------
Hash size: 1000000
Max Value: 1000000
Timings:

Benchmark: timing 5 iterations of new, nosort...
       new: 26 wallclock secs (25.59 usr +  0.06 sys = 25.66 CPU) @  0
+.19/s (n=5)
    nosort: 11 wallclock secs (11.22 usr +  0.02 sys = 11.23 CPU) @  0
+.45/s (n=5)

Comparasion:

       s/iter    new nosort
new      5.13     --   -56%
nosort   2.25   128%     --
[download]

I hope that this helps. In general though, if you are looking to improve performance by optimising simple algorithms like this, then you are looking in the wrong place! ;-) For the bulk of common problems, you will find much bigger performance gains by optimising any file and/or network I/O than you will ever gain by chopping 20% off a loop which gets executed once or twice in your script.

Cheers,

-- Dave :-)

$q=[split+qr,,,q,~swmi,.$,],+s.$.Em~w^,,.,s,.,$&&$$q[pos],eg,print

In reply to Re: Re: Re: max value in a hash by DaveH
in thread max value in a hash by rsiedl

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.