Efficiently finding values of an extremity

Limbic~Region has asked for the wisdom of the Perl Monks concerning the following question:

All,
Intrepid asked in the CB for a piece of code that would satisfy the following requirements:

Identify values in a list with the minimum length
Identify the maximum string value in the resulting list
Identify a random value from the resulting list

These have been generalized a bit as hashref keys were involved, but I provided the following straightforward solution:

my @min_keys = sort { length($a) <=> length($b) || $a cmp $b } keys %$
+hashref;
my $min = length $min_keys[0];

for ( 1 .. $#min_keys ) {
    if ( length $min_keys[$_] != $min ) {
        splice @min_keys, $_;
        last;
    }
}
# maxstr & random
print join "\t", $min_keys[-1], $min_keys[ rand @min_keys ];
[download]

There was a follow on discussion about how this could be done efficiently. While some were interested in a general solution that may be incorporated into List::Util, I felt a variation of the watermark algorithm was all that was needed to satisfy all the requirements in a single pass:

my @list = qw(zero one two three four five six seven eight nine ten el
+even twelve);

my ($maxstr, $random) = find_shortest( @list );
print join "\t", $maxstr, $random;

sub find_shortest {
    my ($pos, $len, $maxstr, @min) = (0, undef, '', 0, undef);
    my %dispatch = (
        -1 => sub {
                  ($len, $pos) = (length($_), 0);
                  ($min[$pos], $maxstr) = ($_, $_);
              },
         0 => sub {
                  $min[++$pos] = $_;
                  $maxstr = $_ if $_ gt $maxstr;
              },
         1 => sub { return },
    );
    for ( @_ ) {
       if ( ! defined $len ) {
           $len = length();
           ($min[$pos], $maxstr) = ($_, $_);
           next;
       }
       $dispatch{ length($_) <=> $len }->();
    }
    return ($maxstr, (@min[0..$pos])[rand $pos]);
}
[download]

There are still ways to improve on it, but I thought it was neat and only required a minimal amount of extra effort. How could you improve (keeping the original 3 requirements in mind)?
Cheers - L~R

Comment on Efficiently finding values of an extremity Select or Download Code

Replies are listed 'Best First'.
Re: Efficiently finding values of an extremity by jdporter (Paladin) on Jul 20, 2005 at 19:13 UTC
I think ikegami's solution is fine; but I thought it would be nice if one could use the exact same comparison function that one would pass to `sort`. So that if you'd say `sort { length($a) <=> length($b) or $a cmp $b } @data;` [download] to get the entire data set in the desired order, you could simply s/sort/minima/ to get only the minima from that set: `minima { length($a) <=> length($b) or $a cmp $b } @data;` [download] Here's how: `sub minima(&@) { my $comp = shift; @_ <= 1 and return(@_); my @cur = ( $_[0] ); for my $i ( 1 .. $#_ ) { local( $a, $b ) = ( $cur[0], $_[$i] ); my $r = &$comp; if ( $r > 0 ) { @cur = ($_[$i]); } elsif ( $r == 0 ) { push @cur, $_[$i]; } # else keep current best. } @cur }` [download] However, this doesn't address the third requirement, Identify a random value from the resulting list I'm not really sure what is wanted there. Does it need to be truly random? Or is it that we simply don't care which one we get? If the latter, then one can simply take the first from the resulting list. If the former, then one could apply F-Y to the resulting list. Either way, it's outside the scope of concern of this function, I think.	[reply] [d/l] [select]
Re^2: Efficiently finding values of an extremity (WTDI) by tye (Sage) on Jul 20, 2005 at 19:29 UTC
I like that way of slicing it as well, but I think the name 'minima' fits it less well than it fits my way of slicing it. It is nice that yours only requires a single routine rather than 4 flavors. (I also like that mine uses a simpler key-generating function.) I'd call it "firsts" but I was already told that would be confusing since there is already List::Util::first. Perhaps "leaders"? - tye	[reply]
Re^3: Efficiently finding values of an extremity (WTDI) by jdporter (Paladin) on Jul 20, 2005 at 20:32 UTC
Yeah, I was originally inclined to call it "first". "firsts" is even better. But how about "optima"?	[reply]
Re: Efficiently finding values of an extremity by ikegami (Patriarch) on Jul 20, 2005 at 18:48 UTC
tye suggested that minima be added to List::Util. It looks something like this. Of course, his minima was probably terser/cleaner :) `sub minima(&@) { my $value = shift(@_); my @matches; my $min_val; { local $_ = shift(@_); $min_val = &$value(); @matches = $_; } foreach (@_) { my $val = &$value(); if ($val == $min_val) { push(@matches, $_); } elsif ($val < $min_val) { $min_val = $val; @matches = $_; } } return @matches; }` [download] If so, the original question can be solved with: `use List::Util qw( minima maxstr ); my $val = maxstr minima { length } @list;` [download] and the bonus question can be solved with: `use List::Util qw( minima ); sub pick { $_[rand(@_)] } my $val = pick minima { length } @list;` [download]	[reply] [d/l] [select]
Re^2: Efficiently finding values of an extremity (tye's) by tye (Sage) on Jul 20, 2005 at 19:12 UTC
This is what I threw on my scratchpad, plus two bug fixes: sub maxima(&@) { my $measure = shift @_; return if ! @_; my @maxima = shift @_; my $max; $max = $measure->( $_ ) for @maxima; for( @_ ) { my $key = $measure->( $_ ); if( $max < $key ) { @maxima = $_; $max = $key; } elsif( $max == $key ) { push @maxima, $_; } } return @maxima; } sub minima(&@) { my $measure = shift @_; return if ! @_; my @minima = shift @_; my $min; $min = $measure->( $_ ) for @minima; for( @_ ) { my $key = $measure->( $_ ); if( $key < $min ) { $min = $key; @minima = $_; } elsif( $key == $min ) { push @minima, $_; } } return @minima; } sub strmaxima(&@) { my $measure = shift @_; return if ! @_; my @maxima = shift @_; my $max; $max = $measure->( $_ ) for @maxima; for( @_ ) { my $key = $measure->( $_ ); if( $max lt $key ) { @maxima = $_; $max = $key; } elsif( $max eq $key ) { push @maxima, $_; } } return @maxima; } sub strminima(&@) { my $measure = shift @_; return if ! @_; my @minima = shift @_; my $min; $min = $measure->( $_ ) for @minima; for( @_ ) { my $key = $measure->( $_ ); if( $key lt $min ) { @minima = $_; $min = $key; } elsif( $key eq $min ) { push @minima, $_; } } return @minima; } [download] Which I think would be fine additions to List::Util. However, (Update: I added `return if ! @_;` to each because ) these should return an empty list not (undef) when given an empty list. - tye	[reply] [d/l] [select]
Re^3: Efficiently finding values of an extremity (tye's) by blokhead (Monsignor) on Jul 20, 2005 at 22:07 UTC
I've always known these (at least in their single-valued incarnations) as argmin & argmax, and I've even seen them used with those names in mathematical literature. I would prefer a naming scheme where the "arg-" prefix indicates that we return the index/indices which induced the extremal value(s) (and not the extremal value(s) itself/themselves), and the "-ima" suffix indicates that we return multiple values when there is a tie. It never occurred to me that a plural version of them might be useful, but I don't see why not. Perhaps the two optimized cases could even be combined efficiently by checking wantarray. Whatever you want to call it, I'm with you in that I've always wanted functions like these included in List::Util. blokhead	[reply]


Pathologically Eclectic Rubbish Lister
	PerlMonks