Beefy Boxes and Bandwidth Generously Provided by pair Networks
Pathologically Eclectic Rubbish Lister
 
PerlMonks  

Efficiently finding values of an extremity

by Limbic~Region (Chancellor)
on Jul 20, 2005 at 18:01 UTC ( [id://476580]=perlquestion: print w/replies, xml ) Need Help??

Limbic~Region has asked for the wisdom of the Perl Monks concerning the following question:

All,
Intrepid asked in the CB for a piece of code that would satisfy the following requirements:
  • Identify values in a list with the minimum length
  • Identify the maximum string value in the resulting list
  • Identify a random value from the resulting list

These have been generalized a bit as hashref keys were involved, but I provided the following straightforward solution:

my @min_keys = sort { length($a) <=> length($b) || $a cmp $b } keys %$ +hashref; my $min = length $min_keys[0]; for ( 1 .. $#min_keys ) { if ( length $min_keys[$_] != $min ) { splice @min_keys, $_; last; } } # maxstr & random print join "\t", $min_keys[-1], $min_keys[ rand @min_keys ];
There was a follow on discussion about how this could be done efficiently. While some were interested in a general solution that may be incorporated into List::Util, I felt a variation of the watermark algorithm was all that was needed to satisfy all the requirements in a single pass:
my @list = qw(zero one two three four five six seven eight nine ten el +even twelve); my ($maxstr, $random) = find_shortest( @list ); print join "\t", $maxstr, $random; sub find_shortest { my ($pos, $len, $maxstr, @min) = (0, undef, '', 0, undef); my %dispatch = ( -1 => sub { ($len, $pos) = (length($_), 0); ($min[$pos], $maxstr) = ($_, $_); }, 0 => sub { $min[++$pos] = $_; $maxstr = $_ if $_ gt $maxstr; }, 1 => sub { return }, ); for ( @_ ) { if ( ! defined $len ) { $len = length(); ($min[$pos], $maxstr) = ($_, $_); next; } $dispatch{ length($_) <=> $len }->(); } return ($maxstr, (@min[0..$pos])[rand $pos]); }

There are still ways to improve on it, but I thought it was neat and only required a minimal amount of extra effort. How could you improve (keeping the original 3 requirements in mind)?
Cheers - L~R

Replies are listed 'Best First'.
Re: Efficiently finding values of an extremity
by jdporter (Paladin) on Jul 20, 2005 at 19:13 UTC
    I think ikegami's solution is fine; but I thought it would be nice if one could use the exact same comparison function that one would pass to sort. So that if you'd say
    sort { length($a) <=> length($b) or $a cmp $b } @data;
    to get the entire data set in the desired order, you could simply s/sort/minima/ to get only the minima from that set:
    minima { length($a) <=> length($b) or $a cmp $b } @data;
    Here's how:
    sub minima(&@) { my $comp = shift; @_ <= 1 and return(@_); my @cur = ( $_[0] ); for my $i ( 1 .. $#_ ) { local( $a, $b ) = ( $cur[0], $_[$i] ); my $r = &$comp; if ( $r > 0 ) { @cur = ($_[$i]); } elsif ( $r == 0 ) { push @cur, $_[$i]; } # else keep current best. } @cur }

    However, this doesn't address the third requirement,
    • Identify a random value from the resulting list
    I'm not really sure what is wanted there. Does it need to be truly random? Or is it that we simply don't care which one we get? If the latter, then one can simply take the first from the resulting list. If the former, then one could apply F-Y to the resulting list. Either way, it's outside the scope of concern of this function, I think.

      I like that way of slicing it as well, but I think the name 'minima' fits it less well than it fits my way of slicing it. It is nice that yours only requires a single routine rather than 4 flavors. (I also like that mine uses a simpler key-generating function.)

      I'd call it "firsts" but I was already told that would be confusing since there is already List::Util::first. Perhaps "leaders"?

      - tye        

        Yeah, I was originally inclined to call it "first". "firsts" is even better. But how about "optima"?
Re: Efficiently finding values of an extremity
by ikegami (Patriarch) on Jul 20, 2005 at 18:48 UTC

    tye suggested that minima be added to List::Util. It looks something like this. Of course, his minima was probably terser/cleaner :)

    sub minima(&@) { my $value = shift(@_); my @matches; my $min_val; { local $_ = shift(@_); $min_val = &$value(); @matches = $_; } foreach (@_) { my $val = &$value(); if ($val == $min_val) { push(@matches, $_); } elsif ($val < $min_val) { $min_val = $val; @matches = $_; } } return @matches; }

    If so, the original question can be solved with:

    use List::Util qw( minima maxstr ); my $val = maxstr minima { length } @list;

    and the bonus question can be solved with:

    use List::Util qw( minima ); sub pick { $_[rand(@_)] } my $val = pick minima { length } @list;

      This is what I threw on my scratchpad, plus two bug fixes:

      sub maxima(&@) { my $measure = shift @_; return if ! @_; my @maxima = shift @_; my $max; $max = $measure->( $_ ) for @maxima; for( @_ ) { my $key = $measure->( $_ ); if( $max < $key ) { @maxima = $_; $max = $key; } elsif( $max == $key ) { push @maxima, $_; } } return @maxima; } sub minima(&@) { my $measure = shift @_; return if ! @_; my @minima = shift @_; my $min; $min = $measure->( $_ ) for @minima; for( @_ ) { my $key = $measure->( $_ ); if( $key < $min ) { $min = $key; @minima = $_; } elsif( $key == $min ) { push @minima, $_; } } return @minima; } sub strmaxima(&@) { my $measure = shift @_; return if ! @_; my @maxima = shift @_; my $max; $max = $measure->( $_ ) for @maxima; for( @_ ) { my $key = $measure->( $_ ); if( $max lt $key ) { @maxima = $_; $max = $key; } elsif( $max eq $key ) { push @maxima, $_; } } return @maxima; } sub strminima(&@) { my $measure = shift @_; return if ! @_; my @minima = shift @_; my $min; $min = $measure->( $_ ) for @minima; for( @_ ) { my $key = $measure->( $_ ); if( $key lt $min ) { @minima = $_; $min = $key; } elsif( $key eq $min ) { push @minima, $_; } } return @minima; }

      Which I think would be fine additions to List::Util.

      However, (Update: I added return   if  ! @_; to each because ) these should return an empty list not (undef) when given an empty list.

      - tye        

        I've always known these (at least in their single-valued incarnations) as argmin & argmax, and I've even seen them used with those names in mathematical literature. I would prefer a naming scheme where the "arg-" prefix indicates that we return the index/indices which induced the extremal value(s) (and not the extremal value(s) itself/themselves), and the "-ima" suffix indicates that we return multiple values when there is a tie.

        It never occurred to me that a plural version of them might be useful, but I don't see why not. Perhaps the two optimized cases could even be combined efficiently by checking wantarray.

        Whatever you want to call it, I'm with you in that I've always wanted functions like these included in List::Util.

        blokhead

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://476580]
Approved by dynamo
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others taking refuge in the Monastery: (6)
As of 2024-04-19 06:25 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found