
  my @n; $n[$_{$_}] = $_ for map{$_{$_}++; $_} @list;

  print "Most frequent: $n[-1]";

That is a nice little snippet.
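The snippet works because the `map` runs over the whole list, incrementing the counts in `%_`, before the `for` loop makes its first assignment into `@n`. Every element is then written to the index equal to its final count. Here is a runnable, annotated copy; the sample list is made up for illustration.

```perl
use strict;
use warnings;

# Sample data (assumed for illustration): apple x3, pear x2, plum x1.
my @list = qw( apple pear apple plum apple pear );

# %_ is a punctuation variable, so 'use strict' does not complain about it.
# The map runs over the whole list before the for loop assigns anything,
# so every count in %_ is already final when @n is filled in.
my @n;
$n[ $_{$_} ] = $_ for map { $_{$_}++; $_ } @list;

# Each element lands at the index equal to its count, so the highest
# index holds an element with the highest count.
print "Most frequent: $n[-1]\n";   # prints "Most frequent: apple"
```

When several elements tie for the top count, the one that appears last in `@list` wins, because it is written last.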

I hope he'll forgive me for pushing this one step further with this sub which I have added to my personal utilities module.

sub most_frequent{ local *_=*_; $_[$_{$_}] = $_ for map{$_{$_}++; $_} @_; $_[-1]; }

I don't think that does what you want it to do. It only returns the most frequent element if the highest frequency is greater than or equal to the last index of the argument array. For instance, if you pass that function the list qw( 1 1 2 3 ), $_[2] is set to 1 (which occurs twice), and $_[1] is set to 2 and then to 3 (which occur once each). But $_[3] still holds the original 3, and your code will return it.
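One way to avoid the problem is to stop writing into `@_`, which already holds the arguments. The sketch below keeps the index-by-count idea but uses a lexical hash and a fresh lexical array. The name `most_frequent_fixed` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical corrected variant (the name most_frequent_fixed is made up).
# It keeps the index-by-count idea but stores results in a fresh lexical
# array, so no argument left over in @_ can be returned by mistake.
sub most_frequent_fixed {
    my ( %count, @by_count );
    $count{$_}++ for @_;                            # final counts first
    $by_count[ $count{$_} ] = $_ for keys %count;   # slot each element at its count
    return $by_count[-1];                           # highest index = highest count
}

print most_frequent_fixed(qw( 1 1 2 3 )), "\n";     # prints 1
```

Because it iterates over `keys %count`, ties for the top count are resolved by hash order, so which tied element comes back is not predictable.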

Which goes a long way to providing, and could easily be extended to provide, most if not all of the functionality available in the Statistics::Frequency module I saw mentioned, without the overhead of the 50 or so lines of inefficient and frankly rather pedestrian code that make it up.

This inefficient and pedestrian code you speak of is much more efficient than the broken code you posted. First I tried a one-element list, so your code couldn't break. Then I disregarded the fact that it breaks and tried a slightly larger list (which is why the correctness check in mf_bigger is commented out). The results look good for Statistics::Frequency.


#!/usr/bin/perl

use warnings;
use strict;
$|++;

use Statistics::Frequency;
use Benchmark qw( cmpthese );

my @data_small  = qw( bob );

my @data_bigger = qw( bob bob bob tom sally jim bob bob bob tom sally jim
                      bob bob bob tom sally jim bob bob bob tom sally jim
                      bob bob bob tom sally jim bob bob bob tom sally jim
                      bob bob bob tom sally jim bob bob bob tom sally jim
                      bob bob bob tom sally jim bob bob bob tom sally jim
                      bob bob bob tom sally jim bob bob bob tom sally jim
                      bob bob bob tom sally jim bob bob bob tom sally jim
                      bob bob bob tom sally jim bob bob bob tom sally jim
                    );

cmpthese( 10_000, {
                   mf_small  => \&mf_small,
                   sf_small  => \&sf_small,
                  }
        );

cmpthese( 2500, {
                 mf_bigger => \&mf_bigger,
                 sf_bigger => \&sf_bigger,
                }
        );

sub sf_small {
  my $f = Statistics::Frequency->new( @data_small );
  my %f = reverse $f->frequencies;
  die "sf broken" unless $f{$f->frequencies_max} eq 'bob';
}

sub sf_bigger {
  my $f = Statistics::Frequency->new( @data_bigger );
  my %f = reverse $f->frequencies;
  die "sf broken" unless $f{$f->frequencies_max} eq 'bob';
}

sub mf_small {
  my $f = most_frequent( @data_small );
  die "mf broken" unless $f eq 'bob';
}

sub mf_bigger {
  my $f = most_frequent( @data_bigger );
  # Correctness check disabled: most_frequent() breaks on larger lists.
  #die "mf broken" unless $f eq 'bob';
}

sub most_frequent{
  local *_=*_; $_[$_{$_}] = $_ for map{$_{$_}++; $_} @_; $_[-1];
}



Benchmark: timing 10000 iterations of mf_small, sf_small...
  mf_small:  4 wallclock secs ( 2.56 usr +  0.54 sys =  3.10 CPU) @ 3225.81/s (n=10000)
  sf_small:  1 wallclock secs ( 0.71 usr +  0.13 sys =  0.84 CPU) @ 11904.76/s (n=10000)
            Rate mf_small sf_small
mf_small  3226/s       --     -73%
sf_small 11905/s     269%       --
Benchmark: timing 2500 iterations of mf_bigger, sf_bigger...
 mf_bigger: 23 wallclock secs (12.17 usr + 10.49 sys = 22.66 CPU) @ 110.33/s (n=2500)
 sf_bigger:  1 wallclock secs ( 1.11 usr +  0.14 sys =  1.25 CPU) @ 2000.00/s (n=2500)
            Rate mf_bigger sf_bigger
mf_bigger  110/s        --      -94%
sf_bigger 2000/s     1713%        --

I find it incredulous that the author implemented a complete function and a nested loop to determine the "sum of the frequencies", which, unless I am just too tired, amounts to the size of the list or array?

I assume that you mean that you are incredulous, or that you find it incredible.
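On the narrower technical point in the quote: for a plain hash tally built from a single list, where each occurrence adds exactly 1, the counts do add up to the list length. The sketch below shows that, plus the top count and every element that reaches it. The sample list is assumed, and it says nothing about how Statistics::Frequency stores or sums its data.

```perl
use strict;
use warnings;
use List::Util qw( sum max );

# Sample data (assumed for illustration).
my @list = qw( bob bob bob tom sally jim );

# Tally each element once per occurrence.
my %count;
$count{$_}++ for @list;

# With one count per occurrence, the counts add up to the list length.
my $sum = sum( values %count );

# The highest count, and every element that reaches it (ties included).
my $max   = max( values %count );
my @modes = grep { $count{$_} == $max } sort keys %count;

printf "sum=%d, elements=%d, max=%d, most frequent: %s\n",
    $sum, scalar @list, $max, join( ', ', @modes );
# prints "sum=6, elements=6, max=3, most frequent: bob"
```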

Just goes to show that you have to read the source before blithely accepting the merit of any given module. Just being a part of CPAN isn't in itself enough to ensure any sort of quality.

Yeah, especially if they are written by the CPAN's master librarian and co-author of Mastering Algorithms with Perl. In fact, I think I should distrust anything released by that author. Guess I better go downgrade my Perl.

-- dug

In reply to Re: Re: •Re: Most frequent element in an array. by dug
in thread Most frequent element in an array. by BrowserUk
