Three comments on what you already have (which is 90% of the way there).

In your split, you can avoid getting blank matches by using \W+ instead of \W. This matches a whole run of non-word characters as a single separator instead of matching each one separately, which keeps your code from getting an empty string between consecutive non-word characters in the input text.
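To make the difference concrete, here is a small sketch. The sample strings are made up for illustration and are not taken from your file.

```perl
use strict;
use warnings;

# Hypothetical sample line: a comma followed by a space, so two
# non-word characters sit next to each other.
my $line = "one, two";

my @single = split /\W/,  $line;   # ("one", "", "two")
my @run    = split /\W+/, $line;   # ("one", "two")

print scalar(@single), " fields with \\W\n";
print scalar(@run), " fields with \\W+\n";

# Even with \W+, a separator at the very start of the line still
# produces one empty leading field.
my @leading = split /\W+/, " one two";   # ("", "one", "two")
print scalar(@leading), " fields when the line starts with a space\n";
```

The last case is worth keeping in mind: a line that begins with whitespace or punctuation will still hand you one empty string, so you may want to skip zero-length words when counting.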

When you increment the entry in the hash, use $frequency{$word}++; instead of $frequency{$word} = $frequency{$word} + 1;. Under -w, the longer form warns about using an uninitialized value the first time each word is seen, because the hash entry does not exist yet. The ++ operator is exempt from that warning.
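A quick way to see this for yourself is to trap the warnings and count them. This is only a sketch: the hash names are invented for the demo, and it uses "use warnings" where your script uses -w, which behaves the same way for this case.

```perl
use strict;
use warnings;

# Collect warnings instead of printing them, so they can be counted.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

my (%long_form, %short_form);   # hypothetical names for this demo

$long_form{apple} = $long_form{apple} + 1;   # undef + 1: warns
$short_form{apple}++;                        # ++ on undef: no warning

print "warnings caught: ", scalar(@warnings), "\n";
print "both counts are 1\n"
    if $long_form{apple} == 1 && $short_form{apple} == 1;
```

Both forms end up with the same count; only the first one complains along the way.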

Since you are going through the entire hash and need both the key and the value for every entry, you could use each instead of keys to get them both at once.
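Side by side, the two loops look roughly like this. The counts in the hash are made up for the example.

```perl
use strict;
use warnings;

my %frequency = (the => 3, cat => 1, sat => 1);   # hypothetical counts

# With keys: one extra hash lookup per word to fetch its count.
my @via_keys;
foreach my $word (keys %frequency) {
    push @via_keys, "$word=$frequency{$word}";
}

# With each: the key and its value arrive together.
my @via_each;
while (my ($word, $freq) = each %frequency) {
    push @via_each, "$word=$freq";
}

# Hash order is not predictable, so sort before printing.
print join(' ', sort @via_each), "\n";
```

One caveat: each walks the hash's internal iterator, so it is best suited to loops that run to completion and do not add entries to the hash while iterating.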

To finish this off, all you need to do is to keep track of the highest count seen while in the last loop.

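    #!/usr/local/bin/perl -w
    use strict;

    my $file = $ARGV[0];
    open my $text, '<', $file or die "Cannot open $file: $!\n";

    # Count how often each word occurs.
    my %frequency = ();
    while ( my $line = <$text> ) {
        my (@words) = split /\W+/, $line;
        foreach my $word ( @words ) {
            # split leaves an empty leading field when a line
            # starts with a non-word character; skip it.
            next unless length $word;
            $frequency{$word}++;
        }
    }
    close $text;

    # Track the highest count seen and every word that reaches it.
    my @most;
    my $cnt;
    while ( my ($word, $freq) = each %frequency ) {
        if ( !@most ) {
            push @most, $word;
            $cnt = $freq;
            next;
        }
        next if $cnt > $freq;
        if ( $cnt == $freq ) {
            push @most, $word;      # tie with the current best
        }
        else {
            @most = ($word);        # new best: start the list over
            $cnt  = $freq;
        }
    }

    if ( @most == 0 ) {
        print "No words in $file\n";
    }
    elsif ( @most == 1 ) {
        print "'$most[0]' occurred $cnt times\n";
    }
    else {
        print "The following words each appeared $cnt times\n@most\n";
    }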

An alternative would be to sort the words by their counts, for example with a Schwartzian Transform, and take the words at the top of the sorted list.
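A rough sketch of the sorting approach follows. Note that it is a plain sort on the hash values rather than a full map-sort-map Schwartzian Transform, on the assumption that the counts are already sitting in the hash and nothing expensive has to be computed per word. The counts are invented for the example.

```perl
use strict;
use warnings;

my %frequency = (the => 3, cat => 1, sat => 3);   # hypothetical counts

# Order the words from the highest count to the lowest.
my @by_count = sort { $frequency{$b} <=> $frequency{$a} } keys %frequency;

# Every word tied with the first entry shares the top count.
my $cnt  = @by_count ? $frequency{ $by_count[0] } : 0;
my @most = grep { $frequency{$_} == $cnt } @by_count;

print "top count $cnt: ", join(' ', sort @most), "\n";
```

Sorting does more work than the single pass above when you only want the top entry, but it is handy if you later decide you want the top ten instead.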

--- print map { my ($m)=1<<hex($_)&11?' ':''; $m.=substr('AHJPacehklnorstu',hex($_),1) } split //,'2fde0abe76c36c914586c';

In reply to Re: Re: Re: How to find the most frequent in a file? by pfaut
in thread How to find the most frequent in a file? by Anonymous Monk
