Jannejannesson has asked for the wisdom of the Perl Monks concerning the following question:

Hey guys, I started looking at Perl about three days ago, and I've managed to create a program that counts word frequency in a given text file, like so:

#!/usr/local/bin/perl
use strict;
use warnings;
my %count;
my $file_name = shift or die "Usage: perl $0 [FILE]\n";
open my $fh, '<', $file_name or die "Could not open '$file_name': $!";
while (my $line = <$fh>) {
    chomp $line;
    # split ' ' skips leading whitespace; split /\s+/ would count an empty "word"
    foreach my $word (split ' ', $line) {
        $count{$word}++;
    }
}
foreach my $word (sort keys %count) {
    printf "%-31s %s\n", $word, $count{$word};
}

The only thing is that this program shows the frequency of every word in the order the words appear in the text file. What would I need to add to get not only that result but also a list of the ten most commonly used words in descending order?
It's worth noting that this is my first post here, so please let me know if I'm doing anything wrong. I'm thankful for any tips or advice.
Thanks in advance.

Re: How do I create a list with the 10 most frequently used words in a file?
by afoken (Chancellor) on Apr 29, 2019 at 15:11 UTC
    foreach my $word (sort keys %count) {
        printf "%-31s %s\n", $word, $count{$word};
    }

    The only thing is that this program shows the frequency of every word in the order the words appear in the text file. What would I need to add to get not only that result but also a list of the ten most commonly used words in descending order?

    • Sort your hash by value, not by key. Use a coderef or a compare function for sort. See sort. sort { $count{$b} <=> $count{$a} } keys %count should do the trick.
    • Stop after writing 10 records. See last. Hint: Add a counter to the loop.
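
    A minimal sketch of both hints together, assuming %count has already been filled by the loop in the original program. Here it is seeded with made-up counts so the snippet runs on its own, and $limit is set to 3 instead of 10 to keep the demo short.

```perl
use strict;
use warnings;

# Made-up counts standing in for the %count built from the file.
my %count = (the => 5, on => 3, cat => 2, mat => 1, sat => 1);
my $limit = 3;        # 10 in the original question; 3 keeps the demo short

my @top;              # collects the words that get printed
my $printed = 0;      # the counter from the hint
foreach my $word (sort { $count{$b} <=> $count{$a} } keys %count) {
    last if $printed >= $limit;    # stop once $limit records are written
    printf "%-31s %s\n", $word, $count{$word};
    push @top, $word;
    $printed++;
}
```

    Putting the test before the printf means the loop ends cleanly even when there are fewer than $limit words.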

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
Re: How do I create a list with the 10 most frequently used words in a file?
by roboticus (Chancellor) on Apr 29, 2019 at 15:14 UTC

    Jannejannesson:

    Getting the "top N" or "bottom N" items of a list is often handled by sorting the keys by frequency of occurrence, and then keeping only the ones you want. You can do it like this:

    # Build a list of keys sorted by the number of occurrences
    # (descending, because $b is on the left)
    my @sorted_by_count = sort { $count{$b} <=> $count{$a} } keys %count;
    # Print the top 10 (or fewer, if the file has fewer than 10 distinct words):
    my $last = $#sorted_by_count < 9 ? $#sorted_by_count : 9;
    print "$_ occurred $count{$_} times\n" for @sorted_by_count[0 .. $last];

    Note: I didn't test the code, but it should be pretty close.
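
    A sketch of how this approach might slot into the original program so it prints both the alphabetical listing and the top ten. To keep it self-contained it reads from an in-memory string instead of a file named on the command line, and count_words and top_n are names invented for this sketch, not library functions.

```perl
use strict;
use warnings;

# Hypothetical helper: count whitespace-separated words from a filehandle.
sub count_words {
    my ($fh) = @_;
    my %count;
    while (my $line = <$fh>) {
        $count{$_}++ for split ' ', $line;    # ' ' ignores leading whitespace
    }
    return %count;
}

# Hypothetical helper: the $n most frequent words, highest count first.
sub top_n {
    my ($count, $n) = @_;
    my @sorted = sort { $count->{$b} <=> $count->{$a} } keys %$count;
    $n = @sorted if $n > @sorted;             # fewer than $n distinct words
    return @sorted[0 .. $n - 1];
}

# In the real program: open my $fh, '<', $file_name or die ...
my $text = "the cat sat on the mat\nthe dog sat\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my %count = count_words($fh);
close $fh;

# The original alphabetical listing ...
foreach my $word (sort keys %count) {
    printf "%-31s %s\n", $word, $count{$word};
}

# ... followed by the ten most common words.
print "\nTop 10:\n";
my @top = top_n(\%count, 10);
printf "%-31s %s\n", $_, $count{$_} for @top;
```

    Capping $n before slicing avoids printing undefined entries (and the resulting warnings) for short files.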

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.

      Thank you for the explanation as well as your example. It worked great! PerlMonks is amazing!

Re: How do I create a list with the 10 most frequently used words in a file?
by holli (Abbot) on Apr 29, 2019 at 15:16 UTC
    this program shows the frequency of every word in the order the words appear in the text file
    No, it doesn't. It sorts the words alphabetically (strictly speaking, in string order, so uppercase letters sort before lowercase). That's equivalent to
    sort { $a cmp $b } keys %count
    To sort by count, use this sort block:
    sort { $count{$a} <=> $count{$b} } keys %count
    For descending order, swap the operands:
    sort { $count{$b} <=> $count{$a} } keys %count
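
    The count blocks use <=> rather than cmp because cmp compares strings, so a count of 10 would sort before a count of 9. A small sketch with made-up counts showing the orderings side by side:

```perl
use strict;
use warnings;

my %count = (apple => 9, banana => 10, cherry => 2);   # made-up counts

# Default sort: string comparison of the keys, same as sort { $a cmp $b }.
my @by_word = sort keys %count;

# Numeric comparison of the values, ascending, then descending.
my @by_count_asc  = sort { $count{$a} <=> $count{$b} } keys %count;
my @by_count_desc = sort { $count{$b} <=> $count{$a} } keys %count;

# Comparing the values as strings instead: "10" lt "2" lt "9".
my @by_count_str  = sort { $count{$a} cmp $count{$b} } keys %count;

print "by word:       @by_word\n";
print "count asc:     @by_count_asc\n";
print "count desc:    @by_count_desc\n";
print "count as text: @by_count_str\n";
```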


    holli

    You can lead your users to water, but alas, you cannot drown them.

      You're correct; that's my bad. I was a bit stressed when I wrote this. I'll make sure to be clearer in any future posts.

Re: How do I create a list with the 10 most frequently used words in a file?
by thanos1983 (Parson) on Apr 29, 2019 at 15:14 UTC
Re: How do I create a list with the 10 most frequently used words in a file?
by NetWallah (Canon) on Apr 29, 2019 at 18:22 UTC
    Here is the obligatory "one-liner" to get the top ten word frequencies:
    perl -anE "$c{$_}++ for @F}{ ++$top<11 && say qq|$c{$_}\t$_| for sort {$c{$b}<=>$c{$a}} keys %c" YOUR-FILE.txt
    On Linux and other Unix-like shells, use single quotes instead of double quotes.
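
    For anyone puzzled by the }{ in the middle: perlrun documents that -n wraps the code in a while (<>) { ... } loop and -a adds an implicit split(' ') into @F, so }{ closes that loop early and opens a block that runs once after all input is read. A rough expanded sketch of the same logic, using a hypothetical top_ten sub that takes a filehandle instead of <> so it can be tried without a file:

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical wrapper around the one-liner's logic.
sub top_ten {
    my ($fh) = @_;
    my (%c, @out);
    while (<$fh>) {                  # -n: loop over the input lines
        my @F = split ' ', $_;       # -a: autosplit $_ on whitespace into @F
        $c{$_}++ for @F;             # the code before the }{
    }                                # }{ closes the loop here ...
    # ... and this part runs once, after all input has been read.
    my $top = 0;
    for (sort { $c{$b} <=> $c{$a} } keys %c) {
        ++$top < 11 && push @out, "$c{$_}\t$_";    # keep only the first ten
    }
    say for @out;
    return @out;
}

# Demo input: w1 once, w2 twice, ..., w12 twelve times.
my $text = join ' ', map { ("w$_") x $_ } 1 .. 12;
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my @lines = top_ten($fh);
```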

                    "It's ten o'clock... Do you know where your AI programs are?"

Re: How do I create a list with the 10 most frequently used words in a file?
by BillKSmith (Monsignor) on Apr 29, 2019 at 15:20 UTC
    Refer to FAQ: perldoc -q "How do I sort a hash"
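
    When several words share a count, the sort blocks in this thread leave their relative order to Perl's hash ordering, which can differ between runs. One common refinement (a sketch with made-up counts, not a quote of the FAQ entry) adds a secondary comparison on the word itself with or, which only kicks in when <=> returns 0:

```perl
use strict;
use warnings;

my %count = (pear => 2, fig => 3, apple => 2, kiwi => 1);   # made-up counts

# Highest count first; words with equal counts fall back to string order.
my @sorted = sort {
    $count{$b} <=> $count{$a}    # primary: count, descending
        or
    $a cmp $b                    # secondary: word, ascending
} keys %count;

print "$_ $count{$_}\n" for @sorted;
```
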
    Bill