rolandomantilla has asked for the wisdom of the Perl Monks concerning the following question:

Hello perlmonks, I'm in need of Perl wisdom again. I'm trying to write a program that opens a text file and counts how many words are in it, which I was able to do. What I'm having trouble with is sorting and printing a list of unique words with the number of occurrences of each word. Please help, Perl Monks. This is what I have so far:
#!/usr/local/bin/perl
print "Please put the name of the file you want to count the words";
$file= <STDIN>;
chomp $file;
my $words=0;
open(FILE, $file) or die $!;;
while (<FILE>) {
    $words += scalar(split(/\s+/, $_));
}
@word = split(/ /, $file);
@sort= sort(@word);
print ("The number of words in the file is $words\n\n");
print "@sort\n";

Replies are listed 'Best First'.
Re: Sorting Unique
by toolic (Bishop) on Aug 19, 2011 at 02:21 UTC
    One problem is that you are trying to split the filename, not words in a file. When you want "unique", think "hash".
    use warnings;
    use strict;
    use Data::Dumper;
    $Data::Dumper::Sortkeys = 1;

    my %words;
    my $tot = 0;
    while (<DATA>) {
        for my $word (split) {
            $words{$word}++;
            $tot++;
        }
    }
    print Dumper(\%words);
    print "The number of words in the file is $tot\n";

    __DATA__
    a b c d e f g h a b c
    prints...
    $VAR1 = {
              'a' => 2,
              'b' => 2,
              'c' => 2,
              'd' => 1,
              'e' => 1,
              'f' => 1,
              'g' => 1,
              'h' => 1
            };
    The number of words in the file is 11
Re: Sorting Unique
by jwkrahn (Abbot) on Aug 19, 2011 at 02:24 UTC
    $words += scalar(split(/\s+/, $_));

    If your text has leading whitespace then that will give you the number of words plus one. What you should do is:

    $words += split;
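
    To see the off-by-one in isolation, here is a minimal sketch. The helper names count_with_pattern and count_with_default are made up for this illustration and are not part of the original post. It compares an explicit /\s+/ split with the awk-style ' ' split that a bare split performs.

```perl
use strict;
use warnings;

# Count words by splitting on an explicit /\s+/ pattern.
# A line that starts with whitespace yields an empty leading field,
# so the count comes out one too high.
sub count_with_pattern {
    my ($line) = @_;
    my @fields = split /\s+/, $line;
    return scalar @fields;
}

# Count words with the special ' ' (awk-style) split, which is what a
# bare "split" does: leading whitespace is skipped before splitting.
sub count_with_default {
    my ($line) = @_;
    my @fields = split ' ', $line;
    return scalar @fields;
}

# Hypothetical sample line with leading whitespace.
my $line = "   the quick brown fox\n";
printf "pattern: %d, default: %d\n",
    count_with_pattern($line), count_with_default($line);
# prints "pattern: 5, default: 4"
```

    On a line without leading whitespace both versions should agree, so the difference only shows up on indented text.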


    To get a count of the unique words you need to use a hash:

    my %unique_word_count;
    while ( <FILE> ) {
        my @words = split;
        $words += @words;
        $unique_word_count{ $_ }++ for @words;
    }
      I tried the code and it did not work; the output was empty. This is the code:
      #!/usr/bin/perl
      print "Please enter the file name you want the word count\n\n";

      $file= <STIDN>;
      chomp $file;


      $file=<STDIN>;
      chomp $file;

      open(FILE, $file) or die $!;

      my %u_wc;
      while (<FILE>){
          my @words= split;
          $words += @words;
          $u_wc{$_}++ for @words;
      }
      print ("The number of words in the file is $words\n\n");
      print ("this are the unique words $u_wc\n\n");
        Look again at lines 4-5 and 8-9, the two places where you read into $file and chomp it. In one case you have a spelling error (STIDN for STDIN), so the only thing saving you is the existence of the second pair, which would be redundant were the first caught and corrected.

        Perhaps you've posted code that is not what you've actually written. Try to post with cut'n'paste, to avoid such problems.

        And whether or not what's shown is your actual code, add use strict; and use warnings; as they will alert you to problems with your variables, such as:

        Global symbol "$file" requires explicit package name at F:\_Perl_\pl_test\921134.pl line 7. (your line 4)

        Fixing those and rerunning would have allowed strict and warnings to point out additional syntax errors (not including the grammatical error in your print statement at your line 20). When you've dealt with those, you'll be free to concentrate on the logic.

        You've already been alerted (by other Monks) to some of your logical problems; you've added to them and need to further your understanding of hashes.

        # As suggested by toolic above
        use strict;
        use warnings;

        print "Please enter the file name you want the word count\n\n";
        my $file= <STDIN>;
        chomp $file;

        # Lexical file handles (my $fh) are better than file globs.
        # Also, three-argument open() is better than two-argument open().
        open(my $fh, "<", $file) or die $!;

        my %u_wc;
        my $words = 0; # Don't forget to declare $word!
        while (<$fh>){
            my @words = split;
            $words += @words;
            $u_wc{$_}++ for @words;
        }

        # print ("The number of words in the file is $words\n\n");
        # The parens are useless here, and also inconsistent with the print statement
        # that prompted us for a file name.
        print "The number of words in the file is $words\n\n";

        # print ("this are the unique words $u_wc\n\n");
        # $u_wc isn't a variable. We do have %u_wc though.
        print "These are the unique words: ", join(" ", keys %u_wc), "\n\n";

        Output:

        G:\abyss>perl x.pl
        Please enter the file name you want the word count

        x.pl
        The number of words in the file is 150

        These are the unique words: want you file that useless my print parens statement better $file= suggested also unique strict; $file; name words: warnings; "These with and number of do += die is %u_wc 0; to have open(my $u_wc "The %u_wc), here, $!; <STDIN>; # "\n\n"; Lexical isn't $words\n\n"; $file) "<", $u_wc\n\n"); } two-argument the variable. open() a = @words toolic ("The @words; or in As split; $u_wc{$_}++ name. $words\n\n"); $words for enter by prompted inconsistent We word declare %u_wc; The $fh) Don't $word! join(" are globs. forget words keys ("this handles count\n\n"; use above us ", though. open(). than $fh, (<$fh>){ (my while chomp Also, "Please three-argument

        G:\abyss>
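
        The keys above come out in hash order and without their counts, while the original question asked for a sorted list with the number of occurrences. One possible way to get that is sketched below. The helper names count_words and report_lines are invented for this example, and the sample lines reuse toolic's test data. Words are compared case-sensitively and punctuation is left attached, which may or may not be what you want.

```perl
use strict;
use warnings;

# Build a word => count hash from a list of lines.
sub count_words {
    my @lines = @_;
    my %count;
    for my $line (@lines) {
        $count{$_}++ for split ' ', $line;
    }
    return \%count;
}

# Return "word count" report lines, most frequent first.
# Ties are broken alphabetically so the order is repeatable.
sub report_lines {
    my ($count) = @_;
    my @sorted = sort { $count->{$b} <=> $count->{$a} or $a cmp $b }
                 keys %$count;
    return map { "$_ $count->{$_}" } @sorted;
}

# Hypothetical input: toolic's sample data split over three lines.
my $count = count_words("a b c d\n", "e f g h\n", "a b c\n");
print "$_\n" for report_lines($count);
```

        For a purely alphabetical listing, replacing the sort block with a plain sort keys %$count should be enough.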