Reporting entries in a file

SkullOne has asked for the wisdom of the Perl Monks concerning the following question:

Re: Reporting entries in a file by psini (Deacon) on May 30, 2008 at 22:02 UTC
If I understand right, you want to count the number of occurrences of every distinct "word" in your file. To do so, you should extract the "words" from the file and use a hash (not an array) to count the occurrences. At the beginning the hash is empty; for every word you extract, you check whether `defined($hash{$word})` holds: if it is defined, you increment the value; otherwise you set the value to one. Careful with that hash, Eugene.
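A minimal Perl sketch of the steps described above, assuming a "word" is any run of non-whitespace characters. The `count_words` name and the sample input are made up for illustration. The explicit `defined` test is spelled out to mirror the description; in practice `$count{$word}++` alone does the same thing, because incrementing an undefined value yields 1.

```perl
use strict;
use warnings;

# Count occurrences of each distinct word: start with an empty hash, then
# for every word either increment its count or initialise it to one.
sub count_words {
    my @lines = @_;
    my %count;                          # the hash starts out empty
    for my $line (@lines) {
        # Assumption: a "word" is a run of non-whitespace characters.
        for my $word (split ' ', $line) {
            if (defined $count{$word}) {
                $count{$word}++;        # seen before: increment
            } else {
                $count{$word} = 1;      # first occurrence: set to one
            }
        }
    }
    return %count;
}

# Usage example with made-up input lines.
my %count = count_words("foo bar\n", "bar baz bar\n");
print "$_: $count{$_}\n" for sort keys %count;
```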
Re^2: Reporting entries in a file by SkullOne (Acolyte) on May 30, 2008 at 22:11 UTC
Thank you, a hash is exactly what I needed, not an array. I saw an example online and modified it a tad:

```perl
#!/usr/bin/perl
while (<>) {
    @words = split(/\n+/);
    foreach $word (@words) {
        $count{$word}++;
    }
}
foreach $word (sort by_count keys %count) {
    print "$word \: $count{$word}\n";
}
sub by_count {
    $count{$b} <=> $count{$a};
}
```
Re^3: Reporting entries in a file by alexm (Chaplain) on May 30, 2008 at 22:26 UTC
`@words = split(/\n+/);` I guess you meant: `@words = split(/\s+/);`
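The difference between the split patterns is easy to see on a single made-up input line. With a one-word-per-line file that has no extra spaces, `/\n+/` happens to work, because each line read by `<>` becomes a single field (trailing empty fields are dropped); it never splits words within a line, though. `/\s+/` does split words, but a line with leading whitespace yields an empty first field, which would then be counted as a "word". The special pattern `' '` (also what a bare `split` uses) skips leading whitespace.

```perl
use strict;
use warnings;

my $line = "  foo bar\n";              # made-up input line with leading spaces

my @by_newline = split /\n+/, $line;   # ("  foo bar"): the whole line is one field
my @by_space   = split /\s+/, $line;   # ("", "foo", "bar"): empty leading field
my @awk_style  = split ' ',   $line;   # ("foo", "bar"): leading whitespace skipped

for my $pair ([newline => \@by_newline], [space => \@by_space], [awk => \@awk_style]) {
    my ($name, $fields) = @$pair;
    printf "%-8s %d field(s): [%s]\n", $name, scalar @$fields, join '|', @$fields;
}
```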
Re^3: Reporting entries in a file by SkullOne (Acolyte) on May 30, 2008 at 22:13 UTC
Correction: the hash would've worked fine, but I ended up using an array again.
Re^2: Reporting entries in a file by rovf (Priest) on Jun 02, 2008 at 09:00 UTC
From what SkullOne said ("It's just a one-column file"), s/he had only one word per line, so the task is even easier: we only need to count the distinct lines (perhaps after stripping leading and trailing spaces). The concept of using a hash is the same, but we don't need to split the line into words. A solution that uses arrays instead of hashes, and is also easy to code (provided we don't need to strip the spaces), is to slurp the whole file into an array, sort the array, and then loop through it, counting the number of consecutive equal entries. This gives an alphabetically sorted list of the words with their associated counts.
-- Ronald Fischer <ynnor@mm.st>
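The array-only approach described above could look roughly like this. The `count_runs` name and the sample data are hypothetical; in a real script the entries would come from the file (for example `my @lines = <$fh>; chomp @lines;`), and this sketch does no stripping of leading or trailing spaces.

```perl
use strict;
use warnings;

# Count repeated entries without a hash: sort the entries, then walk the
# sorted array and count runs of consecutive equal entries.
sub count_runs {
    my @sorted = sort @_;
    my @result;                          # list of [entry, count] pairs
    for my $entry (@sorted) {
        if (@result && $result[-1][0] eq $entry) {
            $result[-1][1]++;            # same as previous entry: extend the run
        } else {
            push @result, [$entry, 1];   # new entry: start a new run
        }
    }
    return @result;
}

# Usage example with made-up one-column data.
my @lines = qw(foo bar foo baz foo bar);
for my $pair (count_runs(@lines)) {
    print "$pair->[0] $pair->[1]\n";
}
```

Because the array is sorted first, equal entries are guaranteed to be adjacent, and the output comes out in alphabetical order for free.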
Re: Reporting entries in a file (unix one-liner) by toolic (Bishop) on May 31, 2008 at 01:49 UTC
The quick *nix solution uses the `sort` and `uniq` utilities to produce a numerically sorted list of word counts (`uniq` only collapses adjacent duplicate lines, hence the first `sort`). Running

```
sort file | uniq -c | sort -n
```

will print out something like:

```
1 foo
3 bar
8 blah
```
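For comparison, a rough pure-Perl counterpart of the `sort | uniq -c | sort -n` pipeline, assuming one entry per (already chomped) line. The `line_counts` name and the sample data are made up. Ties between equal counts are broken alphabetically here only to keep the output deterministic; this is not claimed to reproduce `sort -n`'s tie ordering exactly.

```perl
use strict;
use warnings;

# Count each distinct line, then return [count, line] pairs in
# ascending order of count (like `sort -n`), ties broken alphabetically.
sub line_counts {
    my %count;
    $count{$_}++ for @_;    # one hash entry per distinct line
    return map { [$count{$_}, $_] }
           sort { $count{$a} <=> $count{$b} or $a cmp $b } keys %count;
}

my @lines = qw(blah bar foo blah bar blah);   # made-up, already chomped
printf "%d %s\n", @$_ for line_counts(@lines);
```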
Re^2: Reporting entries in a file (unix one-liner) by codeacrobat (Chaplain) on May 31, 2008 at 22:08 UTC
If you are not in a *nix environment, here is a Perl-only one-liner: `perl -lane '$h{$_}++ for @F; END{print qq{$h{$_} $_} for sort keys %h}' file`. On Windows, switch to double quotes around the program.

`print+qq(\L@{[ref\&@]}@{['@'x7^'!#2/"!4']});`
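To show what the command-line switches in that one-liner do, here is a long-hand sketch: `-n` wraps the code in a `while (<>) { ... }` loop, `-l` chomps each input line and makes every `print` append a newline, and `-a` splits each line into `@F` as if by `split ' '`. Because `-l` already appends a newline, an explicit `\n` inside the printed string would produce blank lines between entries. The `count_words_fh` name and the in-memory sample file are made up for illustration; in a real script you might pass `\*STDIN` instead.

```perl
use strict;
use warnings;

# Long-hand version of the loop that -n, -l and -a build around the code.
sub count_words_fh {
    my ($fh) = @_;
    my %h;
    while (my $line = <$fh>) {          # what -n does
        chomp $line;                    # what -l does to each input line
        $h{$_}++ for split ' ', $line;  # -a fills @F this way; then the body runs
    }
    return %h;
}

my $text = "foo bar\nbar baz\n";        # made-up file contents
open my $fh, '<', \$text or die "cannot open in-memory file: $!";
my %h = count_words_fh($fh);
{
    local $\ = "\n";                    # what -l does to print's output
    print "$h{$_} $_" for sort keys %h; # the END block: "count word", by word
}
```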