If I understand right, you want to count the number of occurrences of every distinct "word" in your file.
To do so, you should extract the "words" from the file and use a hash (not an array) to count the occurrences. At the beginning the hash is
empty; for every word you extract, you check if defined($hash{$word}): if it is defined, you increment the value; otherwise you set the value to one.
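The steps above could be sketched roughly like this. The helper name count_words and the choice to split on whitespace are assumptions made for illustration; what counts as a "word" depends on your file.

```perl
use strict;
use warnings;

# count_words is a hypothetical helper name, not something from the thread.
# It takes a list of lines and returns a reference to a hash of word => count.
sub count_words {
    my @lines = @_;
    my %hash;                                # the hash starts out empty
    for my $line (@lines) {
        # Assumption: words are separated by whitespace.
        for my $word (split ' ', $line) {
            if (defined $hash{$word}) {
                $hash{$word}++;              # seen before: increment the value
            }
            else {
                $hash{$word} = 1;            # first occurrence: set the value to one
            }
        }
    }
    return \%hash;
}

my $counts = count_words("foo bar", "bar", "  foo bar ");
print "$_: $counts->{$_}\n" for sort keys %$counts;
```

In practice the explicit defined() check can be dropped, since $hash{$word}++ on a missing key treats the value as zero and stores 1; the long form is shown only to mirror the description.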
Careful with that hash Eugene.
| [reply] [d/l] |
Thank you, a hash is exactly what I needed, not an array.
I saw an example online, and modified it a tad:
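#!/usr/bin/perl
use strict;
use warnings;

my %count;
while (<>) {
    # One word per line: splitting on newlines also drops the line ending.
    my @words = split(/\n+/);
    foreach my $word (@words) {
        $count{$word}++;
    }
}
# Print the words, most frequent first.
foreach my $word (sort by_count keys %count) {
    print "$word : $count{$word}\n";
}
sub by_count {
    $count{$b} <=> $count{$a};
}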
| [reply] [d/l] |
Correction: the hash would've worked fine, but I ended up using an array again.
| [reply] |
From what SkullOne said ("It's just a one-column file"), s/he has only one word per line, so the task is even easier: we only need to count how often each distinct line occurs (maybe after stripping leading and trailing spaces).
The concept of using a hash would be the same, but we don't need to split the line into words.
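A minimal sketch of that one-word-per-line variant might look like this. The helper name count_lines is hypothetical, and skipping blank lines is an assumption, not something the original poster asked for.

```perl
use strict;
use warnings;

# count_lines is a hypothetical helper: each line is one entry, no splitting into words.
# It returns a reference to a hash of line => count.
sub count_lines {
    my @lines = @_;
    my %count;
    for my $line (@lines) {
        $line =~ s/^\s+|\s+$//g;    # strip leading and trailing whitespace, newline included
        next if $line eq '';        # assumption: blank lines are not counted
        $count{$line}++;
    }
    return \%count;
}

my $seen = count_lines(" foo\n", "bar\n", "foo \n", "\n");
print "$_: $seen->{$_}\n" for sort keys %$seen;
```

In a real script the argument list would come from the file, for example count_lines(<>).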
A solution that makes do with arrays instead of hashes, and is easy to code too (provided that we don't need to strip the spaces), would be to slurp the whole file into an array, sort the array, and then loop through it, counting the number of consecutive equal entries. This would give us an alphabetically sorted list of the words with their associated counts.
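As a rough illustration of the array-only approach, something along these lines should work. The helper name count_runs and the sample data are made up for the example.

```perl
use strict;
use warnings;

# count_runs is a hypothetical helper: it sorts its arguments and counts
# runs of consecutive equal entries. No hash is used.
# Returns a list of [word, count] pairs in alphabetical order.
sub count_runs {
    my @sorted = sort @_;
    my @result;
    for my $entry (@sorted) {
        if (@result && $result[-1][0] eq $entry) {
            $result[-1][1]++;             # same as the previous entry: extend the current run
        }
        else {
            push @result, [$entry, 1];    # new value: start a new run
        }
    }
    return @result;
}

# Assumption: in a real script the lines would be slurped from the file, e.g.
#   chomp(my @lines = <>);
my @lines = qw(foo bar foo blah bar foo);
printf "%s %d\n", @$_ for count_runs(@lines);
```

The sort costs more than a single pass with a hash, but for a small one-column file the difference is unlikely to matter.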
--
Ronald Fischer <ynnor@mm.st>
| [reply] |
The quick *nix solution uses the sort and uniq
utilities to produce a numerically sorted list of word counts:
sort file | uniq -c | sort -n
will print out something like:
1 foo
3 bar
8 blah
| [reply] [d/l] [select] |
If you are not in a *nix environment, this is a Perl-only one-liner:
perl -lane '$h{$_}++ for @F; END{print qq{$h{$_} $_} for sort keys %h}' file
Switch to double quotes in Windows.
print+qq(\L@{[ref\&@]}@{['@'x7^'!#2/"!4']});
| [reply] [d/l] [select] |