I am reading a text file (Alice in Wonderland), parsing each line in a primitive way, and trying to print the most common "word" using a hash.
I could find it by iterating through the keys afterwards, but instead I am trying to track the running maximum each time I count a word.
The program's output is shown below.
("ALICE'S" is the first word in the file; I would expect the screen to be full of the rest of the words too.)
C:\scripts>wordcount.pl alice.txt
ALICE'Sdistinct words: 4007
frequency of most common word: 5160
common word:
use strict;
my $maxcount;
my $find;
sub read_line {
    our %hash;
    my @list = split /[,.?!)"]?\s\)?/, shift;
    my $count;
    foreach my $word (@list) {
        $hash{lc $word}++;
        $count =$hash{lc $word};
        if ($count > $maxcount) {
            print "$word";
            $maxcount++;
            $find = $word;
        }
    }
}
sub read_file{
    my $file=shift;
    open (FILE, $file) or die "couldn't open $file: $!";
    while (my $line = <FILE>) {
        read_line $line;
    }
}
read_file @ARGV;
my $numwords= keys our %hash;
print "distinct words: $numwords\n";
print "frequency of most common word: $maxcount\n";
print "common word: $find";
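A possible explanation, offered as a sketch rather than a definitive diagnosis: with this pattern, split returns an empty string for each leading space on a line and for the second of two consecutive spaces, so $hash{''} may be incremented far more often than any real word. If the empty string takes the lead early, it would be the only key that keeps breaking the record, and printing it shows nothing. That would be consistent with the blank "common word:" line and with "ALICE'S" being the only visible word. Note also that the print inside the if fires only when the record is broken, not once per word. The version below assumes this is the cause and simply skips empty fields; count_words and the sample lines are hypothetical names and data, not part of the original script.

```perl
use strict;
use warnings;

# Count words in a list of lines.
# Returns (hashref of counts, most common word, its count).
sub count_words {
    my @lines = @_;
    my (%count, $top_word);
    my $top_count = 0;
    for my $line (@lines) {
        # Same primitive split pattern as in the original post.
        for my $word (split /[,.?!)"]?\s\)?/, $line) {
            # Assumption: empty fields from leading or repeated
            # whitespace are what dominated the original tally.
            next if $word eq '';
            my $key = lc $word;
            my $n   = ++$count{$key};
            if ($n > $top_count) {
                $top_count = $n;
                $top_word  = $key;
            }
        }
    }
    return (\%count, $top_word, $top_count);
}

# Hypothetical sample input: indentation and double spaces
# are the cases that yield empty fields.
my @sample = (
    "  The cat sat.  The dog sat too.\n",
    "\n",
    "  the end\n",
);
my ($count, $word, $n) = count_words(@sample);
printf "distinct words: %d\n", scalar keys %$count;
print  "frequency of most common word: $n\n";
print  "common word: $word\n";
```

On the sample lines this reports 6 distinct words, with "the" as the most common at a count of 3. To run it on a file, the lines read from the filehandle could be passed to count_words in place of @sample.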