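use strict;
use warnings;

# $wordData{$word}{'!count'} is how often $word occurs;
# $wordData{$word}{$next} is how often $next directly follows $word.
my %wordData;
my $previousWord;

while (<DATA>) {
    chomp;
    my @parts = split;

    for my $part (@parts) {
        # First run of word characters in the token, plus whatever trails it
        my ($word, $punct) = (lc $part) =~ /(\w+)(.*)/;

        if ($word) {
            $wordData{$previousWord}{$word}++ if $previousWord;
            $wordData{$word}{'!count'}++;
            $previousWord = $word;
        } elsif ($punct) {
            # Meant to break the chain at punctuation, but it is only reached
            # when no usable word was captured, so punctuation attached to a
            # word (e.g. "count. In") does not reset $previousWord.
            $previousWord = undef;
        }
    }
}

for my $word (sort keys %wordData) {
    # Skip words seen only once
    next if $wordData{$word}{'!count'} < 2;

    for my $secWord (sort keys %{$wordData{$word}}) {
        next if $secWord eq '!count';
        printf "'%s' is followed by '%s' %.0f%% of the time\n",
            $word,
            $secWord,
            100.0 * $wordData{$word}{$secWord} / $wordData{$word}{'!count'};
    }
}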
__DATA__
Thank you all for the answers... all above for the answering. I am
trying something on bigram techniques. where i have stored bigram
count in one hash and word count in other.
What i am trying to do is: From looping those hashes at once, I want
to take the value of bigram count and divide with the value of word
count.
In case of ordering i used Tie::IxHash module from CPAN. Any better solution to
the way i approached my work will be appreciated... Thanks.
Prints:
'all' is followed by 'above' 50% of the time
'all' is followed by 'for' 50% of the time
'am' is followed by 'trying' 100% of the time
'and' is followed by 'divide' 50% of the time
'and' is followed by 'word' 50% of the time
'bigram' is followed by 'count' 67% of the time
'bigram' is followed by 'techniques' 33% of the time
'count' is followed by 'and' 25% of the time
'count' is followed by 'in' 75% of the time
'for' is followed by 'the' 100% of the time
'from' is followed by 'cpan' 50% of the time
'from' is followed by 'looping' 50% of the time
'i' is followed by 'am' 33% of the time
'i' is followed by 'approached' 17% of the time
'i' is followed by 'have' 17% of the time
'i' is followed by 'used' 17% of the time
'i' is followed by 'want' 17% of the time
'in' is followed by 'case' 33% of the time
'in' is followed by 'one' 33% of the time
'in' is followed by 'other' 33% of the time
'of' is followed by 'bigram' 33% of the time
'of' is followed by 'ordering' 33% of the time
'of' is followed by 'word' 33% of the time
'the' is followed by 'answering' 20% of the time
'the' is followed by 'answers' 20% of the time
'the' is followed by 'value' 40% of the time
'the' is followed by 'way' 20% of the time
'to' is followed by 'do' 33% of the time
'to' is followed by 'take' 33% of the time
'to' is followed by 'the' 33% of the time
'trying' is followed by 'something' 50% of the time
'trying' is followed by 'to' 50% of the time
'value' is followed by 'of' 100% of the time
'word' is followed by 'count' 100% of the time
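Each percentage above is the bigram count divided by the count of the first word. As a worked check on one pair of rows (the symbols c(...) and P(...) are notation introduced here, not names from the script): 'bigram' occurs three times in the sample text, followed twice by 'count' and once by 'techniques'.

```latex
P(w_2 \mid w_1) = \frac{c(w_1, w_2)}{c(w_1)}, \qquad
P(\text{count} \mid \text{bigram}) = \frac{2}{3} \approx 67\%, \qquad
P(\text{techniques} \mid \text{bigram}) = \frac{1}{3} \approx 33\%
```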
True laziness is hard work