lechateau has asked for the wisdom of the Perl Monks concerning the following question:
This code takes multiple pieces of text (stories) and creates a table. The words that appear in the stories are the rows and the story titles are the columns. Each "cell" holds the number of times that word appears in that story. For example, these stories:

- Story One: Perl is great
- Story Two: Perl is free perl
- Story Three: Will I learn perl?

will return:

| | Story1 | Story2 | Story3 |
|---|---|---|---|
| Perl | 1 | 2 | 1 |
| is | 1 | 1 | |
| great | 1 | | |
| free | | 1 | |
| will | | | 1 |
| i | | | 1 |
| learn | | | 1 |

Here is the code. It reads `tryit.txt`, where each title sits on its own line inside angle brackets (for example `<Story One>`), followed by the story text:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $filename = "tryit.txt";
open my $in, '<', $filename or die "Cannot open $filename: $!";

my %freq;    # word => [count in story 0, count in story 1, ...]
my @title;   # array of titles
my $story;   # index of the current story

while (<$in>) {
    if (/^<(.*)>\s*$/) {              # It's a title, e.g. <Story One>
        push @title, $1;
        $story = $#title;
    }
    elsif (defined $story) {          # It's plain text
        s/[.,:;?"!()\[\]{}_-]//g;     # drop punctuation; hyphens and underscores are removed too
        foreach my $word (/\w+/g) {
            $freq{lc $word}[$story]++;
        }
    }
}
close $in;

# Output a tab-delimited table: one row per word, one column per story
{
    local ($\, $,) = ("\n", "\t");
    print '', @title;
    foreach my $row (sort keys %freq) {
        print $row, map { $_ || '' } @{ $freq{$row} }[0 .. $#title];
    }
}
```

Now, to accomplish my final task, I need to sum rows: for example, how many times does the word perl appear across all the stories? Then I need to sum columns: how many words does Story One have? And finally, I need to find out how many words Stories 1 and 2 (or Stories 1 and 3) have in common. I know I could take the output and do this in Excel, but I need to hand in Perl code... Thank you!!
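A minimal sketch of the three totals, assuming the same `%freq` hash of arrays (word => per-story counts) and `@title` array that the question's code builds. To run stand-alone it reads the example stories from an in-memory string (Perl's `open` on a scalar reference) instead of `tryit.txt`, and it skips the punctuation-stripping step, which makes no difference for this sample. The names `%row_total`, `@col_total` and the helper `common_words` are hypothetical, introduced only for illustration. "Sum columns" is taken to mean counting every occurrence (tokens), and "words in common" to mean distinct words with a non-zero count in both stories.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Example input in the format the question's code expects.
# Read from a string here so the sketch runs without tryit.txt.
my $text = <<'END';
<Story One>
Perl is great
<Story Two>
Perl is free perl
<Story Three>
Will I learn perl?
END

open my $in, '<', \$text or die "Cannot open in-memory file: $!";

my %freq;    # word => [count in story 0, count in story 1, ...]
my @title;   # story titles, in input order
my $story;   # index of the current story
while (<$in>) {
    if (/^<(.*)>\s*$/) {              # title line
        push @title, $1;
        $story = $#title;
    }
    elsif (defined $story) {          # story text: count each word
        $freq{ lc $_ }[$story]++ for /\w+/g;
    }
}
close $in;

# Row sums: total occurrences of each word across all stories
# (empty cells are undef, so treat them as 0).
my %row_total;
for my $word (keys %freq) {
    $row_total{$word} += $_ || 0 for @{ $freq{$word} };
}

# Column sums: total number of words (occurrences) in each story.
my @col_total = (0) x @title;
for my $counts (values %freq) {
    $col_total[$_] += $counts->[$_] || 0 for 0 .. $#title;
}

# Hypothetical helper: sorted list of words that occur in both
# story $i and story $j (0-based indexes into @title).
sub common_words {
    my ($i, $j) = @_;
    return sort { $a cmp $b } grep { $freq{$_}[$i] && $freq{$_}[$j] } keys %freq;
}
my @common_1_2 = common_words(0, 1);
my @common_1_3 = common_words(0, 2);

print "perl appears $row_total{perl} times in total\n";
for my $s (0 .. $#title) {
    print "$title[$s] has $col_total[$s] words\n";
}
print "Stories 1 and 2 have ", scalar(@common_1_2), " word(s) in common: @common_1_2\n";
print "Stories 1 and 3 have ", scalar(@common_1_3), " word(s) in common: @common_1_3\n";
```

With the example stories this reports 4 occurrences of perl; 3, 4 and 4 words for the three stories; "is" and "perl" shared by Stories 1 and 2; and only "perl" shared by Stories 1 and 3. To count distinct words per story instead of occurrences, `scalar grep { $freq{$_}[$s] } keys %freq` would do.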
Re: Add colums and rows
by apl (Monsignor) on Apr 22, 2008 at 01:10 UTC
by lechateau (Initiate) on Apr 22, 2008 at 01:43 UTC

Re: Add colums and rows
by tachyon-II (Chaplain) on Apr 22, 2008 at 03:51 UTC

Re: Add colums and rows
by GrandFather (Saint) on Apr 22, 2008 at 02:40 UTC
by lechateau (Initiate) on Apr 22, 2008 at 11:51 UTC
by apl (Monsignor) on Apr 22, 2008 at 12:54 UTC
by lechateau (Initiate) on Apr 22, 2008 at 13:12 UTC
by lechateau (Initiate) on Apr 23, 2008 at 15:11 UTC
by GrandFather (Saint) on Apr 24, 2008 at 01:04 UTC