Re: Add colums and rows

It may help if you clarify what it is that you actually want. Is it the summary data only, the current table with summary data, or something else you haven't mentioned.

It's also appreciated if you indicate up front that you are seeking help with homework ("however i need to hand in perl code"), although CB comments indicate that this may not be exactly homework ("need to do this for a thesis ... no perl, perl is just a way"). Any any case a little background on what you are trying to achieve can save a lot of time all round.

Update: The following may be what you are after:

use strict;
use warnings;

my %freqs;
my $story; # Current story name

while (<DATA>) {
    if (/^\[(.*)\]\s*$/) {
        $story = ucfirst $1; # Force title caps
        die "Duplicate story title: $story" if exists $freqs{$story};
        next;
    }
    
    next unless defined $story; # wait until we have a story title

    for my $word (/\w+/g) {
        # Current story counts
        $word = ucfirst $word;
        $freqs{$word}{$story}++;
        $freqs{all}{$story}++;
        
        # Total counts
        $freqs{$word}{total}++;
        $freqs{all}{total}++;
    }
}

# Print title line
print "\t", (join "\t", sort keys %{$freqs{all}}), "\n";

# Print table
for my $word (sort keys %freqs) {
    $freqs{$word}{$_} ||= 0 for keys %{$freqs{all}};
    printf "$word\t";
    print join "\t", join "\t", map $freqs{$word}{$_}, sort keys %{$fr
+eqs{$word}};
    print "\n";
}

__DATA__
[para one]
Hello everyone! Just to make everything clear. I need this for a proje
+ct were I
am being graded. However, the project is not about Perl. It is about d
+oing
statistics with unstructured data, ie text. I can do this on excel man
+ually, but
I think it would be nice to have a code that will generalyze my analys
+is and
algorithm to any collection of texts. The final goal is to create a pr
+ocess to
classify text and information, with no human interaction. That is what
+ I being
graded on and I do not need help on that. Just perl... counting words 
+in text
and stuff like that. Here is my question Given this code

[para two]
this code takes a multiple pieces of text and creates a table with the
+ words
that appear on each story as rows and story titles as columns. Then ea
+ch "cell"
counts the number of times each word appears on each story So for exam
+ple Story
One: Perl is great Story Two: Perl is free perl Story three: Will I le
+arn perl?
will return:
[download]

Prints:

    Para one    Para two    total
A    3    2    5
About    2    0    2
Algorithm    1    0    1
Am    1    0    1
Analysis    1    0    1
And    4    2    6
Any    1    0    1
Appear    0    1    1
Appears    0    1    1
As    0    2    2
Be    1    0    1
Being    2    0    2
But    1    0    1
Can    1    0    1
Cell    0    1    1
Classify    1    0    1
Clear    1    0    1
Code    2    1    3
Collection    1    0    1
Columns    0    1    1
Counting    1    0    1
Counts    0    1    1
Create    1    0    1
Creates    0    1    1
Data    1    0    1
Do    2    0    2
Doing    1    0    1
Each    0    4    4
Everyone    1    0    1
Everything    1    0    1
Example    0    1    1
Excel    1    0    1
Final    1    0    1
For    1    1    2
Free    0    1    1
Generalyze    1    0    1
Given    1    0    1
Goal    1    0    1
Graded    2    0    2
Great    0    1    1
Have    1    0    1
Hello    1    0    1
Help    1    0    1
Here    1    0    1
However    1    0    1
Human    1    0    1
I    6    1    7
Ie    1    0    1
In    1    0    1
Information    1    0    1
Interaction    1    0    1
Is    5    2    7
It    2    0    2
Just    2    0    2
Learn    0    1    1
Like    1    0    1
Make    1    0    1
Manually    1    0    1
Multiple    0    1    1
My    2    0    2
Need    2    0    2
Nice    1    0    1
No    1    0    1
Not    2    0    2
Number    0    1    1
Of    1    2    3
On    3    2    5
One    0    1    1
Perl    2    4    6
Pieces    0    1    1
Process    1    0    1
Project    2    0    2
Question    1    0    1
Return    0    1    1
Rows    0    1    1
So    0    1    1
Statistics    1    0    1
Story    0    6    6
Stuff    1    0    1
Table    0    1    1
Takes    0    1    1
Text    3    1    4
Texts    1    0    1
That    4    1    5
The    2    2    4
Then    0    1    1
Think    1    0    1
This    3    1    4
Three    0    1    1
Times    0    1    1
Titles    0    1    1
To    5    0    5
Two    0    1    1
Unstructured    1    0    1
Were    1    0    1
What    1    0    1
Will    1    2    3
With    2    1    3
Word    0    1    1
Words    1    1    2
Would    1    0    1
all    114    63    177
[download]

Perl is environmentally friendly - it saves trees

Comment on Re: Add colums and rows Select or Download Code

Replies are listed 'Best First'.
Re^2: Add colums and rows by lechateau (Initiate) on Apr 22, 2008 at 11:51 UTC
Thank you, this gets me closer to what I need!! I guess I will need two more things.... How do I print the table to a file. I can open a file handle but I dont know how to print the entire table to a file, this far i only figured out how to print lines... (perl baby steps!) The next thing, can I modify this to count the shared words between stories, too? Thanks!	[reply]
Re^3: Add colums and rows by apl (Monsignor) on Apr 22, 2008 at 12:54 UTC
i can open a file handle but I dont know how to print the entire table to a file Let's assume you opened filehandle REPORT. The following code from Grandfather `for my $word (sort keys %freqs) { $freqs{$word}{$_} \|\|= 0 for keys %{$freqs{all}}; printf "$word\t"; print join "\t", join "\t", map $freqs{$word}{$_}, sort keys % +{$freqs{$word}}; print "\n"; }` [download] could be changed to: `for my $word (sort keys %freqs) { $freqs{$word}{$_} \|\|= 0 for keys %{$freqs{all}}; printf REPORT "$word\t"; print REPORT join "\t", join "\t", map $freqs{$word}{$_}, sort + keys %{$freqs{$word}}; print REPORT "\n"; }` [download] The next thing, can I modify this to count the shared words between stories, too? Certainly. Will you be reading several stories from the same DATA block? Then you don't need to change a thing. Will you change the program to read from several different input files? How do you want to give those filenames to your program?	[reply] [d/l] [select]
Re^4: Add colums and rows by lechateau (Initiate) on Apr 22, 2008 at 13:12 UTC
That was easy!! I will read from one nig file, just like Grandfather data sample One big file that will look like `[Story 1] blblblblbalblablablabal blblblblbalblablablabal [Story 2] blblblblbalblablablabal blblblblbalblablablabal [Story 3] blblblblbalblablablabal blblblblbalblablablabal [Story 4] blblblblbalblablablabal blblblblbalblablablabal etc...` [download]	[reply] [d/l]
Re^3: Add colums and rows by lechateau (Initiate) on Apr 23, 2008 at 15:11 UTC
Thanks everyone that helped me to get this done. I was finally able to do the shared word count on my own! Thanks again for all your time and help! And apl Trivial is a very relative term!!! I spent 20 hours to get the counting piece done. :) $filename = "tryit.txt"; open(IN, $filename) \|\| die; open(OUT, ">test1.csv") \|\| die; open(OUT1, ">test2.csv") \|\| die; my %freqs; my $story; # Current story name while (<IN>) { if (/^\<(.)\>\s$/) { $story = ucfirst $1; die "Duplicate story title: $story" if exists $freqs{$story}; next; } next unless defined $story; # wait until we have a story title s/[\.,:;\?"!\[\]\{\}(--)_]//g; for my $word (/\w+/g) { # Current story counts $word = ucfirst $word; $freqs{$word}{$story}++; $freqs{all}{$story}++; # Total counts $freqs{$word}{total}++; $freqs{all}{total}++; } } # Print title line print OUT "\t", (join "\t", sort keys %{$freqs{all}}), "\n"; # Print table for my $word (sort keys %freqs) { $freqs{$word}{$_} \|\|= 0 for keys %{$freqs{all}}; printf OUT "$word\t"; print OUT join "\t", join "\t", map $freqs{$word}{$_}, sort keys %{ +$freqs{$word}}; print OUT "\n"; } @info=sort keys %{$freqs{all}}; my @countop; for($i=0;$i<scalar(@info)-1;$i=$i+1) { $story1=$info[$i]; for($j=0;$j<scalar(@info)-1;$j=$j+1) { $story2=$info[$j]; for my $word (sort keys %freqs) { $freqs{$word}{$_} \|\|= 0 for keys %{$freqs{all}}; if($freqs{$word}{$story1} > 0){ if($freqs{$word}{$story2} > 0){ $countop[$i][$j]++;}} } } } my $m; my $n; for($m=0;$m<scalar(@info)-1;$m=$m+1) { for($n=0;$n<scalar(@info)-1;$n=$n+1) { printf OUT1 "$info[$m],$info[$n],$countop[$m][$n] \n"; } } close IN; close OUT; close OUT1; [download]	[reply] [d/l]
Re^4: Add colums and rows by GrandFather (Saint) on Apr 24, 2008 at 01:04 UTC
A few things to keep in mind while you learn Perl: Always use strictures (use strict; use warnings). Use the three parameter version of open. Use consistent indentation. Don't use C style for loops unless you really really must. Consider the following "cleaned up" version of your added code: `my @info = sort keys %{$freqs{all}}; my %countop; for my $story1 (@info) { next if $story1 eq 'total'; for my $story2 (@info) { next if $story2 eq 'total'; for my $word (sort keys %freqs) { next unless $freqs{$word}{$story1}; next unless $freqs{$word}{$story2}; $countop{$story1}{$story2}++; } } } for my $m (sort keys %countop) { for my $n (sort keys %{$countop{$m}}) { printf OUT1 "$m, $n, $countop{$m}{$n}\n"; } }` [download] Perl is environmentally friendly - it saves trees	[reply] [d/l]