in reply to Add colums and rows

It may help if you clarify what it is that you actually want. Is it the summary data only, the current table with summary data, or something else you haven't mentioned.

It's also appreciated if you indicate up front that you are seeking help with homework ("however i need to hand in perl code"), although CB comments indicate that this may not be exactly homework ("need to do this for a thesis ... no perl, perl is just a way"). Any any case a little background on what you are trying to achieve can save a lot of time all round.

Update: The following may be what you are after:

use strict; use warnings; my %freqs; my $story; # Current story name while (<DATA>) { if (/^\[(.*)\]\s*$/) { $story = ucfirst $1; # Force title caps die "Duplicate story title: $story" if exists $freqs{$story}; next; } next unless defined $story; # wait until we have a story title for my $word (/\w+/g) { # Current story counts $word = ucfirst $word; $freqs{$word}{$story}++; $freqs{all}{$story}++; # Total counts $freqs{$word}{total}++; $freqs{all}{total}++; } } # Print title line print "\t", (join "\t", sort keys %{$freqs{all}}), "\n"; # Print table for my $word (sort keys %freqs) { $freqs{$word}{$_} ||= 0 for keys %{$freqs{all}}; printf "$word\t"; print join "\t", join "\t", map $freqs{$word}{$_}, sort keys %{$fr +eqs{$word}}; print "\n"; } __DATA__ [para one] Hello everyone! Just to make everything clear. I need this for a proje +ct were I am being graded. However, the project is not about Perl. It is about d +oing statistics with unstructured data, ie text. I can do this on excel man +ually, but I think it would be nice to have a code that will generalyze my analys +is and algorithm to any collection of texts. The final goal is to create a pr +ocess to classify text and information, with no human interaction. That is what + I being graded on and I do not need help on that. Just perl... counting words +in text and stuff like that. Here is my question Given this code [para two] this code takes a multiple pieces of text and creates a table with the + words that appear on each story as rows and story titles as columns. Then ea +ch "cell" counts the number of times each word appears on each story So for exam +ple Story One: Perl is great Story Two: Perl is free perl Story three: Will I le +arn perl? will return:

Prints:

Para one Para two total A 3 2 5 About 2 0 2 Algorithm 1 0 1 Am 1 0 1 Analysis 1 0 1 And 4 2 6 Any 1 0 1 Appear 0 1 1 Appears 0 1 1 As 0 2 2 Be 1 0 1 Being 2 0 2 But 1 0 1 Can 1 0 1 Cell 0 1 1 Classify 1 0 1 Clear 1 0 1 Code 2 1 3 Collection 1 0 1 Columns 0 1 1 Counting 1 0 1 Counts 0 1 1 Create 1 0 1 Creates 0 1 1 Data 1 0 1 Do 2 0 2 Doing 1 0 1 Each 0 4 4 Everyone 1 0 1 Everything 1 0 1 Example 0 1 1 Excel 1 0 1 Final 1 0 1 For 1 1 2 Free 0 1 1 Generalyze 1 0 1 Given 1 0 1 Goal 1 0 1 Graded 2 0 2 Great 0 1 1 Have 1 0 1 Hello 1 0 1 Help 1 0 1 Here 1 0 1 However 1 0 1 Human 1 0 1 I 6 1 7 Ie 1 0 1 In 1 0 1 Information 1 0 1 Interaction 1 0 1 Is 5 2 7 It 2 0 2 Just 2 0 2 Learn 0 1 1 Like 1 0 1 Make 1 0 1 Manually 1 0 1 Multiple 0 1 1 My 2 0 2 Need 2 0 2 Nice 1 0 1 No 1 0 1 Not 2 0 2 Number 0 1 1 Of 1 2 3 On 3 2 5 One 0 1 1 Perl 2 4 6 Pieces 0 1 1 Process 1 0 1 Project 2 0 2 Question 1 0 1 Return 0 1 1 Rows 0 1 1 So 0 1 1 Statistics 1 0 1 Story 0 6 6 Stuff 1 0 1 Table 0 1 1 Takes 0 1 1 Text 3 1 4 Texts 1 0 1 That 4 1 5 The 2 2 4 Then 0 1 1 Think 1 0 1 This 3 1 4 Three 0 1 1 Times 0 1 1 Titles 0 1 1 To 5 0 5 Two 0 1 1 Unstructured 1 0 1 Were 1 0 1 What 1 0 1 Will 1 2 3 With 2 1 3 Word 0 1 1 Words 1 1 2 Would 1 0 1 all 114 63 177

Perl is environmentally friendly - it saves trees

Replies are listed 'Best First'.
Re^2: Add colums and rows
by lechateau (Initiate) on Apr 22, 2008 at 11:51 UTC
    Thank you, this gets me closer to what I need!! I guess I will need two more things.... How do I print the table to a file. I can open a file handle but I dont know how to print the entire table to a file, this far i only figured out how to print lines... (perl baby steps!) The next thing, can I modify this to count the shared words between stories, too? Thanks!
      i can open a file handle but I dont know how to print the entire table to a file

      Let's assume you opened filehandle REPORT. The following code from Grandfather

      for my $word (sort keys %freqs) { $freqs{$word}{$_} ||= 0 for keys %{$freqs{all}}; printf "$word\t"; print join "\t", join "\t", map $freqs{$word}{$_}, sort keys % +{$freqs{$word}}; print "\n"; }

      could be changed to:

      for my $word (sort keys %freqs) { $freqs{$word}{$_} ||= 0 for keys %{$freqs{all}}; printf REPORT "$word\t"; print REPORT join "\t", join "\t", map $freqs{$word}{$_}, sort + keys %{$freqs{$word}}; print REPORT "\n"; }
      The next thing, can I modify this to count the shared words between stories, too?
      Certainly. Will you be reading several stories from the same DATA block? Then you don't need to change a thing.

      Will you change the program to read from several different input files? How do you want to give those filenames to your program?

        That was easy!! I will read from one nig file, just like Grandfather data sample One big file that will look like
        [Story 1] blblblblbalblablablabal blblblblbalblablablabal [Story 2] blblblblbalblablablabal blblblblbalblablablabal [Story 3] blblblblbalblablablabal blblblblbalblablablabal [Story 4] blblblblbalblablablabal blblblblbalblablablabal etc...
      Thanks everyone that helped me to get this done. I was finally able to do the shared word count on my own! Thanks again for all your time and help! And apl Trivial is a very relative term!!! I spent 20 hours to get the counting piece done. :)
      $filename = "tryit.txt"; open(IN, $filename) || die; open(OUT, ">test1.csv") || die; open(OUT1, ">test2.csv") || die; my %freqs; my $story; # Current story name while (<IN>) { if (/^\<(.*)\>\s*$/) { $story = ucfirst $1; die "Duplicate story title: $story" if exists $freqs{$story}; next; } next unless defined $story; # wait until we have a story title s/[\.,:;\?"!\(\)\[\]\{\}(--)_]//g; for my $word (/\w+/g) { # Current story counts $word = ucfirst $word; $freqs{$word}{$story}++; $freqs{all}{$story}++; # Total counts $freqs{$word}{total}++; $freqs{all}{total}++; } } # Print title line print OUT "\t", (join "\t", sort keys %{$freqs{all}}), "\n"; # Print table for my $word (sort keys %freqs) { $freqs{$word}{$_} ||= 0 for keys %{$freqs{all}}; printf OUT "$word\t"; print OUT join "\t", join "\t", map $freqs{$word}{$_}, sort keys %{ +$freqs{$word}}; print OUT "\n"; } @info=sort keys %{$freqs{all}}; my @countop; for($i=0;$i<scalar(@info)-1;$i=$i+1) { $story1=$info[$i]; for($j=0;$j<scalar(@info)-1;$j=$j+1) { $story2=$info[$j]; for my $word (sort keys %freqs) { $freqs{$word}{$_} ||= 0 for keys %{$freqs{all}}; if($freqs{$word}{$story1} > 0){ if($freqs{$word}{$story2} > 0){ $countop[$i][$j]++;}} } } } my $m; my $n; for($m=0;$m<scalar(@info)-1;$m=$m+1) { for($n=0;$n<scalar(@info)-1;$n=$n+1) { printf OUT1 "$info[$m],$info[$n],$countop[$m][$n] \n"; } } close IN; close OUT; close OUT1;

        A few things to keep in mind while you learn Perl:

        • Always use strictures (use strict; use warnings).
        • Use the three parameter version of open.
        • Use consistent indentation.
        • Don't use C style for loops unless you really really must.

        Consider the following "cleaned up" version of your added code:

        my @info = sort keys %{$freqs{all}}; my %countop; for my $story1 (@info) { next if $story1 eq 'total'; for my $story2 (@info) { next if $story2 eq 'total'; for my $word (sort keys %freqs) { next unless $freqs{$word}{$story1}; next unless $freqs{$word}{$story2}; $countop{$story1}{$story2}++; } } } for my $m (sort keys %countop) { for my $n (sort keys %{$countop{$m}}) { printf OUT1 "$m, $n, $countop{$m}{$n}\n"; } }

        Perl is environmentally friendly - it saves trees