
OK. So the core of what you have:

while (my $text=<$testo>){
    for my $key (keys %hash){
        my $value = $hash{$key};
        my $arrkey=$key." ";
        my $count = 0;
        $count += () = /\b$key\b/ig while <>;
        print $conteggio "$arrkey) => $count\n";
    } ;
} ;
reads the file line by line into $text, and then tries each word from column 1 (captured earlier as the keys of your %hash). The regex /\b$key\b/ig is plausible, and the [oi] stuff will do what you want. The [Nn] will also work, but it is redundant because of the regex's i modifier.

The rest is, frankly, a dog's breakfast and can be thrown away.
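To see the word boundaries and the i modifier at work, here is a small sketch using a made-up line and a hypothetical key [Nn]ot (your real keys come from your word file and are not shown here). It counts matches with the list-assignment idiom discussed just below.

```perl
use strict;
use warnings;

# Made-up sample line; the real keys come from your word file.
my $line = "Not now, not ever. Nothing is NOT impossible.";

# Count matches with and without the [Nn] class. Under /i both patterns
# match "Not", "not" and "NOT", while \b stops them matching inside "Nothing".
my $with_class    = () = $line =~ /\b[Nn]ot\b/ig;
my $without_class = () = $line =~ /\bnot\b/ig;

print "$with_class $without_class\n";    # prints "3 3"
```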

To count the number of times you get a match in each line,

my $count = () = $text =~ /\b$key\b/ig ;
is sufficient, but fairly deep magic. This:
my $count = 0 ;
while ($text =~ /\b$key\b/ig) {
    $count++ ;
} ;
may or may not seem clearer.
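A quick way to convince yourself that the two forms agree is to run both on the same line. The sample text and key here are illustrative only:

```perl
use strict;
use warnings;

my $text = "the cat saw the other cat, then the Cat left\n";
my $key  = 'cat';

# Idiom 1: a list assignment in scalar context evaluates to the number
# of elements on its right-hand side, i.e. the number of //g matches.
my $count_magic = () = $text =~ /\b$key\b/ig;

# Idiom 2: a scalar-context //g match moves through the string one match
# at a time and returns false once there are no more matches.
my $count_loop = 0;
while ($text =~ /\b$key\b/ig) {
    $count_loop++;
}

print "$count_magic $count_loop\n";    # prints "3 3"
```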

Now your problem is how to collect the count for each word across all the lines of your input. I suggest using the value part of your hash entries to hold the count for the word in the key part.

When the while loop has finished, your hash should contain the count for each word, which you can then output to $conteggio.
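A minimal sketch of that suggestion, assuming a hypothetical two-word list (cat, dog) and reading from an in-memory filehandle in place of File_Input/Testo.txt so that it runs on its own:

```perl
use strict;
use warnings;

# Hypothetical word list; in the original, the keys come from column 1
# of another file. Each value starts at 0 and becomes a running total.
my %hash = map { $_ => 0 } qw(cat dog);

# In-memory filehandle standing in for File_Input/Testo.txt.
my $input = "A cat and a dog.\nThe Dog chased the cat.\nNo cats here.\n";
open my $testo, '<', \$input or die "Cannot open input: $!";

while (my $text = <$testo>) {
    for my $key (keys %hash) {
        # Add this line's matches to the running total for $key.
        $hash{$key} += () = $text =~ /\b$key\b/ig;
    }
}
close $testo;

# Print in sorted order; use print $conteggio ... to write the results file.
for my $key (sort keys %hash) {
    print "($key) => $hash{$key}\n";
}
```

Seeding every value with 0 means that words which never match still show up in the output with a count of zero. Note that "cats" does not count as "cat" here, because \b requires a word boundary after the t.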

Re^4: Counting occurence of a list of word in a file
by b_vulnerability (Novice) on Nov 11, 2008 at 18:09 UTC
    Thanks! It's working fine now! I'll post the code I used so you can all give me suggestions for improving my programming skills (I'm sorry if they're not good, but I've only been using Perl for a few weeks, for my thesis, and it's all really new), or maybe it will be useful for someone with the same problem.
    open my $testo, "<File_Input/Testo.txt";
    open my $conteggio, ">File_Output/Conteggio.txt";
    my %arrayris;
    while (my $text=<$testo>){
        for my $key (keys %hash){
            my $value = $hash{$key};
            my $count = 0 ;
            while ($text =~ /\b$key\b/ig) { $count++ ; } ;
            $arrayris{$key}=$count;
        }
    }
    while ( my ($k,$v) = each %arrayris ) {
        print $conteggio "($k) => $v\n";
    }
    close $testo;
    close $conteggio;

    Thanks again to everybody!

      It's an improvement!

      If your File_Input/Testo.txt file contains more than one line, then I suggest

      $arrayris{$key} += $count ;
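      will give you the total across all lines. With =, each line overwrites the count from the line before, so you only see the counts for the last line. (Perl will happily create a hash entry with, effectively, a zero value when required.)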
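The difference is easy to see with two short lines of made-up input:

```perl
use strict;
use warnings;

my @lines = ("cat cat\n", "cat\n");    # hypothetical two-line input
my $key   = 'cat';

my (%overwritten, %summed);
for my $text (@lines) {
    my $count = 0;
    while ($text =~ /\b$key\b/ig) { $count++ }

    $overwritten{$key} = $count;    # keeps only the last line's count
    $summed{$key}     += $count;    # undef is treated as 0 the first time
}

print "$overwritten{$key} $summed{$key}\n";    # prints "1 3"
```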

      You could also consider counting directly in your %arrayris:

      while ($text =~ /\b$key\b/ig) { $arrayris{$key}++ ; } ;
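One side effect to be aware of, sketched below with a hypothetical word list and input: with only ++, a word that never matches never gets an entry in %arrayris, so it will be missing from the output unless you default it to zero when printing.

```perl
use strict;
use warnings;

my @words = qw(cat dog);                     # hypothetical word list
my @lines = ("a cat\n", "another cat\n");    # hypothetical input

my %arrayris;
for my $text (@lines) {
    for my $key (@words) {
        # Increment the stored count once per match; no $count needed.
        while ($text =~ /\b$key\b/ig) { $arrayris{$key}++ }
    }
}

# A word that never matches never gets an entry, so default it to 0
# when printing (the // operator needs Perl 5.10 or later).
for my $key (@words) {
    printf "(%s) => %d\n", $key, $arrayris{$key} // 0;
}
```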

      Other things you might consider:

      • why you read the words into a hash (your %hash) when you only really use the keys. Perhaps it is groundwork for some future extension; I cannot tell.

      • similarly, what is

        my $value = $hash{$key};
        doing to justify its existence?

      • I recommend adding use strict; and use warnings; to your script: they will help keep you out of trouble!

      • the code is definitely "quick and dirty". If either your word list or your input is very long, you may want to speed things up. But the first rule of optimisation is: don't do it (unless you really have to).
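Pulling the points above together, one possible tidy-up might look like the sketch below. It is only one way to do it: count_words, @words and %pattern are hypothetical names, the word list is a plain array rather than a hash, and an in-memory filehandle stands in for File_Input/Testo.txt so the example runs on its own. Precompiling each pattern with qr// is a cheap tweak, not a serious optimisation.

```perl
use strict;
use warnings;

# Count case-insensitive whole-word matches of each word in @$words
# across every line read from $fh; returns a hash reference of totals.
sub count_words {
    my ($words, $fh) = @_;

    # Compile each pattern once up front. The word is interpolated as a
    # regex, so character classes such as [oi] in a key keep working.
    my %pattern = map { $_ => qr/\b$_\b/i } @$words;
    my %count   = map { $_ => 0 } @$words;

    while (my $text = <$fh>) {
        for my $key (@$words) {
            my $re = $pattern{$key};
            $count{$key} += () = $text =~ /$re/g;
        }
    }
    return \%count;
}

# Demonstration with an in-memory filehandle. For real use, something like:
#   open my $testo, '<', 'File_Input/Testo.txt'
#       or die "Cannot open File_Input/Testo.txt: $!";
my @words = qw(cat dog);    # hypothetical word list
my $input = "cat dog cat\nDOG\n";
open my $testo, '<', \$input or die "Cannot open input: $!";
my $totals = count_words(\@words, $testo);
close $testo;

for my $key (sort keys %$totals) {
    print "($key) => $totals->{$key}\n";
}
```

Checking the result of open with or die is worth keeping even in quick scripts: without it, a mistyped path silently produces an empty result file.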