b_vulnerability has asked for the wisdom of the Perl Monks concerning the following question:

Hi everyone. I have a text file that contains a list of word pairs, like this:
[Nn]ucle[oi]:[Pp]roton[oi]
OCS:chip
[Ff]otosistema:LHC
N2:[aA]zoto
[Cc]enobio:[Cc]appell[ae]
[Ee]sercit[oi]:[Ll]egion[ie]
[Tt]erreno:sabbia
[Ll]attosio:[Gg]lucosio
codice:lettera

I need to know how many times each word in the first column occurs in another text file I have.
I've tried to do this:
#!/usr/bin/perl
use strict;
use warnings;

open my $listaParole,"File_Input/Coppie_Parole.txt" or die;
my %hash;
while (my $line=<$listaParole>) {
    chomp $line;
    my ($word1, $word2) = split /:/, $line;
    $hash{$word1} = $word2;
}
open my $testo, "<File_Input/Testo.txt";
open my $conteggio, ">File_Output/Conteggio.txt";
my %arrayris;
my $indice=0;
while (my $text=<$testo>){
    for my $key (keys %hash){
        my $value = $hash{$key};
        my $arrkey=$key." ";
        my $count = 0;
        $count += () = /\b$key\b/ig while <>;
        print $conteggio "$arrkey) => $count\n";
    }
}
close $testo;
close $conteggio;

but I'm not having any results!
What can I do?
Thanks for your help

Replies are listed 'Best First'.
Re: Counting occurence of a list of word in a file
by moritz (Cardinal) on Nov 11, 2008 at 16:39 UTC
    You have this structure in your code:
    while (my $text=<$testo>){ ... $count += () = /\b$key\b/ig while <>; }

    The <> file handle is exhausted in the first run of the outer while loop, thus certainly not doing what you want. Why do you read from <> at all? Doesn't $text already contain an input line?

      You are right. I'll correct it and see how it turns out.
      Thanks!
      If I correct that I get this interesting output:
      Use of uninitialized value in pattern match (m//) at prova.pl line 24, <$testo> line 1.

      I really don't understand.
        What does your "corrected" code look like?
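        One plausible cause, assuming the correction only removed the trailing `while <>`: a bare `/.../` match is applied to `$_`, not to `$text`, and nothing in that loop ever sets `$_`. The sketch below reproduces the warning with a made-up line and then binds the match to `$text` instead. The variable names `$bare`, `$bound` and `@warnings` are illustrative only.

        ```perl
        use strict;
        use warnings;

        # Collect warnings instead of printing them, so they can be inspected.
        my @warnings;
        $SIG{__WARN__} = sub { push @warnings, @_ };

        my $text = "Il nucleo e un altro nucleo";   # made-up input line
        my $key  = '[Nn]ucle[oi]';                  # pattern from the word list

        # Bare match: operates on $_, which is undefined here.
        my $bare  = () = /\b$key\b/ig;

        # Bound match: operates on the line that was actually read.
        my $bound = () = $text =~ /\b$key\b/ig;

        print "bare=$bare bound=$bound\n";
        print "first warning: $warnings[0]" if @warnings;
        ```

        The bare match finds nothing and warns about an uninitialized value, while the bound match counts the two occurrences in the line.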
Re: Counting occurence of a list of word in a file
by gone2015 (Deacon) on Nov 11, 2008 at 16:50 UTC
    but I'm not having any results!

    I know exactly how you feel. I keep hoping for a Sign. Nothing.

    What can I do?

    FWIW, I have been advised to seek professional help on a number of occasions.

    Now, the code you've posted looks as though it successfully reads one file into a hash, and then reads another file line by line, and does some fiddling about with each line, in some way related to the contents of the hash.

    Sadly, my mind reading skills are on the blink, and the crystal ball is at the cleaners.

    Go on, give us a clue: exactly what do you want this thing to do ? What is the relationship between the keys and the values in the hash you construct ? What is the function of the '[Nn]' etc in the word list ?

    These questions, and many more, ...

      You're totally right!
      I need to know how many times each word in the first column of the list occurs in my text file. Let's say I have these two pairs:
      [Aa]tom[oi]:[Nn]ucle[oi]
      [Nn]ucle[oi]:[Pp]roton[ei]
      and this text here:
       In particolare, l ' atomo è composto da un nucleo carico positivamente e da un certo numero di elettroni, carichi negativamente, che gli vibrano attorno senza un ' orbita precisa (l ' elettrone si dice infatti delocalizzato), nei cosiddetti gusci elettronici. Il nucleo è composto da protoni, che sono particelle cariche positivamente e da neutroni che sono particelle prive di carica: protoni e neutroni sono detti nucleoni. In proporzione se si considera il nucleo grande come una mela, gli elettroni gli ruotano attorno ad una distanza pari a circa un chilometro; viceversa un nucleone ha massa quasi 1800 volte superiore a quella di un elettrone.
      I'd like it to count how many times [Aa]tom[oi] occurs in this text, and how many times [Nn]ucle[oi] does.
      The [] notation is there because I have another script that searches for these words in a text, and I need it to retrieve all the possible words that can be made from those characters: Nucleo, nucleo, Nuclei, nuclei.
      I hope that is clearer.
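      To illustrate the bracket notation: each `[...]` is a regex character class that matches exactly one of the listed characters, so `[Nn]ucle[oi]` covers the four spellings mentioned. A minimal sketch, with a made-up candidate list:

      ```perl
      use strict;
      use warnings;

      my $pattern = '[Nn]ucle[oi]';

      # Made-up candidates; only the first four should match in full.
      my @candidates = qw(Nucleo nucleo Nuclei nuclei nucleone Nuclea);

      # Anchor with \A and \z so the whole word must match, not just a prefix.
      my @matched = grep { /\A$pattern\z/ } @candidates;

      print "@matched\n";   # Nucleo nucleo Nuclei nuclei
      ```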

        OK. So the core of what you have:

        while (my $text=<$testo>){
            for my $key (keys %hash){
                my $value = $hash{$key};
                my $arrkey=$key." ";
                my $count = 0;
                $count += () = /\b$key\b/ig while <>;
                print $conteggio "$arrkey) => $count\n";
            } ;
        } ;
        reads the file line by line into $text, and then tries each column 1 word (captured earlier as the keys of your %hash). The regex /\b$key\b/ig is plausible, and the [oi] stuff will do what you want -- the [Nn] will also work, but is redundant because of the i modifier on the regex.
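        A small check of the point about the `i` modifier, using made-up words: with `/i`, a plain `nucle[oi]` already accepts either case of the first letter. Unlike the bracketed form without `/i`, it also accepts all-caps spellings, which may or may not be wanted.

        ```perl
        use strict;
        use warnings;

        my @words = qw(nucleo Nucleo NUCLEI);   # made-up test words

        # Case handled by the character class only.
        my @by_class = grep { /\b[Nn]ucle[oi]\b/ } @words;

        # Case handled by the /i modifier; [Nn] is no longer needed.
        my @by_flag  = grep { /\bnucle[oi]\b/i } @words;

        print "class: @by_class\n";   # class: nucleo Nucleo
        print "flag:  @by_flag\n";    # flag:  nucleo Nucleo NUCLEI
        ```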

        The rest is, frankly, a dog's breakfast and can be thrown away.

        To count the number of times you get a match in each line,

        my $count = () = $text =~ /\b$key\b/ig ;
        is sufficient, but fairly deep magic. This:
        my $count = 0 ; while ($text =~ /\b$key\b/ig) { $count++ ; } ;
        may or may not seem clearer.
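        Both forms can be compared side by side. The sketch below uses a made-up line: the `() =` version forces list context so that `/g` returns every match, and assigning that list to a scalar yields its length, while the loop version finds one match per iteration.

        ```perl
        use strict;
        use warnings;

        my $text = "un nucleo, due nuclei, tre nucleoni";   # made-up line
        my $key  = '[Nn]ucle[oi]';

        # List-context idiom: count all matches in one expression.
        my $count_list = () = $text =~ /\b$key\b/ig;

        # Explicit loop: /g in scalar context advances one match at a time.
        my $count_loop = 0;
        $count_loop++ while $text =~ /\b$key\b/ig;

        print "$count_list $count_loop\n";   # 2 2
        ```

        "nucleoni" is not counted because the trailing `\b` requires a word boundary right after the `o`.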

        Now your problem is how to collect the count for each word across all the lines of your input. I suggest using the value part of your hash entries to hold the count for the word in the key part.

        When the while loop has finished, your hash should contain the count for each word, which you can then output to $conteggio.
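        One way to read this suggestion, as a sketch rather than the poster's actual code: initialise each key's value to 0 and add the per-line count to it. The word list and text lines are inlined here as made-up data instead of being read from the two input files, and `%count` is a hypothetical name.

        ```perl
        use strict;
        use warnings;

        # Made-up stand-ins for the word-pair file and the text file.
        my @pairs = ('[Aa]tom[oi]:[Nn]ucle[oi]', '[Nn]ucle[oi]:[Pp]roton[ei]');
        my @lines = (
            "l'atomo ha un nucleo carico positivamente",
            "Il nucleo contiene protoni; i nucleoni non sono contati",
        );

        # Key: first-column pattern. Value: running count, starting at 0.
        my %count;
        for my $pair (@pairs) {
            my ($word1) = split /:/, $pair;
            $count{$word1} = 0;
        }

        # Add each line's matches to the running total for that pattern.
        for my $text (@lines) {
            for my $key (keys %count) {
                $count{$key} += () = $text =~ /\b$key\b/ig;
            }
        }

        # Sorted so the output order is stable.
        print "$_ => $count{$_}\n" for sort keys %count;
        ```

        With this data the totals are 1 for `[Aa]tom[oi]` and 2 for `[Nn]ucle[oi]`; in the real script the final `print` would go to the output filehandle instead of STDOUT.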

        If you can not figure out what some code is doing, you could break it up into smaller, isolated chunks (i.e., subs):
        use strict;
        use warnings;

        my %patts;
        my %counts;
        my $file = 'patterns.txt';
        my $fh;
        open $fh, '<', $file or die "can not open file $file: $!";
        while (<$fh>) {
            chomp;
            my ($word1, $word2) = split /:/;
            $patts{$word1} = $word2;
        }
        close $fh;

        $file = 'text.txt';
        open $fh, '<', $file or die "can not open file $file: $!";
        while (<$fh>) {
            chomp;
            for my $patt (keys %patts) {
                $counts{$patt} += count_match($_, $patt);
            }
        }
        close $fh;

        for my $patt (keys %counts) {
            print "pattern $patt occurs $counts{$patt} times\n";
        }

        sub count_match {
            my ($str, $regex) = @_;
            my @words = split /\s+/, $str;
            my $count = 0;
            for (@words) {
                $count++ if /\b$regex\b/
            }
            return $count;
        }

        __END__
        pattern [Aa]tom[oi] occurs 1 times
        pattern [Nn]ucle[oi] occurs 3 times