Hi everybody!
I've got a text file like this one:
1) atomo/atomo/S * senza/senza/E * nucleo/nucleo/S
2) chitarra/chitarra/S * a/a/E * corde/corda/S
3) coltello/coltello/S * dalla/da/E * lama/lama/S
4) edificio/edificio/S * ad/ad/E * facciata/facciata/S
5) biciclette/bicicletta/S * a/a/E * ruote/ruota/S
6) computer/computer/S * con/con/E * processore/processore/S
7) chiesa/chiesa/S * con/con/E * absidi/abside/S
8) opera/opera/S * con/con/E * volumi/volume/S
9) strada/strada/S * a/a/E * carreggiate/carreggiata/S
10) chitarra/chitarra/S .* a/a/E .* corde/corda/S
11) edificio/edificio/S .* con/con/E .* facciata/facciata/S
12) Codice/codice/S .* scritto/scrivere/V sulle/su/E .* lettere/lettera/S
13) computer/computer/S .* basati/basare/V su/su/E .* processore/processore/S
14) chiesa/chiesa/S .* con/con/E .* absidi/abside/S
15) opera/opera/S .* con/con/E .* volumi/volume/S
16) strada/strada/S .* a/a/E .* carreggiate/carreggiata/S
17) atomo/atomo/S .* senza/senza/E .* nucleo/nucleo/S
18) coltello/coltello/S .* dalla/da/E .* lama/lama/S
19) biciclette/bicicletta/S .* a/a/E .* ruote/ruota/S
20) coltello/coltello/S .* a/a/E .* lama/lama/S
21) codice/codice/S .* di/di/E .* lettere/lettera/S
22) biciclette/bicicletta/S .* a/a/E .* ruote/ruota/S
23) testa/testa/S .* di/di/E .* fronte/fronte/S
The first and last "units" of each line (by unit I mean anything of the form word/word/TOW) are also listed in a second text file, where they are written as pairs, like this:
[Nn]ucle[oi]:[Pp]roton[oi]
OCS:chip
[Ff]otosistema:LHC
N2:[aA]zoto
[Cc]enobio:[Cc]appell[ae]
[Ee]sercit[oi]:[Ll]egion[ie]
[Tt]erreno:sabbia
[Ll]attosio:[Gg]lucosio
[Cc]odic[ei]:[Ll]etter[ae]
[aA]ttinio:[Ii]sotop[oi]
[Cc]erio:[Ii]sotop[oi]
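To make the two formats concrete, here is a minimal sketch of how one data line and one pair line could be taken apart. The helpers parse_units and parse_pair are hypothetical names used only for illustration; they are not part of my script, and I am assuming each pair line contains exactly one colon.

```perl
use strict;
use warnings;

# Split one data line into its units; each unit is "word/lemma/TAG".
# parse_units is a hypothetical helper, used only for illustration.
sub parse_units {
    my ($line) = @_;
    my @units;
    for my $token (split ' ', $line) {
        # Skip the "21)" numbering and the "*" / ".*" separators.
        next unless $token =~ m{\A([^/]+)/([^/]+)/([A-Z]+)\z};
        push @units, { word => $1, lemma => $2, tag => $3 };
    }
    return @units;
}

# Split one pair line ("first:second") into two compiled patterns.
# Assumption: the line contains exactly one colon.
sub parse_pair {
    my ($line) = @_;
    my ($first, $second) = split /:/, $line, 2;
    return (qr/$first/, qr/$second/);
}

my @units = parse_units('21) codice/codice/S .* di/di/E .* lettere/lettera/S');
print join(' ', map { "$_->{lemma}($_->{tag})" } @units), "\n";

my ($first, $second) = parse_pair('[Cc]odic[ei]:[Ll]etter[ae]');
print "pair matches\n"
    if $units[0]{word} =~ /\A$first\z/ && $units[-1]{word} =~ /\A$second\z/;
```

With line 21 of the sample, the first and last units are the ones that the pair [Cc]odic[ei]:[Ll]etter[ae] is meant to match.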
What I'd like to do is count how many times a certain relation, let's say con/con/E, appears with each pair of words.
What I expect to obtain is a text file like this:
[Nn]ucle[io]-[Pp]roton[ei]-->4
where 4 is the number of times the pair is seen with the given relation.
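The counting I have in mind can be sketched roughly as follows. This is only an illustration under assumptions: count_relation is a hypothetical helper, the two patterns come from one line of the pair file, and the relation is passed as a literal string. The two sample lines are taken from the data above.

```perl
use strict;
use warnings;

# Count how often the pair ($first, $second) is linked by $relation in @lines.
# count_relation is a hypothetical helper; $first and $second are regex
# fragments as in the pair file, $relation is a literal unit like "con/con/E".
sub count_relation {
    my ($first, $second, $relation, @lines) = @_;
    my $pattern = qr{
        \b $first / $first /S      # first unit, tagged S
        \s* \.\* \s*               # literal ".*" separator
        \Q$relation\E              # the relation, taken literally
        \s* \.\* \s*               # literal ".*" separator
        $second / $second /S \b    # last unit, tagged S
    }xi;
    my $count = 0;
    for my $line (@lines) {
        # A list assignment in scalar context yields the number of matches.
        $count += () = $line =~ /$pattern/g;
    }
    return $count;
}

my @lines = (
    '14) chiesa/chiesa/S .* con/con/E .* absidi/abside/S',
    '21) codice/codice/S .* di/di/E .* lettere/lettera/S',
);

for my $relation ('di/di/E', 'con/con/E') {
    my $n = count_relation('[Cc]odic[ei]', '[Ll]etter[ae]', $relation, @lines);
    print "[Cc]odic[ei]-[Ll]etter[ae] with $relation-->$n\n";
}
```

On these two lines the pair is counted once with di/di/E and zero times with con/con/E, because the only con/con/E line links chiesa and absidi, which is not one of the listed pairs.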
What I did is the following:
#!/usr/bin/perl
use strict;
use warnings;

# Load the word pairs (one "pattern1:pattern2" per line).
open my $listaParole, '<', 'File_Input/Coppie_Parole.txt'
    or die "Cannot open Coppie_Parole.txt: $!";
my %hash;
while (my $line = <$listaParole>) {
    chomp $line;
    my ($word1, $word2) = split /:/, $line;
    $hash{$word1} = $word2;
}
close $listaParole;

# Load the part of the text file to be analysed.
open my $input, '<', 'Wiki_Pulito/Prova/Pattern2.txt'
    or die "Cannot open Pattern2.txt: $!";
# Open the output file.
open my $conteggio, '>', 'Wiki_Pulito/Prova/Conteggio.txt'
    or die "Cannot open Conteggio.txt: $!";

# One counter per pair, so that counts are not shared between pairs.
my %arrayris = map { ("$_-$hash{$_}" => 0) } keys %hash;
while (my $text = <$input>) {
    for my $key (keys %hash) {
        my $value = $hash{$key};
        # /g moves on after each match; without it a matching line loops forever.
        while ($text =~ /$key\/$key\/S\s{0,2}\.\*\s{0,2}con\/con\/E\s{0,2}\.\*\s{0,2}$value\/$value\/S/gi) {
            $arrayris{"$key-$value"}++;
        }
    }
}
while (my ($k, $v) = each %arrayris) {
    print $conteggio "($k) => $v\n";
}
close $input;
close $conteggio;
but something is wrong, since all I get is a series of zeros.
I'm sorry if I haven't explained my problem very well; I'm Italian.
Also, I've only been using Perl for a little while and I'm pretty new to programming in general.
Thanks everyone for your help.