Hi Monks. I tried posting a similar question earlier, but since I made a little bit (or a lot of) mess while asking it, and since three days of working on provided not a lot of progress, I thought I'd try posting again, trying to make my problem more clear.
My input is like the following:

1 beast-n into transform-v 356.9551 2 beast-n obj kill-v 266.2511 3 beast-n obj see-v 252.3623 4 beast-n prd become-v 250.9534 5 beast-n obj turn-v 224.6948 6 beast-n obj call-v 171.4000 7 beast-n sbj_intr devour-v 165.3228 8 beast-n obj hunt-v 155.7637 9 beast-n obj fight-v 150.4370 10 beast-n obj slay-v 150.3982 1 frog-n obj find-v 322.5589 2 frog-n into turn-v 307.3012 3 frog-n sbj_intr jump-v 235.0503 4 frog-n coord-1 toad-n 207.3611 5 frog-n obj see-v 207.2610 6 frog-n obj eat-v 204.1762 7 frog-n obj kill-v 64.6689

But please, take in account this is just a sample, since the actual output is more or less 4 Giga of text structured like the previous.
Also, I take in input something like this:

frog-n amphibian_reptile hyper beast-n

Even in this case, consider this is just a sample of the actual file I have.
What I need to do is a little complicated, so I hope I can explain it properly. Please, if it is not clear, ask me and I'd be glad to provide further infos.
For every entry in the second file I provided as sample, I need to check the occurrence of the first field and last field of it in the first file.
I then have to scan the first file for every line in which occur the first field of the other file (in this case, every line that has frog-n as second field) and see what the fourth field of it contains (i.e. the first entry containing frog-n, has find-v). I now have to check if find-v occurs with any entry of the same file that has beast-n (the second term of the hyper relation in my other input) as first field. In this case, find-v does not occur with it, so I have to check the following line of frog-n. Its fourth field is turn-v. I check for occurrence of it with beast-n and see that it occur with it. So, I have to compute Precision for frog-n, that is the number of found feature that occurs both with frog-n and beast-n / the rank of frog, which in this case would be 1(found feat)/2(rank of frog-n at this point). Then I need to extract also the rank of beast-n in which I found turn-v (which in this case would be 5) and the total number of occurrences of beast-n in the file (10), in order to compute a measure of reduction to apply to the just computed precision.
The measure is 1-rankfoundfv/(rankfv+1) and here, it would be 1-5/10+1=0,545454545454. So, the association measure I need would be the precision previously computed * the reduction measure I got= 1/2*0,545454545454=0,272727272727.
I have to repeat this for every entry of frog and in the end (when I rach the last entry of frog) sum all the association measure I got and divide the result for the number of occurrences of frog.

Here's my code:
#!/usr/bin/perl -w use strict; use warnings; use Getopt::Std; use Data::Dumper ; my $usage; { $usage = <<"_USAGE_"; _USAGE_ } my %opts = (); getopts('h',\%opts); if ($opts{h}) { print $usage; exit; } my $prefix = shift; my $input = shift; my $input_bless=shift; my $file = $prefix . ".txt"; if (-e $file) { print STDERR "$file already exists, deleting previous version\n"; `rm -f $file`; } #my $debug=0; #Variabile di debug. Vale 1 in fase di debug, si usa per open INPUT,$input; open OUT,">$file"; my %matrice; while (<INPUT>) { my ($rank, $nome, $relaz, $entry2, $score) = split(); push @{ $matrice{$nome} }, "${relaz}_$entry2,$rank,$score"; } #print OUT Dumper \%matrice; my %HOH; while (my($name,$aref) = each %matrice ) { for my $item (@$aref) { my($prop,$rank,$score) = split(',',$item); #push @ {$HOH{$name}{$rank}{$prop}}, "$score"; $HOH{$name}{$prop} = "${rank},$score"|| 0; } } #print Dumper \%HOH; open INPUTB,$input_bless; #my %descriptions; while (<INPUTB>) { my ($u, $superclass,$rel,$v) = (split)[0,1,2,3]; my $conteggio=&calcolo($u,$v); print OUT "$u"."\t".$rel."\t".$v."\t".$conteggio."\n"; } close INPUTB; close OUT; sub calcolo{ my ($name1, $name2)=@_; my $first = $HOH{$name1}; my $second = $HOH{$name2}; my ($rank_fv,$score_fv); my $rank_v; my $count_feat_fv; # my ($prop,$rank, $score); my $provaprec=0; my $proptoexamine; my $count_feat_rel; my $precision; my $rel_par; my $num=0; my $count_feat_fu; my $apinc; my $rank2=0; #my ($prop2,$rank2, $score2); my($prop,$score); my $rank=0; my $feature_found=0; my $feat_finale; my $rel_to_sum; while (my($name1,$aref) = each %matrice ) { $count_feat_fu++; $num=0; my $feat_rel=0; for my $item (@$aref) { ($prop,$rank, $score) = split(',',$item); # print "PROP: ".$prop."\n"; # if (exists $second->{$prop}){ #$count_feat_fv=0; # $feat_rel++; # print "FEAT REL: ".$feat_rel."\n"; # print "#####################################\n"; # print "ENTRATO\n"; #$proptoexamine=$prop; #print "PROP TO EXAMINE: ".$proptoexamine."\n"; #$count_feat_rel++; #print "RANK1: ".$rank."\n"; #$precision = $feat_rel/$rank; #print "PRECISION: ".$precision."\n"; $feat_finale=&last_el_v($name2,$prop); while (my($prop1,$rankscore1) = each %$second ){ ($rank_fv,$score_fv) = split(',',$rankscore1); if ($prop1 eq $prop){ $feature_found++; print "PROP:".$prop."\n"; print "TROVATO\n"; $feat_rel++; $rank2=$rank_fv; print "RANK 2:".$rank2."\n"; print "RANK 1: ".$rank."\n"; print "COUNT FEAT FOUND: ".$feature_found."\n"; $precision=$feature_found/$rank; print "PRECISION: ".$precision."\n"; $rel_par=$rank2/($feat_finale+1); $rel_to_sum=1-($rel_par); print "FEAT FINALE: ".$feat_finale."\n"; print "REL PAR: ".$rel_par."\n"; my $tosum= $precision*$rel_to_sum; print "TO SUM: ".$tosum."\n"; $num =$num+$tosum; print "NUM: ".$num."\n"; } #} # print "RANK 2:".$rank2."\n"; # #print "*********************************\n"; #print "RELPAR".$rel_par."\n"; # my $rel_tot=1-$rel_par; #print "REL TOT".$rel_tot."\n"; # print "*********************************\n"; # my $tosum=$precision*$rel_tot; #print "TO SUM".$tosum."\n"; #print "NUM".$num."\n"; # $num=$num+$tosum; #print "####################################\n"; # print "NUM: ".$num."\n"; # print "RANK: ".$rank."\n"; # print "APINC ".$apinc."\n"; } # $feat_finale=$count_feat_fv; #print "COUNT FEAT FV ".$count_feat_fv."\n"; } print "RANK: ".$rank."\n"; $apinc=$num/$rank; print "APINC: ".$apinc."\n"; # print "COUNT FEAT FV".$count_feat_fv."\n"; #print "APINC".$apinc."\n"; # print "COUNT FEAT U: ".$count_feat_fu."\n"; return $apinc; } # print $prop."\t".$rank."\t".$score."\n"; } sub last_el_v{ my ($name2,$prop1)=@_; #my $first = $HOH{$name1}; my $second = $HOH{$name2}; my $count_feat_fv; while (my($prop1,$rankscore1) = each %$second ){ $count_feat_fv++; } return $count_feat_fv; }

And the result I got when I run it on previous data is wrong, since it looks like it runs just over every entry of beast-n and repeats the calculations on it.
I've been spending three days trying to fix this but it looks like I can't succeed. Any ideas?
Thanks in advance
Giulia


In reply to Cycle, iterations and statistical measure got completely wrong by remluvr

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.