Hi Monks. I posted a similar question earlier, but I made a bit of a mess (or a big one) while asking it, and three days of work on it have not brought much progress, so I'm posting again and trying to state my problem more clearly.
My first input file looks like this:

1    beast-n    into    transform-v    356.9551
2    beast-n    obj    kill-v    266.2511
3    beast-n    obj    see-v    252.3623
4    beast-n    prd    become-v    250.9534
5    beast-n    obj    turn-v    224.6948
6    beast-n    obj    call-v    171.4000
7    beast-n    sbj_intr    devour-v    165.3228
8    beast-n    obj    hunt-v    155.7637
9    beast-n    obj    fight-v    150.4370
10    beast-n    obj    slay-v    150.3982
1    frog-n    obj    find-v    322.5589
2    frog-n    into    turn-v    307.3012
3    frog-n    sbj_intr    jump-v    235.0503
4    frog-n    coord-1    toad-n    207.3611
5    frog-n    obj    see-v    207.2610
6    frog-n    obj    eat-v    204.1762
7    frog-n    obj    kill-v    64.6689

Please keep in mind that this is just a sample: the real file is roughly 4 GB of text structured the same way.
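One possible way to load a file in this format is sketched below. It assumes whitespace-separated fields (rank, noun, relation, feature, score), as in the sample, and groups the lines by noun while keeping file order. The helper name `load_features` and the optional `$wanted` hash are hypothetical. It reads the file line by line, but with ~4 GB of input, keeping every noun in memory may still be too much, so `$wanted` lets you keep only the nouns you actually need (for example, those listed in the pairs file).

```perl
use strict;
use warnings;

# Hypothetical helper: read "rank noun relation feature score" lines from an
# open filehandle and group them by noun, preserving file order.
# If $wanted (a hash ref of nouns) is given, lines for other nouns are skipped.
sub load_features {
    my ($fh, $wanted) = @_;
    my %feat;
    while (my $line = <$fh>) {
        my ($rank, $noun, $rel, $feature, $score) = split ' ', $line;
        next unless defined $score;                 # skip blank/short lines
        next if $wanted && !$wanted->{$noun};       # optional noun filter
        push @{ $feat{$noun} }, {
            rank    => $rank,
            rel     => $rel,
            feature => $feature,
            score   => $score,
        };
    }
    return \%feat;
}

# Usage on a small in-memory sample
my $sample = <<'END';
1    frog-n    obj    find-v    322.5589
2    frog-n    into    turn-v    307.3012
1    beast-n    into    transform-v    356.9551
END
open my $fh, '<', \$sample or die "Cannot open sample: $!";
my $feat = load_features($fh);
printf "%s: %d line(s)\n", $_, scalar @{ $feat->{$_} } for sort keys %$feat;
```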
I also take as input a second file like this:

frog-n    amphibian_reptile    hyper    beast-n

Again, this is just a sample of the actual file.
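Because the pairs file is small, one option is to read it first and collect the set of nouns it mentions; the large first file can then be streamed while skipping every line whose second field is not in that set. This is a minimal sketch under that assumption; `read_pairs` is a hypothetical name, and the field names (`u`, `class`, `rel`, `v`) are just labels for the four columns of the sample line.

```perl
use strict;
use warnings;

# Hypothetical helper: read "u class relation v" lines
# (e.g. "frog-n amphibian_reptile hyper beast-n") and return the list of
# pairs plus the set of nouns that occur in them.
sub read_pairs {
    my ($fh) = @_;
    my (@pairs, %needed);
    while (my $line = <$fh>) {
        my ($u, $class, $rel, $v) = split ' ', $line;
        next unless defined $v;                     # skip blank/short lines
        push @pairs, { u => $u, class => $class, rel => $rel, v => $v };
        $needed{$u} = 1;
        $needed{$v} = 1;
    }
    return (\@pairs, \%needed);
}

my $sample = "frog-n    amphibian_reptile    hyper    beast-n\n";
open my $fh, '<', \$sample or die "Cannot open sample: $!";
my ($pairs, $needed) = read_pairs($fh);
print join(', ', sort keys %$needed), "\n";   # beast-n, frog-n
```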
What I need to do is a little complicated, so I hope I can explain it properly. If anything is unclear, please ask and I'll gladly provide more information.
For every line of the second file, I need to look up where its first field (frog-n) and its last field (beast-n) occur in the first file.
I then scan the first file for every line whose second field matches the first field of the pair (here, every line with frog-n as its second field) and look at its fourth field (e.g., the first frog-n line has find-v). I then check whether find-v also occurs in any line of the same file whose second field is beast-n (the second term of the hyper relation in my other input). Here find-v does not occur with beast-n, so I move on to the next frog-n line. Its fourth field is turn-v, and turn-v does occur with beast-n. So I compute the precision for frog-n at this point: the number of features found so far that occur with both frog-n and beast-n, divided by the rank of the current frog-n line, which here is 1 (feature found) / 2 (rank of this frog-n line). I also need the rank of the beast-n line in which I found turn-v (here 5) and the total number of beast-n lines in the file (10), in order to compute a reduction factor to apply to the precision just computed.
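The reduction factor is $1 - \frac{\text{rankfoundfv}}{\text{rankfv} + 1}$, where rankfoundfv is the rank of the beast-n line containing the matching feature and rankfv is the total number of beast-n lines. Here it is $1 - \frac{5}{10 + 1} = 0.545454\ldots$. The association measure for this step is the precision times the reduction factor: $\frac{1}{2} \times 0.545454\ldots = 0.272727\ldots$.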
I have to repeat this for every frog-n line and, at the end (when I reach the last frog-n line), sum all the association measures I got and divide the result by the number of frog-n lines.
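The procedure described above can be sketched for a single pair as follows. This is a minimal sketch, not the code below; `pair_score` is a hypothetical name. It assumes each noun's lines are given in rank order as `[rank, feature]`, that features are compared on the fourth field only (as in the turn-v example, where frog-n's `into turn-v` matched beast-n's `obj turn-v`), and that if a feature appears more than once for v, its best (lowest) rank is used. On the sample data, the turn-v step alone gives 3/11 ≈ 0.2727, matching the worked example, and the full score for frog-n/beast-n comes out as 32/245 ≈ 0.1306.

```perl
use strict;
use warnings;

# Hypothetical sketch of the measure for one pair (u, v).
# Assumptions: $u_lines and $v_lines are array refs of [rank, feature],
# features are matched on the fourth field only, and a feature repeated
# for v keeps its lowest rank.
sub pair_score {
    my ($u_lines, $v_lines) = @_;
    return 0 unless @$u_lines && @$v_lines;    # avoid division by zero

    my %rank_in_v;
    for my $line (@$v_lines) {
        my ($rank, $feature) = @$line;
        $rank_in_v{$feature} = $rank
            if !exists $rank_in_v{$feature} || $rank < $rank_in_v{$feature};
    }
    my $n_v = scalar @$v_lines;    # total number of v lines (rankfv)

    my ($found, $sum) = (0, 0);
    for my $line (@$u_lines) {
        my ($rank, $feature) = @$line;
        next unless exists $rank_in_v{$feature};
        $found++;
        my $precision = $found / $rank;                          # e.g. 1/2
        my $reduction = 1 - $rank_in_v{$feature} / ($n_v + 1);  # e.g. 1 - 5/11
        $sum += $precision * $reduction;
    }
    return $sum / @$u_lines;    # divide by the number of u lines
}

my @frog = ([1, 'find-v'], [2, 'turn-v'], [3, 'jump-v'], [4, 'toad-n'],
            [5, 'see-v'], [6, 'eat-v'], [7, 'kill-v']);
my @beast = ([1, 'transform-v'], [2, 'kill-v'], [3, 'see-v'], [4, 'become-v'],
             [5, 'turn-v'], [6, 'call-v'], [7, 'devour-v'], [8, 'hunt-v'],
             [9, 'fight-v'], [10, 'slay-v']);
printf "frog-n / beast-n: %.6f\n", pair_score(\@frog, \@beast);   # 0.130612
```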

Here's my code:

#!/usr/bin/perl -w


use strict;
use warnings;
use Getopt::Std;
use Data::Dumper ;


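my $usage = <<"_USAGE_";
Usage: $0 [-h] PREFIX FEATURE_FILE PAIRS_FILE
Reads FEATURE_FILE and PAIRS_FILE and writes the results to PREFIX.txt
_USAGE_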


my %opts = ();

getopts('h',\%opts);

if ($opts{h}) {
    print $usage;
    exit;
}


my $prefix = shift;
my $input = shift;
my $input_bless=shift;


my $file = $prefix . ".txt";


if (-e $file) {
    print STDERR "$file already exists, deleting previous version\n";
    unlink $file or die "Cannot delete $file: $!\n";
}

open(INPUT, '<', $input) or die "Cannot open $input: $!\n";
open(OUT, '>', $file)    or die "Cannot open $file for writing: $!\n";

# %matrice: noun => list of "relation_entry,rank,score" strings, in file order
my %matrice;
while (<INPUT>) {
    my ($rank, $nome, $relaz, $entry2, $score) = split();
    push @{ $matrice{$nome} }, "${relaz}_$entry2,$rank,$score";
}
close INPUT;


#print OUT Dumper  \%matrice;


# %HOH: noun => { relation_entry => "rank,score" }
my %HOH;
while (my ($name, $aref) = each %matrice) {
    for my $item (@$aref) {
        my ($prop, $rank, $score) = split(',', $item);
        $HOH{$name}{$prop} = "$rank,$score";
    }
}

#print Dumper  \%HOH;

open(INPUTB, '<', $input_bless) or die "Cannot open $input_bless: $!\n";

# Each line: u, class, relation, v (e.g. frog-n amphibian_reptile hyper beast-n)
while (<INPUTB>) {
    my ($u, $superclass, $rel, $v) = split;
    my $conteggio = calcolo($u, $v);
    print OUT join("\t", $u, $rel, $v, $conteggio), "\n";
}
close INPUTB;
close OUT;
   
   
   
sub calcolo {
    my ($name1, $name2) = @_;
    my $first  = $HOH{$name1};
    my $second = $HOH{$name2};
    my ($rank_fv, $score_fv);
    my $count_feat_fu;
    my $precision;
    my $rel_par;
    my $num = 0;
    my $apinc;
    my $rank2 = 0;
    my ($prop, $score);
    my $rank = 0;
    my $feature_found = 0;
    my $feat_finale;
    my $rel_to_sum;

    # Loop over the nouns stored in %matrice
    while (my ($name1, $aref) = each %matrice) {
        $count_feat_fu++;
        $num = 0;
        my $feat_rel = 0;

        # Each item is "relation_entry,rank,score"
        for my $item (@$aref) {
            ($prop, $rank, $score) = split(',', $item);
            # Number of features recorded for $name2
            $feat_finale = last_el_v($name2, $prop);

            # Look for the same feature among the features of $name2
            while (my ($prop1, $rankscore1) = each %$second) {
                ($rank_fv, $score_fv) = split(',', $rankscore1);

                if ($prop1 eq $prop) {
                    $feature_found++;
                    print "PROP:" . $prop . "\n";
                    print "FOUND\n";
                    $feat_rel++;
                    $rank2 = $rank_fv;
                    print "RANK 2:" . $rank2 . "\n";
                    print "RANK 1: " . $rank . "\n";
                    print "COUNT FEAT FOUND: " . $feature_found . "\n";
                    $precision = $feature_found / $rank;
                    print "PRECISION: " . $precision . "\n";
                    $rel_par    = $rank2 / ($feat_finale + 1);
                    $rel_to_sum = 1 - $rel_par;
                    print "FEAT FINALE: " . $feat_finale . "\n";
                    print "REL PAR: " . $rel_par . "\n";
                    my $tosum = $precision * $rel_to_sum;
                    print "TO SUM: " . $tosum . "\n";
                    $num = $num + $tosum;
                    print "NUM: " . $num . "\n";
                }
            }
        }
        print "RANK: " . $rank . "\n";
        $apinc = $num / $rank;
        print "APINC: " . $apinc . "\n";

        return $apinc;
    }
}
   
   
   
# Number of features recorded for $name2 in %HOH
sub last_el_v {
    my ($name2, $prop1) = @_;
    my $second = $HOH{$name2};
    return scalar keys %$second;
}

The result I get when I run it on the sample data above is wrong: it looks like it only runs over the beast-n entries and repeats the calculation on them.
I've spent three days trying to fix this without success. Any ideas?
Thanks in advance
Giulia

In reply to Cycle, iterations and statistical measure got completely wrong by remluvr
