remluvr has asked for the wisdom of the Perl Monks concerning the following question:
Hello Monks! Today seems to be my day for problems I cannot solve. Two years without coding, plus trouble understanding complex data structures, apparently add up to a lot of problems! So, here's my problem. I have this kind of input:
frog-n as novelty-n 5.8504
frog-n be yellow-n 6.1961
frog-n be-1 Asia-n 5.0937
frog-n coord zebra-n 5.9279
frog-n coord-1 Canuck-n 6.3363
frog-n nmod-1 mule-n 4.2881
amphibian-n success-1 surprising-j 14.6340
amphibian-n such_as alligator-n 11.5265
amphibian-n than work-n 5.9948
amphibian-n though stalk-n 13.2228
and my output should be a "matrix", so to speak, like the following:
frog-n as_novelty-n,5.8504 be_yellow-n,6.1961 be-1_Asia-n,5.0937 coord_zebra-n,5.9279 coord-1_Canuck-n,6.3363 nmod-1_mule-n,4.2881
amphibian-n success-1_surprising-j,14.6340 such_as_alligator-n,11.5265 than_work-n,5.9948 though_stalk-n,13.2228
Basically, the element in the first column of the input file is the key, and each entry after it joins the elements of the 2nd and 3rd columns with an underscore, followed by a comma and the corresponding score.
I managed to do the following:
my $prefix = shift;
my $input = shift;
my $file = $prefix . ".txt";
if (-e $file) {
    print STDERR "$file already exists, deleting previous version\n";
    `rm -f $file`;
}
my $debug=0; # Debug variable. Set to 1 while debugging, used for
my %seen = ();
my @global_els = ();
my @row_els = ();
my %score_of = ();
my $row_el;
my $gram;
my $col_el;
my $score_of;
my $score;
my $global_el;
open INPUT,$input;
while(<INPUT>){
    chomp;
    ($row_el,$gram,$col_el,$score) = split "[\t ]+",$_;
    $global_el=$gram."_".$col_el;
    if (!($seen{"glob"}{$global_el}++)) {
        push @global_els,$global_el;
    }
    if (!$seen{"row"}{$row_el}++) {
        push @row_els,$row_el;
    }
    $score_of{$row_el}{$global_el} = $score;
    if($debug){
        print "Check:".$row_el."=>".$global_el."=>".$score;
    }
}
close INPUT;
#@global_els = ();
#@row_els = ();
open MATRIX,">$file";
#my $score_b=$score_of{$row_el}{$global_el};
foreach $row_el (@row_els) {
    print MATRIX "\t",$row_el;
    foreach $global_el (@global_els) {
        print MATRIX "\t",$global_el;
        print MATRIX ",",$score_of{$row_el}{$global_el};
    }
    print MATRIX "\n";
}
close MATRIX;
But my output is wrong: all the so-called joined elements appear on both lines, even when they are not related to the element on that line. For example, the output I get using the data above looks like this:
frog-n as_novelty-n,5.8504 be_yellow-n,6.1961 be-1_Asia-n,5.0937 coord_zebra-n,5.9279 coord-1_Canuck-n,6.3363 nmod-1_mule-n,4.2881 success-1_surprising-j, such_as_alligator-n, than_work-n, though_stalk-n,
amphibian-n success-1_surprising-j,14.6340 such_as_alligator-n,11.5265 than_work-n,5.9948 though_stalk-n,13.2228 as_novelty-n, be_yellow-n, be-1_Asia-n, coord_zebra-n, coord-1_Canuck-n, nmod-1_mule-n,
What did I get wrong? How can I improve it? Thanks everyone, Giulia
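One reading of the symptom: the inner loop walks the whole of @global_els for every row, so each row prints every joined element ever seen, with an empty score wherever that row has no entry. A minimal sketch of one possible fix follows. It assumes each joined element should be listed only under its own row, in input order, and that a repeated pair within a row may simply be listed again. The helper name build_rows and the in-memory sample input are hypothetical, not part of the script above.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): turns
# "row gram col score" lines into one output line per row element.
sub build_rows {
    my @lines = @_;
    my (@row_order, %pairs_of);
    for my $line (@lines) {
        my ($row_el, $gram, $col_el, $score) = split ' ', $line;
        next unless defined $score;    # skip blank or short lines
        # Remember the order in which row elements first appear.
        push @row_order, $row_el unless exists $pairs_of{$row_el};
        # Store the joined element under its own row only.
        push @{ $pairs_of{$row_el} }, "${gram}_$col_el,$score";
    }
    # One tab-separated line per row: key first, then its own pairs.
    return map { join "\t", $_, @{ $pairs_of{$_} } } @row_order;
}

# Sample data taken from the input shown in the question.
my @input = (
    "frog-n as novelty-n 5.8504",
    "frog-n be yellow-n 6.1961",
    "amphibian-n such_as alligator-n 11.5265",
    "amphibian-n than work-n 5.9948",
);
print "$_\n" for build_rows(@input);
```

A smaller change to the existing script would probably have the same effect: inside the inner foreach, skip any column with `next unless exists $score_of{$row_el}{$global_el};` before printing it.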
Re: Problems with complex structure and array of arrays
by Eliya (Vicar) on Mar 01, 2012 at 22:24 UTC
Re: Problems with complex structure and array of arrays
by planetscape (Chancellor) on Mar 02, 2012 at 07:22 UTC
by remluvr (Sexton) on Mar 02, 2012 at 09:58 UTC
Re: Problems with complex structure and array of arrays
by tangent (Parson) on Mar 01, 2012 at 22:45 UTC