filipo has asked for the wisdom of the Perl Monks concerning the following question:

Hi, i am stuck with a mapping problem. my input file and script are below. I'm trying to map, if elements is found in two keys, then map the two keys. The script mapping some part of the elements but not fully. I can't seem to see what the problem is. The numbers in front of each (e.g. 2 C00080MM) element are not mapped and i'm removing them. Is it the empty space causing the problem?. Thanks for your help!

INPUT data: R00009MM#R00009#2 C00027MM <=> C00007MM + 2 C00001MM R00014MM#R00014#C00022MM + C00068MM + C00080MM <=> C05125MM + C00011MM R00081MM#R00081#C00007MM + 4 C00126MM + 8 C00080MM <=> 4 C00125MM + 2 +C00001MM + 4 C00080Cyto R00086MM#R00086#C00008MM + C00009MM + 7 C00080Cyto <=> C00002MM + C000 +01MM + 7 C00080MM R00094MM#R00094#2 C00051MM + C00003MM <=> C00127MM + C00004MM + C00080 +MM
OUTPUT expected. R00081MM -> R00094MM R00014MM -> R00086MM R00081MM -> R00086MM R00086MM -> R00081MM R00094MM -> R00014MM R00009MM -> R00081MM
#!/usr/bin/perl use warnings; use strict; open (IN, "input.txt") || die $!; #open (OUT,">output.txt") || die $!; my %groups; my @rxnGraph; while (<IN>) { chomp $_; my @rxn = split /#/, $_; next if length $rxn[1] == 0; my ($left_side, $right_side) = split /\s*<=>\s*/, $rxn[2]; my @left_els = split /\s*\+\s*/, $left_side; my @right_els = split /\s*\+\s*/, $right_side; my $x=0; for ($x=0; $x<=$#left_els; $x++){ $left_els[$x]=~s/^[0-9]//; if($left_els[$x]=~/C.*/){ #print ">>",$left_els[$x],"\n" } } # print ">L>@left_els\n"; my $y=0; for ($y=0; $y<=$#right_els; $y++){ $right_els[$y]=~s/^[0-9]//; if($right_els[$y]=~/C.*/){ # print ">>>$right_els[$y]\n"; } } # print ">R>@right_els\n"; $groups{$_}{left} = $rxn[0] for @left_els; $groups{$_}{right} = $rxn[0] for @right_els; } #while( my ($keys, $values) = each(%groups)){ #print ">>K>>>$keys\n"; #} for my $grp (keys %groups) { if (2 == grep exists $groups{$grp}{$_}, qw/left right/) { print "$groups{$grp}{right} ->", " $groups{$grp}{left}\n"; # push @rxnGraph, "$groups{$grp}{right},", # " $groups{$grp}{left}\n"; } } #print OUT @rxnGraph;

Replies are listed 'Best First'.
Re: Mapping issues
by hdb (Monsignor) on Sep 23, 2013 at 13:21 UTC

    Your main issue is in the lines

    $groups{$_}{left} = $rxn[0] for @left_els; $groups{$_}{right} = $rxn[0] for @right_els;

    If a group value turns up more than once, you overwrite the key value that you stored before. I am lazy, so here is a script that I think does what you want (rather than fixing yours). I tried to keep it as close as possible to yours so you can take parts and paste it into yours.

    use strict; use warnings; my %groups; #my @keys; while(<DATA>){ s/(\s|#)\d\s/$1/g; s/\s//g; my( $key, $left, $right ) = /^([^#]+).*#(.*)<=>(.*)/; $groups{$_}{left }{$key} = 1 for split /\+/, $left; $groups{$_}{right}{$key} = 1 for split /\+/, $right; #push @keys, $key; } for my $grp ( values %groups ) { for my $left ( keys %{ $grp->{left} } ) { for my $right ( keys %{ $grp->{right} } ) { print "$left <=> $right\n"; } } } __DATA__ R00009MM#R00009#2 C00027MM <=> C00007MM + 2 C00001MM R00014MM#R00014#C00022MM + C00068MM + C00080MM <=> C05125MM + C00011MM R00081MM#R00081#C00007MM + 4 C00126MM + 8 C00080MM <=> 4 C00125MM + 2 +C00001MM + 4 C00080Cyto R00086MM#R00086#C00008MM + C00009MM + 7 C00080Cyto <=> C00002MM + C000 +01MM + 7 C00080MM R00094MM#R00094#2 C00051MM + C00003MM <=> C00127MM + C00004MM + C00080 +MM

    UPDATE: Just realized that @keys is not used, so I commented those two lines.

      hey, thank you for the help!