I have two files that look like this:
file 1
1gtiA 7
1jpeA 4
1jpeA 6
1jpeA 7
file 2
0.333 # VF
0.267 # TE
0.200 # YD
0.267 # QG
0.000 # G-
0.000 # C-
0.000 # A-
0.000 # D-
0.000 # A-
0.000 # --
0.000 # C-
0.000 # Y-
0.000 # P-
0.200 # PD
0.067 # EL
1.000 # TT
I need to map the information from the first file onto the second by changing the relevant position in the second file to a lowercase letter.
So the result should be
0.333 # VF
0.267 # TE
0.200 # YD
0.267 # Qg
0.000 # G-
0.000 # C-
0.000 # a-
0.000 # D-
0.000 # A-
0.000 # --
0.000 # C-
0.000 # Y-
0.000 # P-
0.200 # PD
0.067 # El
1.000 # Tt
As you can see, the "-" characters have to be disregarded when counting positions for the mapping.
The information for the two items of interest in file 2 is presented vertically, one item per column. There could be many more such items, but I've only included two for the example.
score hash_character item1 item2
0.333 # V F
etc
Item 1 corresponds to the item called 1gtiA in file 1, and item 2 to 1jpeA.
I've got some code, but it's not working at all, so any help would be much appreciated. Apologies for the number of posts recently. I'm just trying to complete a project and it's not going too smoothly! As you can see, it's very much a work in progress!
#! /usr/local/bin/perl -w
use FileHandle;
use strict;
my $scorecons_file = shift;
#my $alignment_file = shift;
my $csa_file = shift;
my $column_count;
my $res_count1 = 0;
my $res_count2 = 0;
warn "# Reading CSA data";
my $hCSAData = getCSAData($csa_file);
warn "# Got CSA data: ".scalar (keys %$hCSAData);
my $fh_score = new FileHandle($scorecons_file, "r") || die "Cannot open seq file: $scorecons_file ($!)";
while(my $line = $fh_score->getline)
{
$column_count++;
chomp $line;
my @field = split /\s+/, $line;
#print "$field[0] $field[2]\n"; # test print
my $score = $field[0];
my $sequence = $field[2];
#print "$sequence\n";
my @sequence_field = split //, $sequence;
if("$sequence_field[0]" ne "-")
{
$res_count1++;
#print "$sequence_field[0] ";
# print "$res_count1 $column_count\n";
# if(my $hCSA = $hCSAData->{$column_count}->{$res_count1})
# {
#print "yes";
# }
}
# if("$sequence_field[1]" ne "-")
#{
#$res_count2++;
#print "$sequence_field[1] \n";
# }
#
}
########################################
sub getCSAData
{
my ($fIn) = @_;
my $fh = new FileHandle($fIn)
or die "Cannot open CSA file: $fIn ($!)";
my $res;
my $count = 1;
my $key = 0;
my $protein = "";
my $code = 0;
my $hData = {};
while (my $line = $fh->getline)
{
my @cols = split /\s+/, $line;
#$key = "$cols[0]" . "$cols[1]";
#print "$cols[0] $cols[1]\n";
$key = $cols[0];
if("$key" ne "$protein")
{
$code++;
}
$protein = $key;
#print "$code $cols[1]\n";
my $hEntry = {
'code' => $code,
'res' => $cols[1],
};
# Index by protein code, then by residue number
$hData->{$code}->{$cols[1]} = $hEntry;
#$hData->{$res}->{$code} = $hEntry;
print "$code $cols[1]\n";
}
return $hData;
}
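
For reference, here is a minimal sketch of the mapping described above. It is not a fix of the script, just the rule in isolation. It assumes the item names are supplied in column order (file 2 itself does not contain them) and it works on in-memory lines rather than files. The names parse_sites and mark_sites are only illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Turn file-1 style lines ("1jpeA 4") into { name => { residue_number => 1 } }.
sub parse_sites {
    my (@lines) = @_;
    my %sites;
    for my $line (@lines) {
        my ($name, $res) = split ' ', $line;
        next unless defined $res;          # skip blank or malformed lines
        $sites{$name}{$res} = 1;
    }
    return \%sites;
}

# Lowercase the flagged residues in file-2 style lines ("0.267 # QG").
# $order lists the item names in column order (assumption, see above).
sub mark_sites {
    my ($sites, $order, @lines) = @_;
    my @count = (0) x @$order;             # one residue counter per item
    my @out;
    for my $line (@lines) {
        my ($score, $hash, $seq) = split ' ', $line;
        next unless defined $seq;          # skip blank or malformed lines
        my @chars = split //, $seq;
        for my $i (0 .. $#chars) {
            next if $i > $#$order;         # column without a known item name
            next if $chars[$i] eq '-';     # gaps do not advance the counter
            $count[$i]++;
            $chars[$i] = lc $chars[$i]
                if $sites->{ $order->[$i] }{ $count[$i] };
        }
        push @out, join ' ', $score, $hash, join('', @chars);
    }
    return @out;
}

# The example data from the post.
my $sites = parse_sites('1gtiA 7', '1jpeA 4', '1jpeA 6', '1jpeA 7');
my @alignment = (
    '0.333 # VF', '0.267 # TE', '0.200 # YD', '0.267 # QG',
    '0.000 # G-', '0.000 # C-', '0.000 # A-', '0.000 # D-',
    '0.000 # A-', '0.000 # --', '0.000 # C-', '0.000 # Y-',
    '0.000 # P-', '0.200 # PD', '0.067 # EL', '1.000 # TT',
);
my @marked = mark_sites($sites, ['1gtiA', '1jpeA'], @alignment);
print "$_\n" for @marked;
```

Keeping a separate counter per column is what lets each item skip its own gaps independently, so adding more items should only mean adding more names to the order list.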