File 1: conversion table in tab-delimited format (original ID, four-letter code)
CGP_A0000000001 AAAA
CGP_A0000000002 AAAB
CGP_A0000000003 AAAC
CGP_A0000000004 AAAD
CGP_A0000000005 AAAE
CGP_A0000000006 AAAF
CGP_A0000000007 AAAG
CGP_A0000000008 AAAH
CGP_A0000000009 AAAI
CGP_A0000000010 AAAJ
CGP_A0000000011 AAAK
CGP_A0000000012 AAAL
CGP_A0000000013 AAAM
CGP_A0000000014 AAAN
CGP_A0000000015 AAAO
CGP_A0000000016 AAAP
CGP_A0000000017 AAAQ
CGP_A0000000018 AAAR
CGP_A0000000019 AAAS
CGP_A0000000020 AAAT
CGP_A0000000021 AAAU
CGP_A0000000022 AAAV
####
File 2 has three columns; the 2nd column needs to be replaced using the conversion table (File 1)
3998122 CGP_A0000000001 13
5877245 CGP_A0000000001 17
3637488 CGP_A0000000001 19
3998162 CGP_A0000000001 21
638421 CGP_A0000000001 23
2395226 CGP_A0000000001 25
3094278 CGP_A0000000001 27
2029460 CGP_A0000000001 29
1406937 CGP_A0000000001 31
2054853 CGP_A0000000001 35
4182290 CGP_A0000000001 37
3784069 CGP_A0000000002 13
6477860 CGP_A0000000002 17
394789 CGP_A0000000002 19
5095549 CGP_A0000000002 21
692543 CGP_A0000000002 23
5446227 CGP_A0000000002 25
1546807 CGP_A0000000002 27
1741167 CGP_A0000000002 29
1187972 CGP_A0000000002 31
1600142 CGP_A0000000002 33
1833098 CGP_A0000000002 35
1770403 CGP_A0000000003 1353
3254322 CGP_A0000000003 1355
6152600 CGP_A0000000003 1357
3195476 CGP_A0000000003 1361
3108815 CGP_A0000000003 1371
77684 CGP_A0000000003 1373
3269969 CGP_A0000000003 1375
3259137 CGP_A0000000003 1377
6502805 CGP_A0000000003 1379
5899118 CGP_A0000000003 1381
5417394 CGP_A0000000003 1383
806606 CGP_A0000000003 1385
1662014 CGP_A0000000003 1387
6490426 CGP_A0000000003 1389
6206360 CGP_A0000000003 1391
####
The RESULT file is the same as File 2, but its second column holds the equivalent IDs from the 2nd column of File 1
3998122 AAAA 13
5877245 AAAA 17
3637488 AAAA 19
3998162 AAAA 21
638421 AAAA 23
2395226 AAAA 25
####
#!/usr/bin/perl -w
use strict;
use warnings;
# Expect three arguments: conversion table, input file, results file.
if (@ARGV < 3) {
    print "usage: A message here\n";
    exit 0;
}
open(INPUT1, '<', $ARGV[0]) || die "Cannot open file \"$ARGV[0]\": $!"; # Original IDs and four-letter codes
open(INPUT2, '<', $ARGV[1]) || die "Cannot open file \"$ARGV[1]\": $!"; # Original IDs in the second column
open(RESULTS, '>', $ARGV[2]) || die "Cannot open the Results file \"$ARGV[2]\": $!"; # Original IDs are replaced by four-letter codes
my %origins;
# Build the lookup table: original ID => four-letter code.
while (<INPUT1>) {
    chomp;
    my @columns = split /\t/;
    $origins{$columns[0]} = $columns[1];
}
close(INPUT1);
while (<INPUT2>) {
    chomp;
    my ($bioC, $contig_id, $pip) = split /\t/, $_;
    # A direct hash lookup replaces the loop over every key;
    # lines whose ID is not in the table are still skipped.
    if (exists $origins{$contig_id}) {
        print RESULTS "$bioC\t$origins{$contig_id}\t$pip\n";
    }
}
close(INPUT2);
close(RESULTS);
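####
A minimal, self-contained sketch of the same replacement, run on a few in-memory rows taken from the sample data above instead of on files. The helper name `replace_ids` is hypothetical, and the sketch assumes tab-separated rows; like the script above, it drops rows whose ID has no entry in the conversion table.

```perl
use strict;
use warnings;

# Hypothetical helper: replaces the 2nd tab-separated column of each row
# using the ID => code mapping in %$map. Rows whose ID has no mapping
# are dropped, mirroring the script above.
sub replace_ids {
    my ($map, @rows) = @_;
    my @out;
    for my $row (@rows) {
        my ($bioC, $contig_id, $pip) = split /\t/, $row;
        next unless defined $contig_id && exists $map->{$contig_id};
        push @out, join("\t", $bioC, $map->{$contig_id}, $pip);
    }
    return @out;
}

# Sample rows taken from the data above.
my %origins = (
    'CGP_A0000000001' => 'AAAA',
    'CGP_A0000000002' => 'AAAB',
);
my @rows = (
    "3998122\tCGP_A0000000001\t13",
    "3784069\tCGP_A0000000002\t13",
);

print "$_\n" for replace_ids(\%origins, @rows);
```

Each row costs one hash lookup, so the running time grows with the number of rows in File 2 rather than with rows times table entries.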