Re: Replace data in the column of one file with corresponding ones in another file

If the first column of file2 is a primary key (that is, every ID appears only once), the code below will work. If it is not, it will fail like this:

use strict;
use warnings;

my $file1=<<EOF;
23 SNP_A-4293670 0 2713391
24 SNP_A-4293670 0 2713391
25 SNP_A-1780270 0 1111111
26 SNP_A-1780271 0 2222222
EOF

my $file2=<<EOF;
SNP_A-1780270 ss75925050 rs987435
SNP_A-1780271 ss75925050 rs000001
SNP_A-4293670 ss75925050 rs999999
SNP_A-4293670 ss75925050 xxxxxxxx
EOF

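my %lookup;
# Build the lookup table: column 1 of file2 (ID) => column 3 (replacement).
# A repeated ID silently overwrites the earlier value.
foreach my $line ( split( /\n/, $file2) ){
    my @line=split(/\s+/,$line);
    $lookup{$line[0]}=$line[2];
}
# Dump the table to show which value survived for each ID.
print "k=$_,v=$lookup{$_}\n" for keys %lookup;

# Replace column 2 of each file1 line with its value from the lookup table.
foreach my $line ( split( /\n/, $file1) ){
    my @line=split(/\s+/,$line);
    printf "%s\t%s\t%s\t%s\n", $line[0], $lookup{$line[1]}, $line[2], $line[3];
}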

result

23      xxxxxxxx        0       2713391
24      xxxxxxxx        0       2713391
25      rs987435        0       1111111
26      rs000001        0       2222222

The first column of the second file is the problem. Is it a unique key?


Re^2: Replace data in the column of one file with corresponding ones in another file by Renyulb28 (Novice) on Jan 27, 2011 at 21:25 UTC
Thanks for the help. What did you mean by unique key? It is an ID for each row and corresponds with the first file. Could you also explain how your script works?
Re^3: Replace data in the column of one file with corresponding ones in another file by remiah (Hermit) on Jan 28, 2011 at 06:35 UTC
I mean a unique constraint as in a database, where a field must never contain duplicate values. A text file has no such constraint, so duplicate IDs sometimes appear in one. So I think the first step is to check whether file2 has duplicate IDs. This is a sample of a duplicate ID:

SNP_A-4293670 ss75925050 rs999999
SNP_A-4293670 ss75925050 xxxxxxxx

If file2 has a duplicate ID like 'SNP_A-4293670', the earlier value is overwritten by the later one, here 'xxxxxxxx'.
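One way to run that duplicate check could look like the sketch below. It is only an illustration, not part of the original script: the helper name find_duplicate_ids is hypothetical, and it assumes file2 is whitespace-separated with the ID in the first column and the replacement value in the third, as in the sample data above.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): returns a hash ref
# mapping each ID that occurs more than once to the list of third-column
# values seen for it, in file order.
sub find_duplicate_ids {
    my ($text) = @_;
    my %seen;
    foreach my $line ( split /\n/, $text ) {
        next unless $line =~ /\S/;      # skip blank lines
        my @fields = split ' ', $line;  # split on runs of whitespace
        push @{ $seen{ $fields[0] } }, $fields[2];
    }
    my %dups;
    foreach my $id ( keys %seen ) {
        $dups{$id} = $seen{$id} if @{ $seen{$id} } > 1;
    }
    return \%dups;
}

# Same sample data as file2 in the post above.
my $file2 = <<EOF;
SNP_A-1780270 ss75925050 rs987435
SNP_A-1780271 ss75925050 rs000001
SNP_A-4293670 ss75925050 rs999999
SNP_A-4293670 ss75925050 xxxxxxxx
EOF

my $dups = find_duplicate_ids($file2);
foreach my $id ( sort keys %$dups ) {
    print "duplicate ID $id: @{ $dups->{$id} }\n";
}
```

If this prints nothing, every ID in file2 is unique and the simple hash lookup is safe. If it reports an ID, you need to decide which value should win before building the lookup table.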