in reply to Replace data in the column of one file with corresponding ones in another file

If the first column of file2 is a primary key (every ID appears only once), the code below will work. If it is not, it will fail like this:
use strict;
use warnings;

my $file1=<<EOF;
23 SNP_A-4293670 0 2713391
24 SNP_A-4293670 0 2713391
25 SNP_A-1780270 0 1111111
26 SNP_A-1780271 0 2222222
EOF

my $file2=<<EOF;
SNP_A-1780270 ss75925050 rs987435
SNP_A-1780271 ss75925050 rs000001
SNP_A-4293670 ss75925050 rs999999
SNP_A-4293670 ss75925050 xxxxxxxx
EOF

# Build a lookup from file2: column 1 (ID) => column 3 (replacement).
# A repeated ID overwrites the earlier entry.
my %lookup;
foreach my $line ( split( /\n/, $file2) ){
    my @line=split(/\s+/,$line);
    $lookup{$line[0]}=$line[2];
}
print "k=$_,v=$lookup{$_}\n" for keys %lookup;

# Replace column 2 of file1 with the looked-up value.
foreach my $line ( split( /\n/, $file1) ){
    my @line=split(/\s+/,$line);
    printf "%s\t%s\t%s\t%s\n", $line[0], $lookup{$line[1]}, $line[2], $line[3];
}

result

23 xxxxxxxx 0 2713391
24 xxxxxxxx 0 2713391
25 rs987435 0 1111111
26 rs000001 0 2222222
The first column of the second file is the problem. Is it a unique key?

Re^2: Replace data in the column of one file with corresponding ones in another file
by Renyulb28 (Novice) on Jan 27, 2011 at 21:25 UTC
    Thanks for the help. What did you mean by unique key? It is an ID for each row and corresponds with the first file. Could you also explain how your script works?

      I mean a unique constraint as in a database, where the field must never contain duplicate data. A text file has no such constraint, and sometimes we see duplicate IDs in a text file. So I think the first thing to do is check whether file2 has duplicate IDs.
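
      A minimal sketch of such a duplicate check, assuming file2 is whitespace-separated with the ID in the first column. The helper name find_duplicate_ids is made up for illustration and is not part of the script above:

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): returns a hash ref
# mapping every ID that occurs more than once in column 1 to its count.
sub find_duplicate_ids {
    my ($text) = @_;
    my %count;
    for my $line ( split /\n/, $text ) {
        next unless $line =~ /\S/;       # skip blank lines
        my ($id) = split ' ', $line;     # first whitespace-separated field
        $count{$id}++;
    }
    my %dups;
    for my $id ( keys %count ) {
        $dups{$id} = $count{$id} if $count{$id} > 1;
    }
    return \%dups;
}

my $file2 = <<'EOF';
SNP_A-1780270 ss75925050 rs987435
SNP_A-1780271 ss75925050 rs000001
SNP_A-4293670 ss75925050 rs999999
SNP_A-4293670 ss75925050 xxxxxxxx
EOF

my $dups = find_duplicate_ids($file2);
print "$_ appears $dups->{$_} times\n" for sort keys %$dups;
# prints: SNP_A-4293670 appears 2 times
```

      If this prints nothing, the first column can safely be used as a hash key.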

      SNP_A-4293670 ss75925050 rs999999
      SNP_A-4293670 ss75925050 xxxxxxxx
      This is a sample of a duplicate ID. If file2 contains a duplicate ID such as 'SNP_A-4293670', the earlier value is overwritten by the later one ('xxxxxxxx' here).
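
      One possible way to avoid the silent overwrite is to keep every value seen for an ID instead of only the last one. This is only a sketch under the same assumptions (whitespace-separated columns, ID in column 1, replacement in column 3). The name build_lookup is hypothetical, and how to choose among several values is left to you:

```perl
use strict;
use warnings;

# Hypothetical helper: builds ID => [ all column-3 values ] so that a
# repeated ID accumulates values instead of overwriting the earlier one.
sub build_lookup {
    my ($text) = @_;
    my %lookup;
    for my $line ( split /\n/, $text ) {
        next unless $line =~ /\S/;       # skip blank lines
        my @field = split ' ', $line;
        push @{ $lookup{ $field[0] } }, $field[2];
    }
    return \%lookup;
}

my $lookup = build_lookup(<<'EOF');
SNP_A-4293670 ss75925050 rs999999
SNP_A-4293670 ss75925050 xxxxxxxx
EOF

print join( ',', @{ $lookup->{'SNP_A-4293670'} } ), "\n";
# prints: rs999999,xxxxxxxx
```

      An ID whose array holds more than one element is a duplicate, so you can report it or pick one value deliberately.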