Replace data in the column of one file with corresponding ones in another file

Renyulb28 has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Replace data in the column of one file with corresponding ones in another file by wind (Priest) on Jan 27, 2011 at 21:33 UTC
Something like this would work: # I would greatly appreciate any help. The file that I am trying to # alter is in the format # 23 SNP_A-4293670 0 2713391 # with separation by tabs. The second file is in the format # SNP_A-1780270 ss75925050 rs987435 # with separation by spaces. I want the first file to look like this: # 23 rs###### 0 2713391 # which means that I want the data in second column of the first # file to be replaced with the data in the third column of the second # file if it matches the first column of the second file. Thanks use File::Slurp qw(read_file); use Tie::File; use strict; my $file1 = 'file1.txt'; my $file2 = 'file2.txt'; my %hash = map { chomp; my @data = split /\s+/; $data[0] => $data[2]; } read_file($file2); tie my @array, 'Tie::File', $file1 or die "Can't open $file1: $!"; foreach (@array) { s{^(\S*\s+)(\S+)}{ if (exists $hash{$2}) { print "Replacing $2 with $hash{$2}\n"; $1 . $hash{$2}; } else { print "Can't find $2, not changing\n"; $1 . $2; } }e; } [download] - Miller	[reply] [d/l]
Re^2: Replace data in the column of one file with corresponding ones in another file by Renyulb28 (Novice) on Jan 27, 2011 at 22:02 UTC
thank you, but it is giving me this error when I run the script Can't locate File/Slurp.pm in @INC (@INC contains: /etc/perl /usr/local/lib/perl/5.10.1 /usr/local/share/perl/5.10.1 /usr/lib/perl5 /usr/share/perl5 /usr/lib/perl/5.10 /usr/share/perl/5.10 /usr/local/lib/site_perl .) at 2FilesFindReplace.pl line 1.	[reply]
Re^3: Replace data in the column of one file with corresponding ones in another file by wind (Priest) on Jan 27, 2011 at 23:03 UTC
The version not relying on File::Slurp would be the following: `use Tie::File; use strict; my $file1 = 'file1.txt'; my $file2 = 'file2.txt'; open my $fh, $file2 or die "Can't open $file2: $!"; my %hash = map { chomp; my @data = split /\s+/; $data[0] => $data[2]; } <$fh>; tie my @array, 'Tie::File', $file1 or die "Can't open $file1: $!"; foreach (@array) { s{^(\S*\s+)(\S+)}{ if (exists $hash{$2}) { print "Replacing $2 with $hash{$2}\n"; $1 . $hash{$2}; } else { print "Can't find $2, not changing\n"; $1 . $2; } }e; }` [download] - Miller	[reply] [d/l]
Re^3: Replace data in the column of one file with corresponding ones in another file by ww (Archbishop) on Jan 28, 2011 at 02:15 UTC
So `File::Slurp` is not installed. Download and install it.	[reply] [d/l]
Re: Replace data in the column of one file with corresponding ones in another file by Anonyrnous Monk (Hermit) on Jan 27, 2011 at 21:08 UTC
Read the data from the second file into a hash (`$hash{$firstcolumn} = $thirdcolumn`). Then go through the first file line by line, split the columns and replace the second column with `$hash{$secondcolumn}` before you join and print the line out again.	[reply] [d/l] [select]
Re: Replace data in the column of one file with corresponding ones in another file by remiah (Hermit) on Jan 27, 2011 at 21:08 UTC
if first line of file2 is a primary key, below will work. But if not, it will fail like this ... use strict; use warnings; my $file1=<<EOF; 23 SNP_A-4293670 0 2713391 24 SNP_A-4293670 0 2713391 25 SNP_A-1780270 0 1111111 26 SNP_A-1780271 0 2222222 EOF my $file2=<<EOF; SNP_A-1780270 ss75925050 rs987435 SNP_A-1780271 ss75925050 rs000001 SNP_A-4293670 ss75925050 rs999999 SNP_A-4293670 ss75925050 xxxxxxxx EOF my %lookup; foreach my $line ( split( /\n/, $file2) ){ my @line=split(/\s+/,$line); $lookup{$line[0]}=$line[2]; } print "k=$_,v=$lookup{$_}\n" for keys %lookup; foreach my $line ( split( /\n/, $file1) ){ my @line=split(/\s+/,$line); printf "%s\t%s\t%s\t%s\n", $line[0], $lookup{$line[1]}, $line[2], +$line[3]; } [download] result `23 xxxxxxxx 0 2713391 24 xxxxxxxx 0 2713391 25 rs987435 0 1111111 26 rs000001 0 2222222` [download] first column of the seconde file is the problem. Is it unique key?	[reply] [d/l] [select]
Re^2: Replace data in the column of one file with corresponding ones in another file by Renyulb28 (Novice) on Jan 27, 2011 at 21:25 UTC
thanks for the help; what did you mean by unique key? It is an ID for each row and corresponds with the first file. Could you also explain how your script works?	[reply]
Re^3: Replace data in the column of one file with corresponding ones in another file by remiah (Hermit) on Jan 28, 2011 at 06:35 UTC
I mean unique constraint of database, which the field should never have duplicate data. Text file doesn't have such a constraint, sometimes we see duplicate ID in text file. So I think the first thing is to check whether file2 has duplicate ID or not. `SNP_A-4293670 ss75925050 rs999999 SNP_A-4293670 ss75925050 xxxxxxxx` [download] This is duplicate ID sample. If file2 has duplicate ID like 'SNP_A-4293670', the result would be overwritten by ' xxxxxxxx'.	[reply] [d/l]
Re: Replace data in the column of one file with corresponding ones in another file by wjw (Priest) on Jan 27, 2011 at 21:14 UTC
Personally, I would start here -> DBD::AnyData That looks to me like a good place to start. It should provide you with the ability to do what you want just using standard SQL statements. (Note: I have not used this particular module, but now that I take a look, I think I might soon) ...the majority is always wrong, and always the last to know about it... The Spice must flow... ..by my will, and by will alone.. I set my mind in motion	[reply]