Renyulb28 has asked for the wisdom of the Perl Monks concerning the following question:

I would greatly appreciate any help. The file that I am trying to alter is in the format 23 SNP_A-4293670 0 2713391 with separation by tabs. The second file is in the format SNP_A-1780270 ss75925050 rs987435 with separation by spaces. I want the first file to look like this: 23 rs###### 0 2713391 which means that I want the data in second column of the first file to be replaced with the data in the third column of the second file if it matches the first column of the second file. Thanks
  • Comment on Replace data in the column of one file with corresponding ones in another file

Replies are listed 'Best First'.
Re: Replace data in the column of one file with corresponding ones in another file
by wind (Priest) on Jan 27, 2011 at 21:33 UTC
    Something like this would work:
    # I would greatly appreciate any help. The file that I am trying to # alter is in the format # 23 SNP_A-4293670 0 2713391 # with separation by tabs. The second file is in the format # SNP_A-1780270 ss75925050 rs987435 # with separation by spaces. I want the first file to look like this: # 23 rs###### 0 2713391 # which means that I want the data in second column of the first # file to be replaced with the data in the third column of the second # file if it matches the first column of the second file. Thanks use File::Slurp qw(read_file); use Tie::File; use strict; my $file1 = 'file1.txt'; my $file2 = 'file2.txt'; my %hash = map { chomp; my @data = split /\s+/; $data[0] => $data[2]; } read_file($file2); tie my @array, 'Tie::File', $file1 or die "Can't open $file1: $!"; foreach (@array) { s{^(\S*\s+)(\S+)}{ if (exists $hash{$2}) { print "Replacing $2 with $hash{$2}\n"; $1 . $hash{$2}; } else { print "Can't find $2, not changing\n"; $1 . $2; } }e; }
    - Miller
      thank you, but it is giving me this error when I run the script Can't locate File/Slurp.pm in @INC (@INC contains: /etc/perl /usr/local/lib/perl/5.10.1 /usr/local/share/perl/5.10.1 /usr/lib/perl5 /usr/share/perl5 /usr/lib/perl/5.10 /usr/share/perl/5.10 /usr/local/lib/site_perl .) at 2FilesFindReplace.pl line 1.
        The version not relying on File::Slurp would be the following:
        use Tie::File; use strict; my $file1 = 'file1.txt'; my $file2 = 'file2.txt'; open my $fh, $file2 or die "Can't open $file2: $!"; my %hash = map { chomp; my @data = split /\s+/; $data[0] => $data[2]; } <$fh>; tie my @array, 'Tie::File', $file1 or die "Can't open $file1: $!"; foreach (@array) { s{^(\S*\s+)(\S+)}{ if (exists $hash{$2}) { print "Replacing $2 with $hash{$2}\n"; $1 . $hash{$2}; } else { print "Can't find $2, not changing\n"; $1 . $2; } }e; }
        - Miller
        So File::Slurp is not installed.

        Download and install it.

Re: Replace data in the column of one file with corresponding ones in another file
by Anonyrnous Monk (Hermit) on Jan 27, 2011 at 21:08 UTC

    Read the data from the second file into a hash ($hash{$firstcolumn} = $thirdcolumn).  Then go through the first file line by line, split the columns and replace the second column with $hash{$secondcolumn} before you join and print the line out again.

Re: Replace data in the column of one file with corresponding ones in another file
by remiah (Hermit) on Jan 27, 2011 at 21:08 UTC
    if first line of file2 is a primary key, below will work. But if not, it will fail like this ...
    use strict; use warnings; my $file1=<<EOF; 23 SNP_A-4293670 0 2713391 24 SNP_A-4293670 0 2713391 25 SNP_A-1780270 0 1111111 26 SNP_A-1780271 0 2222222 EOF my $file2=<<EOF; SNP_A-1780270 ss75925050 rs987435 SNP_A-1780271 ss75925050 rs000001 SNP_A-4293670 ss75925050 rs999999 SNP_A-4293670 ss75925050 xxxxxxxx EOF my %lookup; foreach my $line ( split( /\n/, $file2) ){ my @line=split(/\s+/,$line); $lookup{$line[0]}=$line[2]; } print "k=$_,v=$lookup{$_}\n" for keys %lookup; foreach my $line ( split( /\n/, $file1) ){ my @line=split(/\s+/,$line); printf "%s\t%s\t%s\t%s\n", $line[0], $lookup{$line[1]}, $line[2], +$line[3]; }

    result

    23 xxxxxxxx 0 2713391 24 xxxxxxxx 0 2713391 25 rs987435 0 1111111 26 rs000001 0 2222222
    first column of the seconde file is the problem. Is it unique key?
      thanks for the help; what did you mean by unique key? It is an ID for each row and corresponds with the first file. Could you also explain how your script works?

        I mean unique constraint of database, which the field should never have duplicate data. Text file doesn't have such a constraint, sometimes we see duplicate ID in text file. So I think the first thing is to check whether file2 has duplicate ID or not.

        SNP_A-4293670 ss75925050 rs999999 SNP_A-4293670 ss75925050 xxxxxxxx
        This is duplicate ID sample. If file2 has duplicate ID like 'SNP_A-4293670', the result would be overwritten by ' xxxxxxxx'.

Re: Replace data in the column of one file with corresponding ones in another file
by wjw (Priest) on Jan 27, 2011 at 21:14 UTC
    Personally, I would start here -> DBD::AnyData

    That looks to me like a good place to start. It should provide you with the ability to do what you want just using standard SQL statements.

    (Note: I have not used this particular module, but now that I take a look, I think I might soon)

    • ...the majority is always wrong, and always the last to know about it...
    • The Spice must flow...
    • ..by my will, and by will alone.. I set my mind in motion