perlselami has asked for the wisdom of the Perl Monks concerning the following question:

I have two input files (rmsd1.xvg and rmsd2.xvg). I want to select columns from each of them and write the selected columns to a single output file. My input files and code are the following:
rmsd1.xvg:
    # GROup of MAchos and Cynical Suckers
    #
    @ title "RMSD"
    @ xaxis label "Time (ps)"
    @ yaxis label "RMSD (nm)"
    1.0000000 0.0000009
    20.0000000 0.1478001
    40.0000000 0.1648600
    60.0000000 0.1645018
    80.0000000 0.1786710
    100.0000000 0.2115960
rmsd2.xvg:
    # GROup of MAchos and Cynical Suckers
    #
    @ title "RMSD"
    @ xaxis label "Time (ps)"
    @ yaxis label "RMSD (nm)"
    1.0000000 0.1000009
    20.0000000 0.2478001
    40.0000000 0.3648600
    60.0000000 0.4645018
    80.0000000 0.5786710
    100.0000000 0.6115960
My code:
    #!/usr/bin/perl
    use strict;
    use warnings;

    # Open file1 to read
    open my $input_file1, '<', "rmsd1.xvg" or die qq{Failed to open "rmsd1.xvg" for reading: $!};
    # Open file2 to read
    open my $input_file2, '<', "rmsd2.xvg" or die qq{Failed to open "rmsd2.xvg" for reading: $!};
    # Open new file to write
    open my $out_file, '>', "out_file.xvg" or die qq{Failed to open "out_file.xvg" for writing: $!};

    while(<$input_file1>) {
        next if /(^\s*$)|(^#)|(^@)/;
        my @columns1 = split;
        print $out_file join("\t", $columns1[0],$columns1[1], "\n");
    }

    while(<$input_file2>) {
        next if /(^\s*$)|(^#)|(^@)/;
        my @columns2 = split;
        print $out_file join("\t", $columns2[1]), "\n";
    }

    close($input_file1);
    close($input_file2);
    close($out_file);
My code gives me the following output.
Output file:
    1.0000000 0.0000009
    20.0000000 0.1478001
    40.0000000 0.1648600
    60.0000000 0.1645018
    80.0000000 0.1786710
    100.0000000 0.2115960
    0.1000009
    0.2478001
    0.3648600
    0.4645018
    0.5786710
    0.6115960
Whereas I want to get the following output, with all columns side by side. How can I get this output?
Requested:
    1.0000000 0.0000009 0.1000009
    20.0000000 0.1478001 0.2478001
    40.0000000 0.1648600 0.3648600
    60.0000000 0.1645018 0.4645018
    80.0000000 0.1786710 0.5786710
    100.0000000 0.2115960 0.6115960

Replies are listed 'Best First'.
Re: Adding some columns into a file from two files using perl
by Anonymous Monk on Jan 19, 2015 at 13:32 UTC

    You could first save the data from the second file in a hash with the first column as a key, and then retrieve that data from the hash as you're going over the lines from the first file. The following code assumes that the first columns in the two files are identical (string-wise). Also this will read the entire second file into memory, so it might not be too great if your second file becomes huge.

    # ... snip opening the files ...

    my %file2data;
    while(<$input_file2>) {
        next if /(^\s*$)|(^#)|(^@)/;
        my @columns2 = split;
        $file2data{$columns2[0]} = $columns2[1];
    }

    while(<$input_file1>) {
        next if /(^\s*$)|(^#)|(^@)/;
        my @columns1 = split;
        print $out_file join("\t", $columns1[0],$columns1[1], $file2data{$columns1[0]}), "\n";
    }

    This will produce your expected output.

    Note that if you're going to be manipulating a lot of files like this, then investing the time into learning Text::CSV will probably be worth it. Also, if your dataset is large, this is the kind of operation that a database would handle well.
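If memory is a concern, one possible alternative to the hash is to read both files in step, one data row at a time. The sketch below is only an illustration: it assumes that row N of the first file always pairs with row N of the second (same number of data rows, same order), and the helper names `next_data_row` and `paste_columns` are made up for this example. The demo uses in-memory filehandles with a shortened copy of the sample data so that it runs without any files on disk.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the columns of the next data line from a filehandle, skipping
# blank lines and the '#' / '@' header lines. Returns an empty list at EOF.
sub next_data_row {
    my ($fh) = @_;
    while ( my $line = <$fh> ) {
        next if $line =~ /^\s*$/ or $line =~ /^[#\@]/;
        return split ' ', $line;
    }
    return;
}

# Write "time, value1, value2" rows by reading both handles in step.
# Assumption: row N of the first input pairs with row N of the second.
sub paste_columns {
    my ( $in1, $in2, $out ) = @_;
    while ( my @row1 = next_data_row($in1) ) {
        my @row2 = next_data_row($in2)
            or die "Second input has fewer data rows than the first\n";
        print {$out} join( "\t", @row1[ 0, 1 ], $row2[1] ), "\n";
    }
    return;
}

# Demo data: a shortened copy of the sample files from the question.
my $data1 = <<'END';
# comment line
@ title "RMSD"
1.0000000 0.0000009
20.0000000 0.1478001
END

my $data2 = <<'END';
# comment line
@ title "RMSD"
1.0000000 0.1000009
20.0000000 0.2478001
END

# In-memory filehandles stand in for rmsd1.xvg, rmsd2.xvg and out_file.xvg.
open my $in1, '<', \$data1 or die "Cannot open in-memory input 1: $!";
open my $in2, '<', \$data2 or die "Cannot open in-memory input 2: $!";
my $merged = '';
open my $out, '>', \$merged or die "Cannot open in-memory output: $!";

paste_columns( $in1, $in2, $out );
close $out;

print $merged;
```

With real files, the three in-memory handles would be replaced by the handles opened in the original program. Note that this version does not check that the first columns agree; it pairs rows purely by position.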

      @Anonymous Monk, thanks for your reply. But I get the following error. How can I fix it?
      Use of uninitialized value in join or string at ./rmsd2.pl line 30, <$input_file1> line 14.

        Try adding use diagnostics; at the top of your program to get explanations of the messages. In this case it's just a warning, not an error, and since you seem to be running your code with different input files than the ones in the OP (they don't have 14 lines), and I don't know what line 30 of your program does, I can only take a wild guess: Maybe $file2data{$columns1[0]} is empty, because the first columns of the files don't match up exactly. As stated, the example code doesn't really handle that case, but you could fairly easily modify it so it does, depending on what your input actually looks like. Without seeing a more representative sample of your input and your current program it's hard to say what the best solution is. See also the Basic debugging checklist.
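If the guess above is right and some first-column values in one file have no exact string match in the other, one way to handle it is to substitute a placeholder and report the unmatched keys. This is a rough sketch rather than a drop-in fix: the function name `merge_with_placeholder`, the `'NA'` placeholder, and the sample rows are all assumptions made for illustration, and the right behavior depends on what the real input looks like.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Merge the second column of @$lines2 into @$lines1, matching on the first
# column as a string. Returns (\@merged_lines, \@missing_keys).
# The 'NA' default placeholder is an arbitrary choice for this sketch.
sub merge_with_placeholder {
    my ( $lines1, $lines2, $placeholder ) = @_;
    $placeholder = 'NA' unless defined $placeholder;

    # Index the second data set by its first column.
    my %file2data;
    for my $line (@$lines2) {
        next if $line =~ /^\s*$/ or $line =~ /^[#\@]/;
        my ( $key, $value ) = split ' ', $line;
        $file2data{$key} = $value;
    }

    my ( @merged, @missing );
    for my $line (@$lines1) {
        next if $line =~ /^\s*$/ or $line =~ /^[#\@]/;
        my ( $key, $value ) = split ' ', $line;
        my $other = $file2data{$key};
        if ( !defined $other ) {
            # No exact string match: record the key, use the placeholder.
            push @missing, $key;
            $other = $placeholder;
        }
        push @merged, join( "\t", $key, $value, $other );
    }
    return ( \@merged, \@missing );
}

# Hypothetical sample: the key 20.0000000 is absent from the second set.
my @file1 = ( '# comment', '1.0000000 0.0000009', '20.0000000 0.1478001' );
my @file2 = ( '@ title "RMSD"', '1.0000000 0.1000009', '40.0000000 0.3648600' );

my ( $merged, $missing ) = merge_with_placeholder( \@file1, \@file2 );
print "$_\n" for @$merged;
warn "No match in second file for key(s): @$missing\n" if @$missing;
```

Because the keys are compared as strings, `1.0000000` and `1.000` would still count as different keys here; if the two files format the time column differently, the keys would need to be normalized first (for example with `sprintf`).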