combining two files based on missing values

sugar has asked for the wisdom of the Perl Monks concerning the following question:

Dear monks, I have two files with almost the same number of columns. The first file has 5 lines, whereas the second file has only 3. Every ID in the second file is guaranteed to be present in the first file. If an ID appears in both files, the line from the second file should be printed. Lines in the first file whose ID does not appear in the second file should be printed as they are. For example, the TAT ID is the field common to both files, so the 6th column of both files has to be compared.

file1.txt:
234  13  4  49 +  TAT_01  id_nu1  explan1
236  123  3  67 +  TAT_02  id_nu2  explan2
534  12  8  13 +  TAT_03  id_nu3  explan3
764  124  9  33 +  TAT_04  id_nu4  explan4
224  153  2  37 +  TAT_05  id_nu5  explan5

file2.txt:
334  138  34  39 -  TAT_02  PAS_1 id_nu2  new2
545  154  83  11 +  TAT_03  PAS_2 id_nu3  new3
765  131  21  12 -  TAT_05  PAS_3 id_nu5  new5

desired_results:
234  13  4  49 +  TAT_01  id_nu1  explan1
334  138  34  39 -  TAT_02  PAS_1 id_nu2  new2
545  154  83  11 +  TAT_03  PAS_2 id_nu3  new3
764  124  9  33 +  TAT_04  id_nu4  explan4
765  131  21  12 -  TAT_05  PAS_3 id_nu5  new5

The result file contains the line from the second file for every TAT ID found in both files, and the line from the first file for every TAT ID found only there. The program I have written is not giving me the right answer. Please have a look at it:

#!/usr/bin/perl
open(FH1,"file1.txt");
open(FH2,"file2.txt");
@array=<FH2>;
$jo=join("",@array);
@spl=split("\n",$jo);
while($line1=<FH1>){
        @hold=();
        @coll=split("\t",$line1);
        #$len=$coll[1]-$coll[2];
        #$len=~s/-//g;
        $hit=$coll[5];
        @hold=grep(/$hit/,@array);
        $held=shift(@hold);
   
        if(defined $held){
                ($c1,$c2,$c3,$c4,$c5,$c6,$c7,$c8,$c9)=split("\t",$held);
                print "$c1\t$c2\t$c3\t$c4\t$c5\t$c6\t$c7\t$c8\t$c9\n";
        }
        else{
                print "$coll[0]\t$coll[1]\t$coll[2]\t$coll[3]\t$coll[4]\t$coll[5]\t$coll[6]\t$coll[7]\t$coll[8]\n";
        }
}

Please suggest a better way to solve this problem. Thank you very much!


Replies are listed 'Best First'.
Re: combining two files based on missing values by lostjimmy (Chaplain) on Jun 04, 2009 at 18:27 UTC
Your code is actually almost complete. I've updated your code to use some recommended conventions, such as lexical file handles, three argument open, error checking, and the addition of `use strict` and `use warnings` (they aren't absolutely necessary but can be very helpful, so are recommended).

#!/usr/bin/perl
use strict;
use warnings;

open my $FH1, "<", "file1.txt" or die "could not open file1.txt: $!";
open my $FH2, "<", "file2.txt" or die "could not open file2.txt: $!";
my @array = <$FH2>;

while (my $line1 = <$FH1>) {
    my @hold;
    my @coll = split /\s+/, $line1;
    my $hit = $coll[5];
    @hold = grep /$hit/, @array;
    my $held = shift @hold;

    if (defined $held) {
        my @cols = split /\s+/, $held;
        print join "\t", @cols;
        print "\n";
    }
    else {
        print join "\t", @coll;
        print "\n";
    }
}

Update: Hash-based solution

my %lines;
read_results("file1.txt");
read_results("file2.txt");
print "$lines{$_}\n" for sort {$a <=> $b} keys %lines;

sub read_results {
    my $fn = shift;
    open my $FH, "<", $fn or die "couldn't open $fn: $!";
    while (my $line = <$FH>) {
        chomp $line;
        my ($id) = $line =~ /TAT_(\d+)/;
        next unless $id;
        $lines{$id} = $line;
    }
}
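A note on the two versions above: `grep /$hit/` is a regex substring match, so an ID such as TAT_1 would also match a line containing TAT_10, and the hash version prints in numeric ID order rather than in file1's order (the two orders happen to agree for the sample data). The following is only a minimal sketch of a variant that looks up the exact 6th column and keeps file1's order. The name `merge_by_id` and its array-reference arguments are hypothetical, chosen for illustration, and it assumes whitespace-separated columns as in the sample.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the replies above): takes two array refs
# of chomped lines and returns file1's lines, with every line whose 6th
# column also occurs in file2 replaced by the file2 line.
sub merge_by_id {
    my ($file1_lines, $file2_lines) = @_;

    # Index file2 by its exact ID (6th whitespace-separated column).
    my %replacement;
    for my $line (@$file2_lines) {
        my $id = (split ' ', $line)[5];
        next unless defined $id;          # skip blank or short lines
        $replacement{$id} = $line;
    }

    my @merged;
    for my $line (@$file1_lines) {
        my $id = (split ' ', $line)[5];
        next unless defined $id;
        # Exact hash lookup, so TAT_1 can never match TAT_10.
        push @merged, exists $replacement{$id} ? $replacement{$id} : $line;
    }
    return @merged;
}

# Sample data copied from the question.
my @file1 = (
    "234  13  4  49 +  TAT_01  id_nu1  explan1",
    "236  123  3  67 +  TAT_02  id_nu2  explan2",
    "534  12  8  13 +  TAT_03  id_nu3  explan3",
    "764  124  9  33 +  TAT_04  id_nu4  explan4",
    "224  153  2  37 +  TAT_05  id_nu5  explan5",
);
my @file2 = (
    "334  138  34  39 -  TAT_02  PAS_1 id_nu2  new2",
    "545  154  83  11 +  TAT_03  PAS_2 id_nu3  new3",
    "765  131  21  12 -  TAT_05  PAS_3 id_nu5  new5",
);

print "$_\n" for merge_by_id(\@file1, \@file2);
```

Run on the sample data, this prints the five lines listed under desired_results. With real files, the two arrays would be filled by reading and chomping each file first.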
Re: combining two files based on missing values by bichonfrise74 (Vicar) on Jun 04, 2009 at 18:31 UTC
How about this?

#!/usr/bin/perl
use strict;

my $file1 = <<END_FILE2;
334 138 34 39 - TAT_02 PAS_1 id_nu2 new2
545 154 83 11 + TAT_03 PAS_2 id_nu3 new3
765 131 21 12 - TAT_05 PAS_3 id_nu5 new5
END_FILE2

my %hash;
open my $myfile, '<', \$file1 or die "cannot open file";
while (<$myfile>) {
    my (@lines) = split;
    $hash{$lines[5]} = [@lines];
}

while (<DATA>) {
    my (@lines) = split;
    if ( defined( $hash{$lines[5]} ) && $hash{$lines[5]}->[5] eq $lines[5] ) {
        print join " ", @{ $hash{$lines[5]} } , "\n";
    }
    else {
        print;
    }
}

__DATA__
234 13 4 49 + TAT_01 id_nu1 explan1
236 123 3 67 + TAT_02 id_nu2 explan2
534 12 8 13 + TAT_03 id_nu3 explan3
764 124 9 33 + TAT_04 id_nu4 explan4
224 153 2 37 + TAT_05 id_nu5 explan5
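The script above reads its test data without touching the disk: the second file's lines sit in a here-document that is opened through a reference to the scalar (`\$file1`), and the first file's lines come from the `__DATA__` section. As a rough sketch of the in-memory filehandle part on its own, assuming Perl 5.8 or later, something like the following could be used. The helper name `ids_from_string` is hypothetical and only serves the illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: read whitespace-separated records from a string
# through an in-memory filehandle and return the 6th column of each line.
sub ids_from_string {
    my ($text) = @_;

    # Opening a reference to a scalar yields a filehandle over its contents.
    open my $fh, '<', \$text or die "cannot open in-memory file: $!";

    my @ids;
    while (my $line = <$fh>) {
        my @fields = split ' ', $line;
        push @ids, $fields[5] if defined $fields[5];
    }
    close $fh;
    return @ids;
}

my $sample = <<'END_SAMPLE';
334 138 34 39 - TAT_02 PAS_1 id_nu2 new2
545 154 83 11 + TAT_03 PAS_2 id_nu3 new3
END_SAMPLE

print join(",", ids_from_string($sample)), "\n";   # prints TAT_02,TAT_03
```

Swapping `\$text` for a file name is all it takes to run the same loop against a real file, which is what makes this handy for self-contained test scripts.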