sugar has asked for the wisdom of the Perl Monks concerning the following question:

Dear monks, I have two files with almost the same number of columns (8 in the first, 9 in the second). The first file has 5 lines, whereas the second file has only 3. Every ID in the second file is guaranteed to be present in the first file as well. So, if an ID appears in both files, the line from the second file should be printed; the lines from the first file whose IDs are unique to it should also be printed. For example, TAT is the ID common to both files, so the 6th column of the two files has to be compared.
file1.txt:
```
234 13 4 49 + TAT_01 id_nu1 explan1
236 123 3 67 + TAT_02 id_nu2 explan2
534 12 8 13 + TAT_03 id_nu3 explan3
764 124 9 33 + TAT_04 id_nu4 explan4
224 153 2 37 + TAT_05 id_nu5 explan5
```

file2.txt:
```
334 138 34 39 - TAT_02 PAS_1 id_nu2 new2
545 154 83 11 + TAT_03 PAS_2 id_nu3 new3
765 131 21 12 - TAT_05 PAS_3 id_nu5 new5
```

desired_results:
```
234 13 4 49 + TAT_01 id_nu1 explan1
334 138 34 39 - TAT_02 PAS_1 id_nu2 new2
545 154 83 11 + TAT_03 PAS_2 id_nu3 new3
764 124 9 33 + TAT_04 id_nu4 explan4
765 131 21 12 - TAT_05 PAS_3 id_nu5 new5
```
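In the sample data, the TAT ID sits in the 6th column of both files. Since Perl arrays are zero-based, that is index 5 after splitting a line into fields. A minimal sketch of extracting that key, assuming the columns are separated by spaces or tabs (the sample does not make this clear); the helper name `key_of` is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: return the comparison key (6th column) of one line.
# split ' ' splits on runs of any whitespace and ignores leading whitespace,
# so tab- and space-separated lines give the same fields.
sub key_of {
    my ($line) = @_;
    my @fields = split ' ', $line;
    return $fields[5];    # 6th column = index 5
}

print key_of("234 13 4 49 + TAT_01 id_nu1 explan1"), "\n";    # TAT_01
```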
The result file should contain the second file's line for each TAT ID that appears in both files, and the first file's line for each TAT ID that appears only in the first file. The program I have written does not give me the right answer. Please have a look at it:
```perl
#!/usr/bin/perl
open(FH1,"file1.txt");
open(FH2,"file2.txt");
@array=<FH2>;
$jo=join("",@array);
@spl=split("\n",$jo);
while($line1=<FH1>){
    @hold=();
    @coll=split("\t",$line1);
    #$len=$coll[1]-$coll[2];
    #$len=~s/-//g;
    $hit=$coll[5];
    @hold=grep(/$hit/,@array);
    $held=shift(@hold);
    if(defined $held){
        ($c1,$c2,$c3,$c4,$c5,$c6,$c7,$c8,$c9)=split("\t",$held);
        print "$c1\t$c2\t$c3\t$c4\t$c5\t$c6\t$c7\t$c8\t$c9\n";
    }
    else{
        print "$coll[0]\t$coll[1]\t$coll[2]\t$coll[3]\t$coll[4]\t$coll[5]\t$coll[6]\t$coll[7]\t$coll[8]\n";
    }
}
```
Please suggest a better way to solve this problem. Thank you very much!
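Assuming the files really are tab-separated, as the `split("\t", ...)` in the question's code implies, one likely contributor to the wrong output is that the lines are never chomped. The trailing newline stays attached to the last field: for the 8-column file1 lines `$coll[7]` ends in a newline and `$coll[8]` is undefined, and for file2 lines `$c9` already ends in a newline before `print` adds another, so extra lines appear in the output. A small sketch showing the effect and the usual fix (`chomp` before splitting); the sub names are hypothetical:

```perl
use strict;
use warnings;

# Split a tab-separated line without removing its newline first
# (what the original loop does with $line1).
sub split_raw {
    my ($line) = @_;
    return split /\t/, $line;
}

# Same split, but chomp a copy of the line first (the usual fix).
sub split_chomped {
    my ($line) = @_;
    chomp(my $copy = $line);
    return split /\t/, $copy;
}

my $line  = "234\t13\t4\t49\t+\tTAT_01\tid_nu1\texplan1\n";
my @raw   = split_raw($line);
my @clean = split_chomped($line);

print "raw last field ends in a newline\n" if $raw[-1] =~ /\n\z/;
print "clean last field: $clean[-1]\n";    # explan1
print scalar(@clean), " fields\n";         # 8, so $coll[8] would be undef
```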

Re: combining two files based on missing values
by lostjimmy (Chaplain) on Jun 04, 2009 at 18:27 UTC

    Your code is actually almost complete. I've updated it to use some recommended conventions: lexical filehandles, three-argument open, error checking, and the addition of use strict and use warnings (they aren't absolutely necessary but can be very helpful, so they are recommended). It also splits each line on whitespace (/\s+/) rather than on a literal tab, and prints the fields with join.

```perl
#!/usr/bin/perl
use strict;
use warnings;

open my $FH1, "<", "file1.txt" or die "could not open file1.txt: $!";
open my $FH2, "<", "file2.txt" or die "could not open file2.txt: $!";
my @array = <$FH2>;

while (my $line1 = <$FH1>) {
    my @hold;
    my @coll = split /\s+/, $line1;
    my $hit = $coll[5];
    @hold = grep /$hit/, @array;
    my $held = shift @hold;
    if (defined $held) {
        my @cols = split /\s+/, $held;
        print join "\t", @cols;
        print "\n";
    }
    else {
        print join "\t", @coll;
        print "\n";
    }
}
```
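One refinement worth considering: `grep /$hit/, @array` is a regular-expression substring match, so an ID such as `TAT_01` would also match a line containing `TAT_010`, and any regex metacharacters in an ID would be interpreted. Looking the ID up in a hash keyed on the exact 6th column avoids both, and avoids rescanning `@array` for every line. A minimal sketch, assuming whitespace-separated columns; `merge_by_id` is a hypothetical name, and the lines are passed in as arrays so the example runs without any files:

```perl
use strict;
use warnings;

# Hypothetical helper: for each line in @$base (file1), return the line from
# @$override (file2) that has the same 6th-column ID, or the base line itself.
# Fields are re-joined with tabs.
sub merge_by_id {
    my ($base, $override) = @_;

    # Index the override lines by their exact ID string (no regex matching).
    my %by_id;
    for my $line (@$override) {
        my @f = split ' ', $line;
        next unless defined $f[5];    # skip blank or short lines
        $by_id{ $f[5] } = join "\t", @f;
    }

    my @out;
    for my $line (@$base) {
        my @f = split ' ', $line;
        next unless defined $f[5];
        my $key = $f[5];
        push @out, exists($by_id{$key}) ? $by_id{$key} : join("\t", @f);
    }
    return @out;
}

my @file1 = (
    "234 13 4 49 + TAT_01 id_nu1 explan1",
    "236 123 3 67 + TAT_02 id_nu2 explan2",
    "534 12 8 13 + TAT_03 id_nu3 explan3",
    "764 124 9 33 + TAT_04 id_nu4 explan4",
    "224 153 2 37 + TAT_05 id_nu5 explan5",
);
my @file2 = (
    "334 138 34 39 - TAT_02 PAS_1 id_nu2 new2",
    "545 154 83 11 + TAT_03 PAS_2 id_nu3 new3",
    "765 131 21 12 - TAT_05 PAS_3 id_nu5 new5",
);
print "$_\n" for merge_by_id(\@file1, \@file2);
```

With real files, the two arrays would simply be filled from `<$FH1>` and `<$FH2>`.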

    Update: Hash-based solution

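```perl
my %lines;
read_results("file1.txt");
read_results("file2.txt");
print "$lines{$_}\n" for sort { $a <=> $b } keys %lines;

# Store each line under the numeric part of its TAT_ ID; a later file
# overwrites an earlier one for the same ID.
sub read_results {
    my $fn = shift;
    open my $FH, "<", $fn or die "couldn't open $fn: $!";
    while (my $line = <$FH>) {
        chomp $line;
        my ($id) = $line =~ /TAT_(\d+)/;
        next unless defined $id;    # an ID of "0" is false but still valid
        $lines{$id} = $line;
    }
}
```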
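The hash version sorts on the numeric part of the ID, which reproduces the desired output here because the IDs are zero-padded TAT_nn values. If the IDs were not numeric, or the original file1 order mattered, one option is to remember the order in which each ID is first seen. A sketch of that variant, assuming whitespace-separated columns with the ID in the 6th column; `merge_in_order` is a hypothetical name, and it takes arrays of lines so it can be tried without files:

```perl
use strict;
use warnings;

# Hypothetical helper: merge any number of line lists keyed on the 6th column.
# A later list overrides an earlier one for the same ID ("last file wins"),
# and the result keeps the order in which each ID was first seen.
sub merge_in_order {
    my @sources = @_;
    my (%line_for, @order);
    for my $lines (@sources) {
        for my $line (@$lines) {
            my @f = split ' ', $line;
            next unless defined $f[5];    # skip blank or short lines
            push @order, $f[5] unless exists $line_for{ $f[5] };
            $line_for{ $f[5] } = $line;
        }
    }
    return @line_for{@order};             # hash slice in first-seen order
}

my @file1 = ("234 13 4 49 + TAT_01 id_nu1 explan1",
             "236 123 3 67 + TAT_02 id_nu2 explan2");
my @file2 = ("334 138 34 39 - TAT_02 PAS_1 id_nu2 new2");
my @merged = merge_in_order(\@file1, \@file2);
print "$_\n" for @merged;
```

The trade-off is an extra array of IDs in exchange for not depending on how the IDs sort.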

Re: combining two files based on missing values
by bichonfrise74 (Vicar) on Jun 04, 2009 at 18:31 UTC
    How about this?
```perl
#!/usr/bin/perl
use strict;
use warnings;

# Contents of file2.txt, read through an in-memory filehandle.
my $file1 = <<END_FILE2;
334 138 34 39 - TAT_02 PAS_1 id_nu2 new2
545 154 83 11 + TAT_03 PAS_2 id_nu3 new3
765 131 21 12 - TAT_05 PAS_3 id_nu5 new5
END_FILE2

my %hash;
open my $myfile, '<', \$file1 or die "cannot open file";
while (<$myfile>) {
    my (@lines) = split;
    $hash{$lines[5]} = [@lines];
}

# Contents of file1.txt come from the __DATA__ section below.
while (<DATA>) {
    my (@lines) = split;
    if ( defined( $hash{$lines[5]} ) && $hash{$lines[5]}->[5] eq $lines[5] ) {
        # Parenthesize join so the newline is not joined with a leading space.
        print join(" ", @{ $hash{$lines[5]} }), "\n";
    }
    else {
        print;
    }
}

__DATA__
234 13 4 49 + TAT_01 id_nu1 explan1
236 123 3 67 + TAT_02 id_nu2 explan2
534 12 8 13 + TAT_03 id_nu3 explan3
764 124 9 33 + TAT_04 id_nu4 explan4
224 153 2 37 + TAT_05 id_nu5 explan5
```
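This reply reads its sample data through `open my $myfile, '<', \$file1`: passing a reference to a scalar as the third argument of `open` gives an in-memory filehandle, which is a convenient way to try a solution without creating files. A stripped-down sketch of just that technique, with hypothetical variable names and a quoted heredoc terminator so nothing is interpolated:

```perl
use strict;
use warnings;

# Sample data held in a heredoc (quoted terminator: no interpolation).
my $sample = <<'END_SAMPLE';
334 138 34 39 - TAT_02 PAS_1 id_nu2 new2
545 154 83 11 + TAT_03 PAS_2 id_nu3 new3
765 131 21 12 - TAT_05 PAS_3 id_nu5 new5
END_SAMPLE

# Opening a reference to the scalar gives a filehandle that reads from it.
open my $fh, '<', \$sample or die "cannot open in-memory data: $!";
my @ids;
while (my $line = <$fh>) {
    push @ids, (split ' ', $line)[5];    # 6th column
}
close $fh;
print "@ids\n";    # TAT_02 TAT_03 TAT_05
```

Swapping `\$sample` for a real filename is the only change needed to run the same loop against file2.txt.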