Unify two files

pabla23 has asked for the wisdom of the Perl Monks concerning the following question:

Good morning!!! I have this code:

use strict;
use warnings;

my $mio;     

my $filename = '/Users/Pabli/Desktop/do_human_mapping.gmt';
my $match = 'DOID:2055';
unlink ("myoutputfilename3.txt");
open(my $file, '<', $filename) or die "open: $!";

while (<$file>) {
    my ($name, $id, @genes) = split /\t/;
    if ($id eq $match) {
        $mio = join("\n", @genes);
        print $mio . "\n";
        open my $out_file, '>>', 'myoutputfilename3.txt' or die "$!";
        print $out_file $mio . "\n";    # print to the file
    }
}

And in another file this:

use strict;
use warnings;

my $mio2;
my $filename = '/Users/Pabli/Desktop/do_human_mapping.gmt';
my $match = 'APOE';
unlink ("myoutputfilename4.txt");
open(my $file, '<', $filename) or die "open: $!";

while (<$file>) {
    my ($name, $id, @genes) = split /\t/;
    if (grep /^$match$/, @genes) {
        $mio2 = $id;
        print $mio2 . "\n";
        open my $out_file, '>>', 'myoutputfilename4.txt' or die "$!";
        print $out_file $mio2 . "\n";    # print to the file
    }
}

I would like to unify these two files. The output of the first file is a list in this format:

APOE

FKBP5

CRH

IL2

In fact, in the second file I wrote "APOE" explicitly.

Can someone help me automate the whole thing?

Thanks a lot, Paola


Replies are listed 'Best First'.
Re: Unify two files by Laurent_R (Canon) on Nov 10, 2014 at 11:27 UTC
This seems to be more or less what you are looking for:

use strict;
use warnings;

my ($mio, $mio2);
my $filename = '/Users/Pabli/Desktop/do_human_mapping.gmt';
my $match = 'DOID:2055';
unlink ("myoutputfilename3.txt");
unlink ("myoutputfilename4.txt");
open(my $file, '<', $filename) or die "open: $!";
open my $out_file3, '>', 'myoutputfilename3.txt' or die "$!";
open my $out_file4, '>', 'myoutputfilename4.txt' or die "$!";

while (<$file>){
    my ($name,$id,@genes) = split /\t/;
    if ($id eq $match) {
        $mio = join("\n",@genes);
        print $mio."\n";
        print $out_file3 $mio."\n"; # print to the file
    }
    if (grep/^$match$/, @genes){
        $mio2 = $id;
        print $mio2."\n";
        print $out_file4 $mio2."\n"; # print to the file
    }
}

I have taken the liberty of moving the opening of the files out of the while loop, because it seems inefficient to open the file each time you get a match, but it really depends on your data (how often it matches).
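One thing to watch when both tests live in the same loop: the ID test and the gene test need different search strings, since an ID such as 'DOID:2055' will never equal a gene symbol such as 'APOE'. A minimal sketch of that idea follows. The helper name scan_gmt, its parameters, and the sample rows are made up for illustration, and the name<TAB>id<TAB>genes layout is assumed from the original scripts.

```perl
use strict;
use warnings;

# Hypothetical helper: one pass over the lines of a tab-separated .gmt file.
# $id_match and $gene_match are kept separate on purpose.
sub scan_gmt {
    my ($lines, $id_match, $gene_match) = @_;
    my (@genes_for_id, @ids_for_gene);
    for my $line (@$lines) {
        (my $row = $line) =~ s/\r?\n\z//;    # strip the line ending so the last gene is clean
        my ($name, $id, @genes) = split /\t/, $row;
        next unless defined $id;             # skip blank or malformed lines
        push @genes_for_id, @genes if $id eq $id_match;
        push @ids_for_gene, $id    if grep { $_ eq $gene_match } @genes;
    }
    return (\@genes_for_id, \@ids_for_gene);
}

# Made-up sample rows (assumed layout: name, id, then genes, tab-separated).
my @sample = (
    "disease A\tDOID:00001\tAPOE\tIL4\tRTG5\n",
    "disease B\tDOID:00002\tFG6\tCRH\tAPOE\n",
    "disease C\tDOID:2055\tAPOE\tFKBP5\tCRH\n",
);

my ($genes, $ids) = scan_gmt(\@sample, 'DOID:2055', 'APOE');
print "genes: @$genes\n";
print "ids:   @$ids\n";
```

To run it against the real file, the lines could be read with open and passed in as an array reference; the two result lists can then be written to the two output files.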
Re^2: Unify two files by pabla23 (Novice) on Nov 10, 2014 at 11:59 UTC
Ok thanks! Now I have this output:

APOE
APOE
FKBP5
CRH

With this list I have to go through the same file again and find, for each element, the different IDs that are associated with it. The file has this format:

DOID:00001 APOE IL4 RTG5
DOID:00002 FG6 CRH APOE
DOID:00003 RTG5 HUTN CRH

My output would be:

APOE DOID:00001 DOID:00002
CRH DOID:00002 DOID:00003

Thanks for your help! Paola
Re^3: Unify two files by poj (Abbot) on Nov 10, 2014 at 13:13 UTC
I'm guessing you are looking for the genes for a certain ID, and then looking for all the IDs that have those genes. If so, try this:

#!perl
use strict;
use warnings;

my $match = 'DOID:2055';
my $filename = 'do_human_mapping.gmt';
open (my $fh, '<', $filename) or die "open: $!";

my @genes=();
my %gene2id=();
while (<$fh>){
  my ($name,$id,@temp) = split /\s+/;
  if ($id eq $match) {
    @genes = @temp;
  } else {
    for my $gene (@temp){
      push @{$gene2id{$gene}},$id
    }
  }
}

for my $gene (@genes){
  if (exists $gene2id{$gene}){
    print join ' ',$gene,@{$gene2id{$gene}},"\n";
  }
}
__DATA__
DOID:00001 APOE IL4 RTG5
DOID:00002 FG6 CRH APOE
DOID:00003 RTG5 HUTN CRH
DOID:2055 APOE FKBP5 CRH

poj
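The reply above splits each line on whitespace, which shifts the fields if the first column contains spaces. As a hedged variant, here is a sketch of the same gene-to-ID lookup that splits on tabs instead, assuming the name<TAB>id<TAB>genes layout used in the original scripts. The helper name related_ids and the sample rows are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: for every gene listed under $match_id, return a row
# [gene, id, id, ...] naming the other IDs that also list that gene.
sub related_ids {
    my ($lines, $match_id) = @_;
    my (@wanted, %gene2id);
    for my $line (@$lines) {
        (my $row = $line) =~ s/\r?\n\z//;    # strip the line ending
        my ($name, $id, @genes) = split /\t/, $row;
        next unless defined $id;             # skip blank or malformed lines
        if ($id eq $match_id) {
            @wanted = @genes;                # genes of the ID we start from
        }
        else {
            push @{ $gene2id{$_} }, $id for @genes;
        }
    }
    my @rows;
    for my $gene (@wanted) {
        next unless exists $gene2id{$gene};  # gene appears under no other ID
        push @rows, [ $gene, @{ $gene2id{$gene} } ];
    }
    return @rows;
}

# Made-up sample rows; the names contain spaces to show why tabs matter.
my @sample = (
    "disease one\tDOID:00001\tAPOE\tIL4\tRTG5\n",
    "disease two\tDOID:00002\tFG6\tCRH\tAPOE\n",
    "disease three\tDOID:00003\tRTG5\tHUTN\tCRH\n",
    "disease four\tDOID:2055\tAPOE\tFKBP5\tCRH\n",
);

print join(' ', @$_), "\n" for related_ids(\@sample, 'DOID:2055');
```

With these sample rows the sketch prints one line for APOE and one for CRH, in the "gene followed by its IDs" shape requested earlier in the thread. FKBP5 is skipped because no other ID lists it.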
Re^4: Unify two files by pabla23 (Novice) on Nov 10, 2014 at 14:01 UTC
Re: Unify two files by blindluke (Hermit) on Nov 10, 2014 at 09:52 UTC
What exactly are you trying to accomplish? You write about "unifying two files", which could mean replacing those two scripts with one, but you also write about your output. Are you trying to write a third script that "unifies" the output of those two scripts into a combined list (without duplicates, for example)?

If possible, try looking at your problem in terms of input and output. What is your input? What exactly do you want to produce? It's very difficult to guess what you mean by "unify".

- Luke