in reply to Unify two files

This seems to be more or less what you are looking for:
use strict;
use warnings;

my ($mio, $mio2);
my $filename = '/Users/Pabli/Desktop/do_human_mapping.gmt';
my $match = 'DOID:2055';

# Remove output left over from a previous run
unlink ("myoutputfilename3.txt");
unlink ("myoutputfilename4.txt");

# Open the input and both output files once, before the loop
open(my $file, '<', $filename) or die "open: $!";
open my $out_file3, '>', 'myoutputfilename3.txt' or die "$!";
open my $out_file4, '>', 'myoutputfilename4.txt' or die "$!";

while (<$file>){
    chomp;    # drop the trailing newline so the last gene is clean
    my ($name,$id,@genes) = split /\t/;
    if ($id eq $match) {
        # this line is the entry for $match: list its genes
        $mio= join("\n",@genes);
        print $mio."\n";
        print $out_file3 $mio."\n";    # print to the file
    }
    if (grep/^$match$/, @genes){
        # $match appears among this line's genes: record the line's id
        $mio2=$id;
        print $mio2."\n";
        print $out_file4 $mio2."\n";    # print to the file
    }
}
I have taken the liberty of moving the opening of the files out of the while loop, because it seems inefficient to open a file each time you get a match, but it really depends on your data (how often it matches).

Re^2: Unify two files
by pabla23 (Novice) on Nov 10, 2014 at 11:59 UTC
    OK, thanks! Now I have this output:

    APOE

    APOE

    FKBP5

    CRH

    With this list I have to go back into the same file and find, for each element, the different IDs that are associated with it. The file has this format:

    DOID:00001 APOE IL4 RTG5

    DOID:00002 FG6 CRH APOE

    DOID:00003 RTG5 HUTN CRH

    My output should be:

    APOE DOID:00001 DOID:00002

    CRH DOID:00002 DOID:00003

    Thanks for your help!!!! Paola
      I'm guessing you are looking for the genes for a certain ID, and then looking for all the other IDs that have those genes. If so, try this:
      #!perl
      use strict;
      use warnings;

      my $match = 'DOID:2055';
      my $filename = 'do_human_mapping.gmt';
      open (my $fh, '<', $filename) or die "open: $!";

      my @genes=();      # genes listed for $match
      my %gene2id=();    # gene => list of the other ids that contain it
      while (<$fh>){
        my ($name,$id,@temp) = split /\s+/;
        if ($id eq $match) {
          @genes = @temp;
        } else {
          for my $gene (@temp){
            push @{$gene2id{$gene}},$id
          }
        }
      }

      # For each gene of $match, print the other ids that share it
      for my $gene (@genes){
        if (exists $gene2id{$gene}){
          print join ' ',$gene,@{$gene2id{$gene}},"\n";
        }
      }
      # The sample lines below are for reference; the script reads $filename
      __DATA__
      DOID:00001 APOE IL4 RTG5
      DOID:00002 FG6 CRH APOE
      DOID:00003 RTG5 HUTN CRH
      DOID:2055 APOE FKBP5 CRH
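      To try the idea without the real file, here is a minimal self-contained sketch of the same gene-to-ID index. It assumes lines of the form "ID GENE GENE ..." with no separate name column, as in the sample in the question; the helper name index_genes and the list of genes to look up are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a gene => [ids] index from lines of the form "ID GENE GENE ...".
# Assumption: no separate name column, as in the sample from the question.
sub index_genes {
    my @lines = @_;
    my %gene2id;
    for my $line (@lines) {
        my ($id, @genes) = split ' ', $line;
        next unless defined $id;    # skip blank lines
        push @{ $gene2id{$_} }, $id for @genes;
    }
    return \%gene2id;
}

my @sample = (
    'DOID:00001 APOE IL4 RTG5',
    'DOID:00002 FG6 CRH APOE',
    'DOID:00003 RTG5 HUTN CRH',
);

my $index = index_genes(@sample);

# Hypothetical list of genes to look up (two of the genes from the earlier output).
for my $gene (qw(APOE CRH)) {
    next unless exists $index->{$gene};
    print join(' ', $gene, @{ $index->{$gene} }), "\n";
}
```

      On the three sample lines this prints "APOE DOID:00001 DOID:00002" and "CRH DOID:00002 DOID:00003". Unlike the script above, the sketch does not skip the line for $match itself, so adjust it if the matching ID should be left out.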
      poj
        It's perfect!!!! Thanks a lot!!! Paola