Most problems of this sort are easiest to code using a hash to store the important content of one of the files. In this case I just picked the first file, but a real-world version would probably do better to read the second file first.
Note that the code processing the first file builds, for each peptide ID, a list of protein IDs for later use.
#!/usr/bin/perl
use warnings;
use strict;

# Sample data; the columns are tab-separated.
my $tmp01 = <<FILE1;
PeptideID	ProteinID
6	109521
7	741
11	681
11	780
20	2352
27	1490
27	1491
27	1492
28	51996
29	1490
29	1491
29	1492
30	1490
30	1491
30	1492
FILE1

my $tmp02 = <<FILE2;
PeptideID	SpectrumID	Sequence
6	53663	KMGEGR
7	53663	KPPSGK
11	144492	NNDALR
20	15547	SPAKPK
27	55547	LHKPPK
28	55547	LFVGRK
29	55504	LHKPPK
30	55602	LHKPPK
FILE2

my $tmp11_QUICK = '';
my %peptides;

# First file: collect the list of protein IDs for each peptide ID.
open my $TAB01, '<', \$tmp01;
while (<$TAB01>) {
    chomp;
    my ($peptide, $protein) = split /\t+/;
    next if $peptide !~ /\d/;    # skip the header line
    #$peptides{$peptide} //= [];
    push @{$peptides{$peptide}}, $protein;
}

# Second file: print one output line per protein stored for the peptide.
open my $TAB02, '<', \$tmp02;
open my $OUT, '>', \$tmp11_QUICK;
while (<$TAB02>) {
    chomp;
    my ($peptide, $spectrum, $sequence) = split /\t+/;
    next if $peptide !~ /\d/;    # skip the header line
    print $OUT "$peptide\t$_\t$spectrum\t$sequence\n"
        for @{$peptides{$peptide}};
}

close $OUT;
print $tmp11_QUICK;
Prints:
6	109521	53663	KMGEGR
7	741	53663	KPPSGK
11	681	144492	NNDALR
11	780	144492	NNDALR
20	2352	15547	SPAKPK
27	1490	55547	LHKPPK
27	1491	55547	LHKPPK
27	1492	55547	LHKPPK
28	51996	55547	LFVGRK
29	1490	55504	LHKPPK
29	1491	55504	LHKPPK
29	1492	55504	LHKPPK
30	1490	55602	LHKPPK
30	1491	55602	LHKPPK
30	1492	55602	LHKPPK
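For comparison, here is a rough sketch of the other ordering mentioned at the top: read the second file into the hash first, then stream the first file. It assumes each peptide ID appears only once in the second file, which holds for the sample data but may not hold for real data. The names `%spectra`, `$proteins_txt`, `$spectra_txt` and `$joined` are made up for this illustration, and only a few sample rows are used to keep it short.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# A few rows of the sample data; "\t" escapes stand in for literal tabs.
my $proteins_txt = <<"FILE1";
PeptideID\tProteinID
11\t681
11\t780
20\t2352
FILE1

my $spectra_txt = <<"FILE2";
PeptideID\tSpectrumID\tSequence
11\t144492\tNNDALR
20\t15547\tSPAKPK
FILE2

# Pass 1: index the second file by peptide ID.
# Assumption: one row per peptide ID (a later duplicate would overwrite).
my %spectra;
open my $spec_in, '<', \$spectra_txt or die "Can't open spectra data: $!";
while (<$spec_in>) {
    chomp;
    my ($peptide, $spectrum, $sequence) = split /\t+/;
    next if $peptide !~ /\d/;    # skip the header line
    $spectra{$peptide} = [$spectrum, $sequence];
}
close $spec_in;

# Pass 2: stream the first file and build one joined row per line.
my $joined = '';
open my $prot_in, '<', \$proteins_txt or die "Can't open protein data: $!";
while (<$prot_in>) {
    chomp;
    my ($peptide, $protein) = split /\t+/;
    next if $peptide !~ /\d/;    # skip the header line
    my $row = $spectra{$peptide} or next;    # no spectrum row for this peptide
    $joined .= join("\t", $peptide, $protein, @$row) . "\n";
}
close $prot_in;

print $joined;
```

With this ordering the hash holds one small entry per peptide rather than a list of proteins, and the output follows the row order of the first file instead of the second.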
In reply to Re: match two files by GrandFather, in thread match two files by yueli711