
Most problems of this sort are easiest to code using a hash to store the important content of one of the files. In this case I just hashed the first file, but for a real-world version it would probably be better to read the second file first.

Note that the code processing the first file builds, for each peptide ID, a list of protein IDs for later use.

#!/usr/bin/perl
use warnings;
use strict;

my $tmp01 = <<FILE1;
PeptideID    ProteinID
6    109521
7    741
11    681
11    780
20    2352
27    1490
27    1491
27    1492
28    51996
29    1490
29    1491
29    1492
30    1490
30    1491
30    1492
FILE1
my $tmp02 = <<FILE2;
PeptideID    SpectrumID    Sequence
6    53663    KMGEGR
7    53663    KPPSGK
11    144492    NNDALR
20    15547    SPAKPK
27    55547    LHKPPK
28    55547    LFVGRK
29    55504    LHKPPK
30    55602    LHKPPK
FILE2
my $tmp11_QUICK = '';
my %peptides;

open my $TAB01, '<', \$tmp01 or die "Can't open first table: $!";

while (<$TAB01>) {
    chomp;

    # Fields are tab separated
    my ($peptide, $protein) = split /\t+/;
    next if $peptide !~ /\d/;    # skip the header line

    # push autovivifies the array ref, so no explicit initialisation is needed
    push @{$peptides{$peptide}}, $protein;
}

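open my $TAB02, '<', \$tmp02 or die "Can't open second table: $!";
open my $OUT, '>', \$tmp11_QUICK or die "Can't open output: $!";

while (<$TAB02>) {
    chomp;

    my ($peptide, $spectrum, $sequence) = split /\t+/;
    next if $peptide !~ /\d/;    # skip the header line

    # One output line per protein recorded for this peptide
    print $OUT "$peptide\t$_\t$spectrum\t$sequence\n"
        for @{$peptides{$peptide}};
}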

close $OUT;

print $tmp11_QUICK;

Prints:

6    109521    53663    KMGEGR
7    741    53663    KPPSGK
11    681    144492    NNDALR
11    780    144492    NNDALR
20    2352    15547    SPAKPK
27    1490    55547    LHKPPK
27    1491    55547    LHKPPK
27    1492    55547    LHKPPK
28    51996    55547    LFVGRK
29    1490    55504    LHKPPK
29    1491    55504    LHKPPK
29    1492    55504    LHKPPK
30    1490    55602    LHKPPK
30    1491    55602    LHKPPK
30    1492    55602    LHKPPK
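
For what it's worth, here is a minimal sketch of the "read the second file first" variant mentioned above. It is an illustration only: the sub name join_spectra_first and the cut-down sample data are made up for the example, and it assumes each peptide ID appears only once in the second file, as it does in the sample data.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Cut-down sample data (a few rows of the tables above); fields are tab separated.
my $proteins = join '', map {"$_\n"}
    "PeptideID\tProteinID", "6\t109521", "11\t681", "11\t780", "27\t1490";
my $spectra = join '', map {"$_\n"}
    "PeptideID\tSpectrumID\tSequence", "6\t53663\tKMGEGR",
    "11\t144492\tNNDALR", "27\t55547\tLHKPPK";

# Hypothetical helper: hash the spectrum table, then stream the protein table.
sub join_spectra_first {
    my ($spectraText, $proteinText) = @_;
    my %spectrumFor;

    open my $specIn, '<', \$spectraText or die "Can't open spectrum data: $!";
    while (my $line = <$specIn>) {
        chomp $line;
        my ($peptide, $spectrum, $sequence) = split /\t+/, $line;
        next unless defined $peptide && $peptide =~ /^\d+$/;    # skip header

        # Assumption: one row per peptide. A later duplicate would overwrite.
        $spectrumFor{$peptide} = [$spectrum, $sequence];
    }
    close $specIn;

    my @rows;
    open my $protIn, '<', \$proteinText or die "Can't open protein data: $!";
    while (my $line = <$protIn>) {
        chomp $line;
        my ($peptide, $protein) = split /\t+/, $line;
        next unless defined $peptide && $peptide =~ /^\d+$/;    # skip header

        # Peptides with no spectrum row are silently skipped
        my $match = $spectrumFor{$peptide} or next;
        push @rows, join "\t", $peptide, $protein, @$match;
    }
    close $protIn;

    return @rows;
}

print "$_\n" for join_spectra_first($spectra, $proteins);
```

With one row per peptide in the spectrum table, the hash holds a single spectrum/sequence pair per key, and the protein table, which can have several rows per peptide, is streamed line by line. The output order then follows the protein file rather than the spectrum file.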

Optimising for fewest key strokes only makes sense transmitting to Pluto or beyond

In reply to Re: match two files by GrandFather
in thread match two files by yueli711
