Re: match two files

The script below produces the same output as your example. Run it as follows (assuming the script is saved as z):

perl z tmp01 tmp02

The regex is spread over several lines (using the /x modifier) for better readability.

#!/usr/bin/env perl

use strict;
use warnings;

my $seen;

while ( <> ) {
    next unless /^\d/;

    s/\s*$//;

    next unless m/        # [file1]    [file2]
        ^
        (\S+)        # PeptideID    PeptideID
        \s+
        (\S+)        # ProteinID    SpectrumID
        (?:
            \s+
            (\S+)    # -----        Sequence
        )?
        $
    /x;

    if ( defined $3 ) {    # third column present: line comes from file2
        $seen->{$1}->{SpectrumID} = $2;
        $seen->{$1}->{Sequence}  = $3;
    } else {
        push @{ $seen->{$1}->{ProteinID} }, $2;
    }
}

sub frmt {
    print join("\t", @_) . "\n";
}

frmt qw( PeptideID ProteinID SpectrumID Sequence );

foreach my $k ( sort { $a <=> $b } keys %{ $seen } ) {
    foreach my $p ( @{ $seen->{$k}->{ProteinID} } ) {
        frmt $k, $p, $seen->{$k}->{SpectrumID}, $seen->{$k}->{Sequence};
    }
}

Re^2: match two files by yueli711 (Sexton) on Dec 09, 2020 at 16:35 UTC
Hello siberia-man, Thank you so much for your great help! There are still some errors. Thank you again, it is really appreciated! Best, Yue
`perl match_quick02.pl tmp01_quick tmp02_quick`
`syntax error at match_quick02.pl line 42, near "+}"`
`syntax error at match_quick02.pl line 44, near "}"`
`Execution of match_quick02.pl aborted due to compilation errors.`
Re^3: match two files by huck (Prior) on Dec 09, 2020 at 16:47 UTC
I suspect you used cut-and-paste rather than the download link, because the `+}` in your error message does not appear anywhere in the actual code.
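If the code has already been pasted, the wrap can be undone after the fact. The following is only a minimal sketch: `unwrap` is a hypothetical helper, and it assumes that every line beginning with `+` is a wrap continuation, so it would also mangle a genuine code line that happens to start with `+`. Using the download link remains the safer route.

```perl
#!/usr/bin/env perl

use strict;
use warnings;

# Hypothetical helper: rejoin lines that were split when wrapped code
# was copied from the rendered page.
# Assumption: a leading "+" always marks the continuation of the
# previous line and is never part of the real code.
sub unwrap {
    my ($text) = @_;
    $text =~ s/\n\+//g;    # drop the newline together with the "+" marker
    return $text;
}

# The broken statement, as it looks after cut-and-paste.
my $pasted = <<'END';
        frmt $k, $p, $seen->{$k}->{SpectrumID}, $seen->{$k}->{Sequence
+};
END

print unwrap($pasted);
```

With the sample above, the two pasted lines come back as a single statement ending in `{Sequence};`.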
Re^4: match two files by yueli711 (Sexton) on Dec 09, 2020 at 17:52 UTC
Hello huck, Thank you so much for your great help! Really appreciated! Best, Yue