This is untested, but could run roughly 200 times faster than your original. Also, from my reading of the POD, the output file should contain both sequences and their ids.

    #!/usr/local/bin/perl
    use strict;
    use warnings;
    use Errno;

    use lib " /RemotePerl";
    use lib " /System/Library/Perl/5.8.6";
    use lib " /Library/Perl/5.8.6";

    use Bio::Perl;
    use Bio::SeqIO;
    use Bio::SearchIO;

    if (@ARGV < 2) {
        die "usage: compare.pl <filename1> <filename2>\n";
    }

    my $file1 = $ARGV[0];
    my $file2 = $ARGV[1];

    # Open first fasta file
    my $File1 = Bio::SeqIO->new(-file => $file1, -format => 'fasta');
    print "File name of the first fasta file is: ".$file1."\n";

    # Create a hash from file 1, keyed by the sequence string
    my %fasta1;
    while( my $seq1 = $File1->next_seq() ) {
        $fasta1{ $seq1->seq } = $seq1;
    }

    # Open second fasta file
    my $File2 = Bio::SeqIO->new(-file => $file2, -format => 'fasta');
    print "File name of the second fasta file is: ".$file2."\n";

    # Setup output file: sequences from File2 which match sequences of File1
    my $output = Bio::SeqIO->new(
        -file   => ">EQUAL_HITS.fasta",
        -format => "fasta",
        -flush  => 0
    );

    # For each sequence of file2 that also occurs in file1, write the
    # stored file1 sequence object to the output file.
    # Note that if several matches to one sequence exist,
    # ALL matches are output to the file
    while( my $seq2 = $File2->next_seq() ) {
        if( exists $fasta1{ $seq2->seq } ) {
            $output->write_seq( $fasta1{ $seq2->seq } );
        }
    }

    print "Comparison is complete! \n";

The main change is that it creates a hash rather than an array from the sequences of file1. This makes each lookup O(1) on average rather than O(n), a linear scan over the roughly 200 sequences of file1.
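To make the difference concrete, here is a minimal core-Perl sketch with no BioPerl involved. The two lists are made-up stand-ins for the sequence strings of file1 and file2, and the variable names are mine, not from your script.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the sequence strings read from file1 and file2.
my @file1_seqs = qw(ACGT GGCC TTAA);
my @file2_seqs = qw(TTAA CCCC ACGT);

# Array approach: every query scans the whole file1 list, O(n) per lookup.
my @hits_scan;
for my $query (@file2_seqs) {
    push @hits_scan, $query if grep { $_ eq $query } @file1_seqs;
}

# Hash approach: build the index once, then each lookup is a single probe.
my %seen;
$seen{$_} = 1 for @file1_seqs;
my @hits_hash = grep { exists $seen{$_} } @file2_seqs;

print "scan: @hits_scan\n";
print "hash: @hits_hash\n";
```

Both approaches find the same matches; only the cost per lookup differs, and that is where the speedup comes from as file1 grows.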

It stores the sequence objects returned by next_seq() as the values, keyed by the sequence string. When a match is found, it gives the stored sequence object back to write_seq() for inclusion in the output file. Hopefully, write_seq() knows what to do with them.
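One consequence worth seeing in miniature: because the stored value is the file1 object, the ids in the output come from file1, not file2. The sketch below uses plain hashrefs as hypothetical stand-ins for the Bio::Seq objects; the ids and sequences are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical records standing in for Bio::Seq objects (id + sequence).
my @file1 = (
    { id => 'a1', seq => 'ACGT' },
    { id => 'a2', seq => 'GGCC' },
);
my @file2 = (
    { id => 'b7', seq => 'GGCC' },
    { id => 'b8', seq => 'TTTT' },
);

# Key: the sequence string. Value: the whole file1 record.
my %by_seq;
$by_seq{ $_->{seq} } = $_ for @file1;

# For each file2 record with a matching sequence, keep the file1 record.
my @written;
for my $rec (@file2) {
    next unless exists $by_seq{ $rec->{seq} };
    push @written, $by_seq{ $rec->{seq} };   # the file1 record, not $rec
}

# Emit in a fasta-like layout.
print ">$_->{id}\n$_->{seq}\n" for @written;
```

If you would rather have file2's headers in the output, pass the file2 object to write_seq() instead. Note also that if file1 contains the same sequence under two ids, the later one overwrites the earlier one in the hash.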

Other things to note: use strict is not commented out; I've shortened some of your variable names; the code is indented; as you've set up variables $file1 & $file2, you might as well use them rather than $ARGV[n].


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

In reply to Re: Compare fasta files with different headers by BrowserUk
in thread Compare fasta files with different headers by InfoSeeker
