This is untested, but could run roughly 200 times faster than your original. Also, from my reading of the POD, the output file should contain both sequences and their ids.
#!/usr/local/bin/perl
use strict;
use warnings;
use Errno;

use lib " /RemotePerl";
use lib " /System/Library/Perl/5.8.6";
use lib " /Library/Perl/5.8.6";

use Bio::Perl;
use Bio::SeqIO;
use Bio::SearchIO;

if (@ARGV < 2) {
    die "usage: compare.pl <filename1> <filename2>\n";
}

my $file1 = $ARGV[0];
my $file2 = $ARGV[1];

# Open first fasta file
my $File1 = Bio::SeqIO->new(-file => $file1, -format => 'fasta');
print "File name of the first fasta file is: ".$file1."\n";

# Create a hash from file 1
my %fasta1;
while( my $seq1 = $File1->next_seq() ) {
    $fasta1{ $seq1->seq } = $seq1;
}

# Open second fasta file
my $File2 = Bio::SeqIO->new(-file => $file2, -format => 'fasta');
print "File name of the second fasta file is: ".$file2."\n";

# Setup output file: sequences from File2 which match sequences of File1
my $output = Bio::SeqIO->new(
    -file   => ">EQUAL_HITS.fasta",
    -format => "fasta",
    -flush  => 0
);

# write matching sequences of file2 to the output file
# Note that if several matches to one sequence exist,
# ALL matches are output to the file
while( my $seq2 = $File2->next_seq() ) {
    if( exists $fasta1{ $seq2->seq } ) {
        $output->write_seq( $fasta1{ $seq2->seq } );
    }
}

print "Comparison is complete! \n";
The main change is that it creates a hash rather than an array from the sequences of file1. This makes each lookup O(1) rather than O(n), where n is the number of sequences in file1 (apparently around 200 here).
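To make the difference concrete, here is a minimal sketch in core Perl only (no BioPerl). The sample sequences and the helper name in_array are made up for illustration; they are not from your script.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the sequences of file1.
my @file1_seqs = ('ACGT', 'GGCC', 'TTAA', 'CCGG');

# Array approach: compare against each element until a match is found.
# Returns (found, number_of_comparisons).
sub in_array {
    my ($needle, $haystack) = @_;
    my $comparisons = 0;
    for my $seq (@$haystack) {
        $comparisons++;
        return (1, $comparisons) if $seq eq $needle;
    }
    return (0, $comparisons);
}

# Hash approach: build the index once, then each query is one keyed lookup.
my %seen = map { $_ => 1 } @file1_seqs;

my ($found, $comparisons) = in_array('CCGG', \@file1_seqs);
print "array: found=$found after $comparisons comparisons\n";
print "hash:  found=", (exists $seen{'CCGG'} ? 1 : 0), " with a single lookup\n";
```

With the array, the worst case grows with the number of sequences in file1; with the hash, the cost per query stays roughly constant.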
It stores the sequence objects returned by next_seq() as the values, keyed by the sequence, and when matches are found, it gives the sequence objects back to write_seq() for inclusion in the output file. Hopefully, write_seq() knows what to do with them.
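The same idea can be tried without BioPerl installed. In this sketch plain hash references with made-up ids and sequences stand in for the objects that next_seq() returns, and a print stands in for write_seq(); none of these names come from the Bio::SeqIO API.

```perl
use strict;
use warnings;

# Hypothetical records standing in for sequence objects: an id plus a sequence.
my @file1 = (
    { id => 'a1', seq => 'ACGT' },
    { id => 'a2', seq => 'GGCC' },
);
my @file2 = (
    { id => 'b1', seq => 'GGCC' },
    { id => 'b2', seq => 'TTTT' },
);

# Key by the sequence string and keep the whole record as the value.
my %by_seq;
$by_seq{ $_->{seq} } = $_ for @file1;

# For each file2 record, fetch the stored file1 record with the same sequence.
my @matches;
for my $rec (@file2) {
    push @matches, $by_seq{ $rec->{seq} } if exists $by_seq{ $rec->{seq} };
}

# Print the matches in a fasta-like layout.
print ">$_->{id}\n$_->{seq}\n" for @matches;
```

One consequence of keying by sequence: if file1 holds the same sequence under two different ids, only the record stored last survives in the hash.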
Other things to note: use strict is not commented out; I've shortened some of your variable names; the code is indented; as you've set up variables $file1 & $file2, you might as well use them rather than $ARGV[n].
In reply to Re: Compare fasta files with different headers
by BrowserUk
in thread Compare fasta files with different headers
by InfoSeeker