in reply to Compare fasta files with different headers
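You're reading through the entire @fasta_objs array 12 million times: for every sequence you test, grep walks the whole array, even after it has already found a match. A hash lookup takes roughly constant time however many keys are stored, so using a hash rather than an array should greatly reduce processing time. Change these three lines: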
my @fasta_objs = ();
...
push @fasta_objs, $seqFile1->seq;
...
$fasta->write_seq($seqFile2) if (grep {$_ eq $seqFile2->seq} @fasta_objs);
to
my %fasta_objs = ();
...
++$fasta_objs{$seqFile1->seq};
...
$fasta->write_seq($seqFile2) if exists $fasta_objs{$seqFile2->seq};
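To see the change in isolation, here is a minimal pure-Perl sketch that uses plain strings in place of the Bio::Seq objects; the sample sequences and the %seen name are made up for illustration. The lookup hash is built in one pass over the first set, and each sequence from the second set is then checked with a single exists test instead of a grep over the whole first set.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the sequences read from the two FASTA files
my @file1_seqs = qw(ACGT GGCC TTAA);
my @file2_seqs = qw(TTAA CCCC ACGT);

# Build the lookup table once: a single pass over file 1
my %seen;
++$seen{$_} for @file1_seqs;

# One hash lookup per file-2 sequence, rather than a scan of all of file 1
my @common = grep { exists $seen{$_} } @file2_seqs;

print "$_\n" for @common;
```

With N sequences in the first file and M in the second, the array version does on the order of N x M comparisons, while the hash version does about N + M hash operations.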
I'm not familiar with the Bio:: modules, so I can't offer advice on how to capture the headers (perhaps you already know, or can find out from the documentation); however, once you have the header, you can store it as the hash value. So, instead of:
++$fasta_objs{$seqFile1->seq};
use
$fasta_objs{$seqFile1->seq} = $header;
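A sketch of the header-storing version, again with hypothetical [header, sequence] pairs in place of real FASTA records. (In BioPerl the FASTA header is usually split between the sequence object's display_id and desc methods, but please confirm that in the Bio::Seq documentation.) Here exists still does the membership test, and the stored value hands back the matching header from the first file.

```perl
use strict;
use warnings;

# Hypothetical [header, sequence] records standing in for the two FASTA files
my @file1 = ( [ 'chr1_a', 'ACGT' ], [ 'chr1_b', 'GGCC' ] );
my @file2 = ( [ 'sampleX', 'GGCC' ], [ 'sampleY', 'AAAA' ] );

# Map each file-1 sequence to its header
# (assumes each sequence occurs only once in file 1)
my %header_for;
for my $rec (@file1) {
    my ( $header, $seq ) = @$rec;
    $header_for{$seq} = $header;
}

# For each file-2 record, report the file-1 header with the same sequence
my %matched;
for my $rec (@file2) {
    my ( $header, $seq ) = @$rec;
    next unless exists $header_for{$seq};
    $matched{$header} = $header_for{$seq};
    print "$header matches $header_for{$seq}\n";
}
```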
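This assumes each sequence appears only once in the first file; if the same sequence occurs under several headers, each assignment overwrites the previous one and only the last header is kept. In that case, you'll need a more complex storage solution, maybe something like: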
seq => [header1, header2, ...]
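One way to build that seq => [header1, header2, ...] structure is a hash of arrays, pushing each header onto the array stored under its sequence. A minimal sketch with made-up records in which one sequence appears under two headers:

```perl
use strict;
use warnings;

# Hypothetical file-1 records; 'ACGT' appears under two different headers
my @file1 = ( [ 'h1', 'ACGT' ], [ 'h2', 'GGCC' ], [ 'h3', 'ACGT' ] );

# seq => [header1, header2, ...]; autovivification creates each array on first push
my %headers_for;
for my $rec (@file1) {
    my ( $header, $seq ) = @$rec;
    push @{ $headers_for{$seq} }, $header;
}

# Looking up a file-2 sequence now yields every matching file-1 header
my $query = 'ACGT';
if ( exists $headers_for{$query} ) {
    print "$query: @{ $headers_for{$query} }\n";
}
```

The membership test is unchanged (exists on the sequence key); only the stored value grows from a single header to a list of them.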
Finally, I strongly recommend that you don't comment out use strict; globally. If you really need to, turn off only the stricture you need, and only for a small piece of code, e.g.
# Comment explaining why you're doing this
no strict 'refs';
... small piece of code here ...
use strict 'refs';
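Because strict is lexically scoped, a related option is to put no strict 'refs'; inside a bare block, so strictures return automatically at the closing brace. A small sketch, using a hypothetical symbolic call by sub name as the code that needs the exemption:

```perl
use strict;
use warnings;

sub greet { return "hello, $_[0]" }

my $sub_name = 'greet';    # hypothetical: a sub name only known at run time
my $result;
{
    # Calling a sub through a string name is a symbolic reference,
    # which strict 'refs' forbids; the exemption ends at this block's closing brace.
    no strict 'refs';
    $result = &$sub_name('world');
}

print "$result\n";    # strict 'refs' is in force again here
```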
-- Ken