Re: Compare fasta files with different headers

You're reading through the entire @fasta_objs array 12 million times. Using a hash, rather than an array, should greatly reduce processing time. Change these three lines:

my @fasta_objs =();
...
push @fasta_objs,$seqFile1->seq;
...
$fasta->write_seq($seqFile2) if (grep {$_ eq $seqFile2->seq} @fasta_objs);

to:

my %fasta_objs =();
...
++$fasta_objs{$seqFile1->seq};
...
$fasta->write_seq($seqFile2) if exists $fasta_objs{$seqFile2->seq};
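
To show the difference without the Bio:: modules, here's a minimal, self-contained sketch of the same idea. The sequences are made-up plain strings and the names (common_seqs, %seen) are just for illustration; they aren't from your script.

```perl
use strict;
use warnings;

# Return the sequences from the second list that also occur in the first.
# Both arguments are array refs of plain strings (hypothetical stand-ins
# for the values returned by ->seq).
sub common_seqs {
    my ($first, $second) = @_;

    # Build the lookup once: one hash key per sequence in the first list.
    my %seen;
    ++$seen{$_} for @$first;

    # Each membership test is now a single hash lookup rather than a
    # scan of the whole first list.
    return grep { exists $seen{$_} } @$second;
}

my @common = common_seqs( [qw(ACGT GGCC TTAA)], [qw(GGCC AAAA ACGT CCCC)] );
print "$_\n" for @common;
```

The first list is walked once to fill the hash and the second once to test it, instead of rescanning the first list for every record in the second.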

I'm not familiar with the Bio:: modules so I can't offer advice on how to capture the headers (perhaps you already know or can find out through the documentation); however, once you have the header, you can store it in the hash. So, instead of:

++$fasta_objs{$seqFile1->seq};

use

$fasta_objs{$seqFile1->seq} = $header;

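This assumes each sequence appears under only one header, i.e. header/sequence combinations are unique. If that's not the case, a later header will overwrite an earlier one for the same sequence, and you'll need a more complex storage solution, maybe something like: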

seq => [header1, header2, ...]
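
A rough sketch of that structure, using made-up header/sequence pairs as plain strings (how you pull the header out of the Bio:: object is left out, and %headers_for is just a name I picked):

```perl
use strict;
use warnings;

# Hypothetical (header, sequence) pairs standing in for records
# read from the first file.
my @records = (
    [ 'id1 sample A', 'ACGT' ],
    [ 'id2 sample B', 'GGCC' ],
    [ 'id3 sample C', 'ACGT' ],    # same sequence, different header
);

# seq => [header1, header2, ...]
my %headers_for;
for my $rec (@records) {
    my ($header, $seq) = @$rec;
    # push autovivifies the array ref the first time a sequence is seen.
    push @{ $headers_for{$seq} }, $header;
}

# Report every header recorded for a given sequence.
my $query = 'ACGT';
if ( exists $headers_for{$query} ) {
    print "$query: $_\n" for @{ $headers_for{$query} };
}
```

The exists test stays exactly as before; only the stored value changes from a count or single header to a list of headers.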

Finally, I would strongly recommend that you do not comment out use strict; globally. If you really need to, just turn strictures off for a small piece of code, e.g.

# Comment explaining why you're doing this
no strict 'refs';
... small piece of code here ...
use strict 'refs';
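
Since strict is lexically scoped, another way to get the same effect is to put the code in its own block; strictures come back automatically at the closing brace. Here's a small runnable sketch. The task (installing subs by name) and the names red and green are only an illustration, not something from your script.

```perl
use strict;
use warnings;

for my $name (qw(red green)) {
    # Assigning to a glob by name needs symbolic references, so
    # strict 'refs' is relaxed for this block only.
    no strict 'refs';
    *{"main::$name"} = sub { return "$name: @_" };
}
# strict 'refs' is in force again from here on.

print red('text'), "\n";
print green('more text'), "\n";
```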

-- Ken
