I have an input file containing MANY records in this format:
>Record 1
AGTCTAGTCAT
CATCATAAGAT
CATCAATCACA
>Other Record
ATGAACAGCAG
ATGAAGAATGG
ATAG
Ah, yes. The ol' FASTA. If you know that you have no repeats in the id/description info after the '>', the following "one liner" gets you a long way with a standard FASTA file:
% perl -lne 's/^>//?$s=$_:$s{$s}.=$_;\
> END{ <do smthng w/ %s> }' \
> huge.fasta massive.fasta humongo.fasta
The uniqueness condition mentioned above is crucial; otherwise records that share a header get silently concatenated into a single hash entry, and this scriptlet will mess with your mind.
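If you can't vouch for uniqueness, a spelled-out version of the same idea can check for it. The following is only a sketch: the helper name `read_fasta`, the die-on-duplicate policy, and the in-memory demo file are my own assumptions, not part of the one-liner above.

```perl
use strict;
use warnings;

# Hypothetical helper: read FASTA records from a filehandle into a hash
# keyed by header line, dying on a repeated header instead of merging.
sub read_fasta {
    my ($fh) = @_;
    my (%seq, $id);
    while (my $line = <$fh>) {
        chomp $line;
        next if $line eq '';                 # skip blank lines
        if ($line =~ s/^>//) {               # header line: start a new record
            die "duplicate header '$line' at line $.\n" if exists $seq{$line};
            $id = $line;
            $seq{$id} = '';
        }
        elsif (defined $id) {                # sequence line: append to current record
            $seq{$id} .= $line;
        }
    }
    return \%seq;
}

# Demo on an in-memory file holding the two records from the question.
my $fasta = <<'END';
>Record 1
AGTCTAGTCAT
CATCATAAGAT
CATCAATCACA
>Other Record
ATGAACAGCAG
ATGAAGAATGG
ATAG
END
open my $fh, '<', \$fasta or die "cannot open in-memory file: $!";
my $seq = read_fasta($fh);
close $fh;

# Print each header with the length of its joined sequence.
print "$_\t", length $seq->{$_}, "\n" for sort keys %$seq;
```

For real files, open each one and pass its filehandle in; whether a duplicate should die, warn, or get a numeric suffix is up to you.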
A generically useful specialization of the above is
% perl -MStorable -lne 's/^>//?$s=$_:$s{$s}.=$_;\
> END{ store \%s, "for_later" }' \
> huge.fasta massive.fasta humongo.fasta
Then you can read the-hash-formerly-known-as-%s from any script whenever you please. See Storable. Keep in mind, however, that, if left unattended, Storable::store clobbers without remorse: it overwrites an existing file of the same name without warning.
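Reading it back is a job for Storable's `retrieve`, which returns a reference to the stored hash. A minimal sketch of the round trip, assuming a throwaway temp directory and two hard-coded records so it runs on its own; the `-e` test before `store` is my own guard against clobbering, not something Storable does for you:

```perl
use strict;
use warnings;
use Storable qw(store retrieve);
use File::Temp qw(tempdir);

# Scratch directory so the demo never touches a real "for_later" file.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/for_later";

# Stand-in for the hash the one-liner builds (records from the question).
my %s = (
    'Record 1'     => 'AGTCTAGTCATCATCATAAGATCATCAATCACA',
    'Other Record' => 'ATGAACAGCAGATGAAGAATGGATAG',
);

# store() overwrites silently, so check for an existing file first.
die "$file already exists, refusing to overwrite\n" if -e $file;
store(\%s, $file) or die "cannot store to $file\n";

# Later, possibly in a different script: get the hash back as a reference.
my $seq = retrieve($file);
print "$_\t", length $seq->{$_}, "\n" for sort keys %$seq;
```

In a real script you would replace `$file` with the path given to `store` in the one-liner, and drop everything above the `retrieve` line.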