in reply to Input record separator
I have an input file containing MANY records in this format:

    >Record 1
    AGTCTAGTCAT
    CATCATAAGAT
    CATCAATCACA
    >Other Record
    ATGAACAGCAG
    ATGAAGAATGG
    ATAG

Ah, yes. The ol' FASTA. IF you know that you have no repeats in the id/description info after the '>', the following "one liner" gets you a long way with a standard FASTA file:
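
    % perl -lne 's/^>//?$s=$_:($s{$s}.=$_);\
    > END{ <do smthng w/ %s> }' \
    > huge.fasta massive.fasta humongo.fasta

The uniqueness condition mentioned above is crucial. Records that share a header are silently concatenated into a single hash entry, and this scriptlet will mess with your mind. The parentheses around the append also matter, because ?: binds tighter than .= in Perl. Without them, each header line is appended to $s as well, so every key ends up doubled ("Record 1Record 1").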
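If the one-liner is too terse, the same logic can be spelled out as an ordinary subroutine. This is only a sketch: `read_fasta` is a hypothetical name, the duplicate-header warning is an addition that the one-liner does not have, and lines before the first header are simply skipped.

```perl
use strict;
use warnings;

# read_fasta is a hypothetical helper name, not from the original post.
# It reads FASTA text from an open filehandle and returns a hash ref that
# maps each header (minus the leading '>') to its concatenated sequence.
sub read_fasta {
    my ($fh) = @_;
    my (%seq, $id);
    while (my $line = <$fh>) {
        chomp $line;                  # the one-liner gets this from -l
        if ($line =~ s/^>//) {        # header line: strip '>' and remember it
            $id = $line;
            # The case the uniqueness condition is about: a repeated
            # header would silently merge two records into one entry.
            warn "duplicate header '$id'; sequences will be merged\n"
                if exists $seq{$id};
            $seq{$id} = '' unless exists $seq{$id};   # keep empty records too
        }
        elsif (defined $id) {         # sequence line: append to current record
            $seq{$id} .= $line;
        }
        # Lines before the first header are ignored.
    }
    return \%seq;
}

# Usage, e.g. at the end of a script reading FASTA on STDIN:
# my $seq = read_fasta(\*STDIN);
# printf "%s\t%d\n", $_, length $seq->{$_} for sort keys %$seq;
```

Calling it on an in-memory string (open my $fh, '<', \$text) is a convenient way to try it out without a real file.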
A generically useful specialization of the above is

    % perl -MStorable -lne 's/^>//?$s=$_:($s{$s}.=$_);\
    > END{ store \%s "for_later" }' \
    > huge.fasta massive.fasta humongo.fasta

Then you can read the-hash-formerly-known-as-%s from any script whenever you please. See Storable. Keep in mind, however, that if left unattended, Storable::store clobbers without remorse: it silently overwrites any existing "for_later" file.
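To read the stored hash back, Storable's `retrieve` returns a reference to it. The sketch below also wraps `store` in a hypothetical `save_hash_once` helper that refuses to overwrite an existing file, as one way to keep `store` from clobbering. Both helper names are assumptions for illustration, not part of Storable.

```perl
use strict;
use warnings;
use Storable qw(store retrieve);

# save_hash_once is a hypothetical helper: it refuses to overwrite an
# existing file, because Storable::store itself overwrites silently.
# (The -e check is not race-safe; it only guards against casual mistakes.)
sub save_hash_once {
    my ($href, $file) = @_;
    die "$file already exists; refusing to clobber it\n" if -e $file;
    store($href, $file) or die "store to $file failed\n";
    return 1;
}

# load_hash is also hypothetical: in any later script, retrieve returns
# a reference to the stored hash.
sub load_hash {
    my ($file) = @_;
    my $href = retrieve($file);
    die "could not read a hash from $file\n" unless ref $href eq 'HASH';
    return $href;
}

# Usage:
# my $seqs = load_hash("for_later");
# print "$_\n" for sort keys %$seqs;
```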
the lowliest monk