PerlMonks  

Re: Input record separator

by tlm (Prior)
on Mar 18, 2005 at 15:02 UTC


in reply to Input record separator

I have an input file containing MANY records in this format:

>Record 1
AGTCTAGTCAT
CATCATAAGAT
CATCAATCACA
>Other Record
ATGAACAGCAG
ATGAAGAATGG
ATAG

Ah, yes. The ol' FASTA. If you know that you have no repeats in the id/description info after the '>', the following "one liner" gets you a long way with a standard FASTA file:

% perl -lne 's/^>// ? ($s = $_) : ($s{$s} .= $_); END { <do smthng w/ %s> }' \
    huge.fasta massive.fasta humongo.fasta
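The uniqueness condition mentioned above is crucial: records that share an id/description line get their sequences silently concatenated under a single hash key, and this scriptlet will mess with your mind.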
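For anyone who would rather not squint at a one-liner, here is a rough spelled-out sketch of the same idea. The sub name `read_fasta` and the duplicate check are my own additions for illustration, not part of the scriptlet; the sample data is the snippet from the question, read from an in-memory filehandle so the example runs on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read FASTA records from a filehandle into a hash keyed by the header line
# (without the leading '>'). Dies on a repeated header instead of silently
# concatenating two records. (read_fasta is a hypothetical helper name.)
sub read_fasta {
    my ($fh) = @_;
    my (%seq, $id);
    while (my $line = <$fh>) {
        chomp $line;
        next unless length $line;          # skip blank lines
        if ($line =~ s/^>//) {             # header line starts a new record
            $id = $line;
            die "duplicate header: $id\n" if exists $seq{$id};
            $seq{$id} = '';
        }
        elsif (defined $id) {              # sequence line: append to current record
            $seq{$id} .= $line;
        }
    }
    return \%seq;
}

# Demo on the sample from the question, read from an in-memory filehandle.
my $sample = join "\n",
    '>Record 1', 'AGTCTAGTCAT', 'CATCATAAGAT', 'CATCAATCACA',
    '>Other Record', 'ATGAACAGCAG', 'ATGAAGAATGG', 'ATAG', '';
open my $fh, '<', \$sample or die "cannot open in-memory file: $!";
my $seq = read_fasta($fh);
close $fh;

print "$_ => $seq->{$_}\n" for sort keys %$seq;
```

Dying on a repeated header is only one possible policy; depending on your data you might prefer to warn, or to make the key unique by appending a counter.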

A generically useful specialization of the above is

% perl -MStorable -lne 's/^>// ? ($s = $_) : ($s{$s} .= $_); END { store \%s, "for_later" }' \
    huge.fasta massive.fasta humongo.fasta
Then you can read the-hash-formerly-known-as-%s from any script whenever you please, via Storable's retrieve. See Storable. Keep in mind, however, that, if left unattended, Storable::store clobbers without remorse: an existing "for_later" file is overwritten without any warning.
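A minimal sketch of the reading side, together with one way to keep store from clobbering. The wrapper `store_once` is a hypothetical helper of mine, not something Storable provides, and the demo writes to a scratch directory rather than to a real "for_later" file. Note that the existence check and the write are two separate steps, so this guards against accidents, not against two processes racing each other.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Storable qw(store retrieve);
use File::Temp qw(tempdir);

# Store a hash reference, but refuse to overwrite an existing file.
# (store_once is a hypothetical wrapper, not part of Storable.)
sub store_once {
    my ($data, $path) = @_;
    die "$path already exists, refusing to overwrite\n" if -e $path;
    store($data, $path) or die "store to $path failed\n";
    return $path;
}

# Demo in a scratch directory so nothing real gets clobbered.
my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/for_later";
my %s    = ('Record 1' => 'AGTCTAGTCATCATCATAAGATCATCAATCACA');

store_once(\%s, $path);

# Later, possibly in a different script: get the hash back as a reference.
my $seq = retrieve($path);
print "$_ => $seq->{$_}\n" for sort keys %$seq;

# A second store to the same path is now rejected.
my $ok = eval { store_once(\%s, $path); 1 };
print "second store refused: $@" unless $ok;
```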

the lowliest monk
