Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl Monk, Perl Meditation
 
PerlMonks  

comment on

( [id://3333]=superdoc: print w/replies, xml ) Need Help??
but certainly not recommended. :) Using File::Stream one could set $/ to regex, but this approach suffers from several caveats and I wouldn't recommend it.

Aside from this, the suggestions above are quite good and covered most what I would suggest. But just to add a bit value to the discussion, purely IMHO, the best approach (as already suggested) might be to use bioperl, 'cause who doesn't want to have somebody else taking care of any format change and deal with potential problems therein? :)

Another thing is that set $/ = "\n>" is a correct approach, but ">" is not since FASTA format does not demands that seq description not have '>' in it. I would also certainly set performance as the highest priority in dealing with FASTA format (if I choose not to use Bioperl for some reason) thus code like the following would be an OK alternative to using bioperl:

$/ = "\n>"; while (<DATA>) { chomp; my $seq = /^>/ ? "$_\n" : ">$_\n"; print "seq is:\n$seq"; } __DATA__ >Record 1 AGTCTAGTCAT CATCATAAGAT CATCAATCACA >Record 2 ATGAACAGCAG ATGAAGAATGG ATAG >Record 3 AGTCTAGTCAT CATCATAAGAT CATCAATCACA >Record 4 ATGAACAGCAG ATGAAGAATGG ATAG
Now the above solution is all good, until you consider using it on a FASTA file generated on Mac. So maybe we need File::Stream after all? But it is not solution for huge files. So maybe we just use Bioperl and hope they had dealt with this issue?

Or maybe I'm just making this simple issue sounding more and more complicated? Now that's my only gift. :)

Still, hope it helps. ;)

Updated


In reply to regex record separator is certainly possible by inq123
in thread Input record separator by travisbickle34

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or How to display code and escape characters are good places to start.
Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others chanting in the Monastery: (4)
As of 2024-04-24 15:18 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found