Hello All,

I am trying to parse a FASTA file using Perl. The following is the input file:

>CVSF43565.d1 bg|346278
CAGACACACTTCTTTTAGTTGAGACACATGGAAAACATCATGTATGGCAGACAACTGTTCTGGGAGTTGG
ATCCGGGTAAGCAACGGGTCCACATATCTCCACAATCTCATAAGGGGCCAACATAGCGGGGGAGCTAACT
TGCCTTTGATTCCAAACCGTTGCACTCCTTTGGTCGGGGAAACTCGAAGGTACACATGATCACCAAGGTC
GAACTGCAGGGGTCTTCTCCGCTGGTCGGAGTAGCTCTTCTGTCAAGATTGGGCGGCCTTGAGATGTGCT
TGAATTACCTTCACTTGCTCTTCGGCTTCTGCCACTTAAGTCAGGGCCATAGACCTGTCTCTCCCCTGGG
CAGACACACTTCTTTTAGTTGAGACACATGGAAAACATCATGTATGGCAGACAACTGTTCTGGGAGTTGG
ATCCGGGTAAGC
>CVSF43566.d1 bg|346279
CAGACACACTTCTTTTAGTTGAGACACATGGAAAACATCATGTATGGCAGACAACTGTTCTGGGAGTTGG
ATCCGGGTAAGCCAGACACACTTCTTTTAGTTGAGACACATGGAAAACATCATGTATGGCAGACAACTGT
TCTGGGAGTTGGAATGCTAGTCGATCGCCAGACACACTTCTTTTAGTTGAGACACATGGAAAACATCATG
TTGGCAGACAACTGTTCTGGGAGTTGGATCCGGGTAAGCCAGACACACTTCTTTTAGTTGAGACACATGG
AAAACATCATGTATGGCAGACAACTGTTCTGGGAGTTGGATCCGGGTAAGC
>CVSF43567.d1 bg|346280
CGTAGCTGATGCTGTGCTGTTGTGTCGGGGGGATATATATATATATATGGGGTCGTAGTCGTAGCGCTAG
TATGCTAGCAGCGTAGATGCTGATCGATGCTGATGCTGATCGTAGTCGTAGGCTAGTGCGATCGTAGTCG
TAGTCGATGCTGATGCGTAGCTGATGTGCTGCTGATGCTAGTCGTCGTAGCTGATGCATGCTGATCGTAG
TGCTCGATGCTAGTCGTAGTCGTAGTCGTAGCGACTGATGCGATCGTAGTCGGATGCTAGCACGTAGCTG
GCTCGATGCTGATGCTGAT
>CVSF10000.x1 bg|356789 pair:789860
ATGCGTAGCTGATGTGCTGCTGATGCTAGTCGTCGTAGCTGATGCATGCTGATCGTAGTGCTCGATGCTA
GTCGTAGTCGTAGTCGTAGCGACTGATGCGATCGTAGTCGGATGATGCTGACTGATGCTGATCTGTACGT
CGTAGCTGATGCATGCGCTAGTAGCT
>CVSF10000.y1 bg|356790 pair:789859
GCTAGTCGATGCTGATGCTGTAGCTAGCGTAGTCGTACGCGCGCGCGCGCGTTTTTTGTGACGTCGTAGT
CCGTAGCTGATGCGATGCTAGTGCTGTGTCAGCTGATGTCGTGTGTAGCTGATGCTGATCGTTCGTGTGT
CGATGCTGATGCTAGTCGTAGTGTAT
>CVSF10001.x1 bg|356791 pair:789862
AGTCGTAGTCGTAGCTGTAGCTGATGCTGTGTACGATGCTGATGCGATGCGTAGCGTAGCATCGATGCTA
CGACTAGTCGTAGTCGTC
>CVSF10001.y1 bg|356792 pair:789861
CGTAGCTGATGCTGATCGTAGTCGTAGTCGATGCGATGCTAGTCGTAGCTGTAGCTGATGCTGCGTGCTG
CAGTCGATGCTAGTCGATGCTGATCGTCTAGCAT

I want to write the records (each header line plus the sequence data that follows it) that have a "pair:" field to one file, and the records without a "pair:" field to another.

However, with the following code I am only able to write the header lines. I also want the sequence data following each header line (ATGCTAGCTG....) to be included in the output files.

Any suggestions?

#!/usr/bin/perl
my $in = $ARGV[0];
my $p = $ARGV[1];
my $s = $ARGV[2];
open IN, "<$in" or die $!;
open P_OUT, ">$p" or die $!;
open S_OUT, ">$s" or die $!;
while(<IN>){
    chomp;
    if(/^>/){
        my @header = split / /;
        if($header[2] ne ''){
            print P_OUT "$header[0]"." "."$header[1]"." "."$header[2]\n";
        }
        else{
            print S_OUT "$header[0]"." "."$header[1]\n";
        }
    }
    #unless(/^>/){
    #print OUT "$_\n";
    #next;
    #}
}
close(IN);
close(P_OUT);
close(S_OUT);
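
A minimal sketch of one way the sequence lines could be kept with their headers, assuming each record should go wherever its header goes: remember which output handle the most recent header selected, and print every following line to that handle until the next header appears. The sub name split_fasta, the lexical filehandles, and the test for a third field starting with "pair:" are illustrative choices, not part of the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Copy each FASTA record (header plus its sequence lines) from $in_fh to
# $paired_fh if the header's third field starts with "pair:", otherwise
# to $single_fh. (split_fasta is a hypothetical helper name.)
sub split_fasta {
    my ($in_fh, $paired_fh, $single_fh) = @_;
    my $out_fh;    # destination of the record currently being read
    while (my $line = <$in_fh>) {
        if ($line =~ /^>/) {
            # Header line: choose the destination for this whole record.
            my @header = split ' ', $line;
            $out_fh = (defined $header[2] && $header[2] =~ /^pair:/)
                    ? $paired_fh
                    : $single_fh;
        }
        # Header and sequence lines alike go to the current destination;
        # anything before the first header is skipped.
        print {$out_fh} $line if defined $out_fh;
    }
    return;
}

# Run as a script only when all three file names are supplied.
if (@ARGV == 3) {
    my ($in, $p, $s) = @ARGV;
    open my $in_fh, '<', $in or die "Cannot read $in: $!";
    open my $p_fh,  '>', $p  or die "Cannot write $p: $!";
    open my $s_fh,  '>', $s  or die "Cannot write $s: $!";
    split_fasta($in_fh, $p_fh, $s_fh);
    close $_ for $in_fh, $p_fh, $s_fh;
}
```

Unlike the script above, this sketch writes header lines unchanged instead of rebuilding them from their fields, and it leaves the line endings in place so no chomp is needed.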

Thanks!!!


In reply to Print the data following few specific lines in perl by ad23
