Dear all, I have a file from which I need to extract specific lines.
# ----- prediction on sequence number 1 (length = 105, name = seq_01) --
#
# Constraints/Hints:
# (none)
# Predicted genes for sequence number 1 on both strands
# start gene g1
seq_01 CHECKED gene 28503 30196 0.89 + . g1
seq_01 CHECKED transcript 28503 30196 0.89 + . g1.t1
seq_01 CHECKED start_codon 28503 28505 . + 0 transcript_id "g1.t1"; gene_id "g1";
# coding sequence = [atgtcgtccctccccactctcatctttctccaccc
# atcgctgcggtcctcgccgacccttttgtgccggaagtagggaccgg]
# protein sequence = [MTASAFVLGTVAFLHNRLRRSRPRQASTAHR
# GTETPLLRSDKENLTTVLDATILVHSLGQKTNLALGATSSSLDLQKTNLAL
# VAALTPGIVFPLPSPFVATGLCLQKTNLALGATSSSLDL]
# end gene g1
###
# start gene g2
seq_01 CHECKED gene 77978 79779 0.44 + . g2
seq_01 CHECKED transcript 77978 79779 0.44 + . g2.t1
seq_01 CHECKED start_codon 77978 77980 . + 0 transcript_id "g2.t1"; gene_id "g2";
# coding sequence = [atgccgtcctcgtcaaagcagctggcgatgcc
# tcggcccctccttctgcaaaccgccctgccgcccgcctcggctcctccgaa
# gccgagcagcctacgcaggggccgcagatgctcgcgggagggaatatcgg]
# protein sequence =[MPLDSSSTPTSNPAPSHSSTAYLLFERLHIAEQ
# CCPGQGIRHGKWSPGSSEAPT]
# end gene g2
###
#
# ----- prediction on sequence number 2 (length = 710, name = seq_02) -----
#
# Constraints/Hints:
# (none)
# Predicted genes for sequence number 2 on both strands
# start gene g3
seq_02 CHECKED gene 150 2800 0.31 + . g3
seq_02 CHECKED transcript 150 2800 0.31 + . g3.t1
seq_02 CHECKED intron 1 149 0.75 + . transcript_id "g3.t1"; gene_id "g3";
# coding sequence = [agctgccctcctcggggccagccttctcttaactc
# tttgagaccttcaatcctgaggcgtgagacgcagtctggaggagcagctc]
# protein sequence = [LRRETQSGGAALCSLFDPPPTPTACAHANSP]
# end gene g3
###
#
# ----- prediction on sequence number 3 (length = 713, name = seq_03) -----
#
# Constraints/Hints:
# (none)
# Predicted genes for sequence number 3 on both strands
# start gene g4
.... [as same as above]......so on and on...
From this file, I need to extract the sequences into two different files, like this:
FILE 1:
>seq_01 g1
atgtcgtccctccccactctcatctttctccacccatcgctgcggtcctcgccgacccttttgtgccggaagtagggaccgg
>seq_01 g2
atgccgtcctcgtcaaagcagctggcgatgcctcggcccctccttctgcaaaccgccctgccgcccgcctcggctcctccgaagccgagcagcctacgcaggggccgcagatgctcgcgggagggaatatcgg
>seq_02 g3
agctgccctcctcggggccagccttctcttaactctttgagaccttcaatcctgaggcgtgagacgcagtctggaggagcagctc
>seq_03 g4
......so on...

FILE 2:
>seq_01 g1
MTASAFVLGTVAFLHNRLRRSRPRQASTAHRGTETPLLRSDKENLTTVLDATILVHSLGQKTNLALGATSSSLDLQKTNLALVAALTPGIVFPLPSPFVATGLCLQKTNLALGATSSSLDL
>seq_01 g2
MPLDSSSTPTSNPAPSHSSTAYLLFERLHIAEQCCPGQGIRHGKWSPGSSEAPT
>seq_02 g3
LRRETQSGGAALCSLFDPPPTPTACAHANSP
>seq_03 g4
......so on...
The code I have written so far to obtain this is:
#!/usr/bin/perl
open(FH,$ARGV[0]);
open(OUT1,">file1.txt");
open(OUT2,">file2.txt");
@array=<FH>;
$str=join("",@array);
@list=split("###",$str);
foreach $line(@list){
    $line=~m/(# coding sequence = [.*\])(# protein sequence = [.*\])/;
    print OUT1 "$1";
    print OUT2 "$2";
}
I am not getting any output from this program, and I haven't figured out how to print the headers either. How can I do it? Please advise or give suggestions. Thank you. :)
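One possible approach, offered as a minimal sketch rather than a definitive solution: in the pattern above, the unescaped `[` is read as the start of a character class, `.` does not cross line breaks without the `/s` modifier, and the two fields are separated by a newline and a `# ` prefix, so the two groups cannot match back to back. The sketch below assumes the input file is line-oriented (one record per line, as the tool wrote it) and that the first column of each feature line is the sequence name. The helper name `parse_predictions` is hypothetical; the output file names are taken from the code above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: parse the prediction text into a list of records,
# each like { seq => 'seq_01', gene => 'g1', cds => '...', protein => '...' }.
sub parse_predictions {
    my ($text) = @_;
    my @records;
    # Each gene sits between "# start gene gN" and "# end gene gN".
    while ($text =~ /^# start gene (\S+)\s*\n(.*?)^# end gene \1\b/msg) {
        my ($gene, $block) = ($1, $2);
        # Assumption: the first non-comment line starts with the sequence name.
        my ($seq) = $block =~ /^([^#\s]\S*)/m;
        my %rec = (seq => $seq // 'unknown', gene => $gene);
        for my $field (['cds', 'coding'], ['protein', 'protein']) {
            my ($key, $label) = @$field;
            # "[" and "]" are escaped; /s lets "." cross line breaks.
            my ($s) = $block =~ /^# \Q$label\E sequence\s*=\s*\[(.*?)\]/ms;
            next unless defined $s;
            $s =~ s/[#\s]+//g;    # drop "# " continuation prefixes and newlines
            $rec{$key} = $s;
        }
        push @records, \%rec;
    }
    return @records;
}

# Runs only when an input file is given on the command line.
if (@ARGV) {
    my $in = shift @ARGV;
    open(my $fh, '<', $in) or die "Cannot open $in: $!";
    my $text = do { local $/; <$fh> };    # slurp the whole file
    close $fh;
    open(my $out1, '>', 'file1.txt') or die "Cannot write file1.txt: $!";
    open(my $out2, '>', 'file2.txt') or die "Cannot write file2.txt: $!";
    for my $r (parse_predictions($text)) {
        print {$out1} ">$r->{seq} $r->{gene}\n$r->{cds}\n"     if defined $r->{cds};
        print {$out2} ">$r->{seq} $r->{gene}\n$r->{protein}\n" if defined $r->{protein};
    }
    close $out1;
    close $out2;
}
```

Checking the return value of every `open` makes a missing or unreadable input file fail loudly instead of silently producing empty output. Matching on the `start gene` / `end gene` markers, instead of splitting on `###`, also gives direct access to the gene name needed for the header line.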

In reply to extraction of sequences by patric
