This is the content of one of the result files and i have 40,000 files like this. From these files, i have to copy only the sequence header (starting with >) and the sequence(below the header) in 2 different files. one(seqfile1) for "GENSCAN_predicted_CDS" and one(seqfile2) for "GENSCAN_predicted_peptide". so all the sequences having (alphabets in lowercase)with their headers will be sent to seqfile1 and all the sequences having (ALPHABETS in uppercase)with their headers will be sent to seqfile2. hmmmmmm... my program goes like this:GENSCAN 1.0 Date run: 11-Mar-109 Time: 00:29:37 Sequence scaffold_2 : 47583 bp : 37.42% C+G : Isochore 1 ( 0 - 100 C+G +%) Parameter matrix: test.smat Predicted genes/exons: Gn.Ex Type S .Begin ...End .Len Fr Ph I/Ac Do/T CodRg P.... Tscr.. ----- ---- - ------ ------ ---- -- -- ---- ---- ----- ----- ------ 1.01 Init - 2604 2462 143 0 2 77 96 184 0.883 20.81 1.00 Prom - 3677 3638 40 2.82 2.00 Prom + 4382 4421 40 1.62 2.01 Init + 6395 6399 5 1 2 73 84 0 0.122 -0.16 2.02 Intr + 8270 8412 143 2 2 74 82 110 0.141 12.00 2.03 Term + 8516 8523 8 0 2 46 46 0 0.473 -8.05 2.04 PlyA + 9723 9728 6 2.27 Predicted peptide sequence(s): Predicted coding sequence(s): >scaffold_2|GENSCAN_predicted_peptide_1|48_aa MDRSHQIWDEKLMDQKCVLYPVCLWHEQESEMNWEAEKAERLRTEQKX >scaffold_2|GENSCAN_predicted_CDS_1|144_bp atggatcggagtcatcagatatgggatgagaagctgatggatcagaagtgcgtcctctac cccgtctgcctctggcacgagcaggagagcgagatgaactgggaggctgagaaagctgag aggctaaggactgaacagaaaggn >scaffold_2|GENSCAN_predicted_peptide_2|51_aa MRKNEGVDDADPLIEGMKEITLKDSATPEDKDQEDKQDERGEEIQEQPQEM >scaffold_2|GENSCAN_predicted_CDS_2|156_bp atgaggaagaatgagggtgttgatgatgcagatccactaatagaaggtatgaaggagatc actctgaaagattcagcaactccagaagacaaggatcaagaagacaaacaagatgagaga ggtgaagaaattcaagaacaacctcaagagatgtag
The problem i am facing is, i am gettin seqfile2_uc very perfect as i wanted. But not seqfile1_lc. the first file sequence is printed, n from the second file onwards the other contents from the result file is also printed. Why is this happening? Can anyone plz help . thank you. my result file should look like this:#!/usr/bin/perl open(OUT1,">seqfile1_lc.out"); open(OUT2,">seqfile2_uc.out"); opendir(DIR,"/home/user/Desktop/resultfolder"); while(($filename=readdir(DIR))) { open(FILE,"/home/user/Desktop/resultfolder/$filename"); while(<FILE>){ if($_ =~ m/^>/) && $_=~m/GENSCAN_predicted_CDS/ || $_=~m/GENSCAN_p +redicted_peptide/) { push(@seq,$seq) if $seq; $seq=(); push(@seq,"\n$_"); } else { $_ =~ s/\s*$//; $seq = $seq."$_"; } } } push(@seq,$seq); while($_= shift @seq){ if($_ =~ m/^>/) && $_=~m/GENSCAN_predicted_CDS/ ){ $header=$_; $seq=shift @seq; print OUT1 $header; print OUT1 $seq; } if($_ =~ m/^>/) && $_=~m/GENSCAN_predicted_peptide/ ){ $header=$_; $seq=shift @seq; print OUT2 $header; print OUT2 $seq; } }
seqfile1_uc.out >scaffold_2|GENSCAN_predicted_CDS_1|144_bp atggatcggagtcatcagatatgggatgagaagctgatggatcagaagtgcgtcctctac cccgtctgcctctggcacgagcaggagagcgagatgaactgggaggctgagaaagctgag aggctaaggactgaacagaaaggn >scaffold_2|GENSCAN_predicted_CDS_2|156_bp atgaggaagaatgagggtgttgatgatgcagatccactaatagaaggtatgaaggagatc actctgaaagattcagcaactccagaagacaaggatcaagaagacaaacaagatgagaga ggtgaagaaattcaagaacaacctcaagagatgtag ........... so on... from all the 40,000 files. seqfile2_uc.out >scaffold_2|GENSCAN_predicted_peptide_1|48_aa MDRSHQIWDEKLMDQKCVLYPVCLWHEQESEMNWEAEKAERLRTEQKX >scaffold_2|GENSCAN_predicted_peptide_2|51_aa MRKNEGVDDADPLIEGMKEITLKDSATPEDKDQEDKQDERGEEIQEQPQEM ....... so on...from all the 40,000 files.
In reply to parsing sequences from a file by sugar
| For: | Use: | ||
| & | & | ||
| < | < | ||
| > | > | ||
| [ | [ | ||
| ] | ] |