in reply to how to create output file using perl

Better now that you've formatted the files' content! But it's still not clear what your objective is. How did the text under 'contig1' and 'contig4' get processed into the output file? Can you show the code for that? Why didn't the text under 'contig2' and 'contig3' get processed correctly? You need to understand why; otherwise you won't be able to fix the problem, and it might happen again.

So please post the code you have been using so far.

If your problem is that only the first and last sections got processed when you ran your data file, maybe you just need a better regexp? Perl makes it very easy to get the strings out of your data file:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Slurp::Tiny 'read_file';

my $file  = 'contig.txt';
my $slurp = read_file($file);

my %results;
while ( $slurp =~ />Contig(\d+).+?([A-Z]+)/sg ) {
    $results{ $1 } = $2;
}

# Numeric sort, so that Contig10 comes after Contig9 rather than after Contig1
foreach my $test_number ( sort { $a <=> $b } keys %results ) {
    print "Test $test_number: $results{ $test_number }\n\n";
    ## do something to process $test_number, $results{ $test_number } ...
}

__END__
OUTPUT:

Test 1: GAGCTAAATAATTTGAATCAATGGGAAGATCACCGTGTTGTGAAAAAGCACATACAAATAAA
+GGAGCTTGGACTAAAGAAGAAGATGAACGACTTATTTCTTATATTAAAACTCACGGCGAAGGTTGCTGG
+AGATCCCTTCCTAAAGCTGCCGGACTTCTCCGATGCGGTAAAAGTTGCCGTCTCCGATGGATTAATTAC
+TTGAGACCGGACCTTAAACGCGGTAATTTTACTGAAGAAGAAGATGAACTCATTATCAAACTCCATAGC
+CTCCTTGGTAACAAATGGTCACTTATAGCCGGAAGATTACCAGGAAGAACAGATAATGAGATAAAAAAT
+TACTGGAATACGCACATAAGAAGGAAGCTTTTGAGTCGGGGCATTGATCCAACGACACACAGGCCTGTT
+AACGAGCCTGGTACAACGCAAAAAGTCACAACAATTTCATTTGCAGGTGGAGATCATAAAACTAAAGAT
+ATTGAAGAAGATCATAATAAGATGATAAATGTCAAAGCTGAATCTGGGTTGAGTCAATTAGAAGATGAA
+ATTATTAGTAGCAGTCCATTTCGAGAACAGTGTCCTGATTTAAATCTTGAGCTCAAATTAGCCCTCCTT
+CTCTACAAAATTACCAACATAGCCCCTCAAGGTGTTTTGCATGCAGTTTGGGTATACAAAATAGTAAAG
+ATTGCAATTGCAGTAAAAATAATATTGCAAGTTATAACTTTTTAGGATTAAAGAGTAATGGTGTTTTGG
+ACTATAGAACTTTAGAAACTAAGTGAATTTTTATTATAAATCTTTTTTTCCCTCGTGTATTTGGGTTAA
+AAAAACAAGAAGAGAGAATCGAGAAAGATATTCCTATTAGTTTAAGTTCTTTCGAATTTTCTCTTATTT
+GTAAAATTTCAAGTATTACTATATACGATATATTATATTAAGTTGAAAAG

Test 2: GCTCTTCCAACAACAACAACAATGCCTCATCAAAAGCCTCTTTCTCTCATTCTTCTATCTAC
+ACTCCCACTTCTTTTCATTCTCACACAAGCTCAATCACCAACAGCACCAGCACCAGCACCCTCAGGACC
+AATAGACATCTTTGCAATCCTCAAAAAAGAAGGACAATACAACACATTCATCAAGTTCCTAAATGAATC
+ACAAGTTGGTAACCAAATCAACAACCAAGTAAACAACTCCAACCAAGGCATGACAGTTTTGGCACCATC
+AGACAATGCATTTAACAACCTCCCAAGTGGTACACTCAACCAACTAAATGACCAACAAAAAGTACAACT
+CATTTTGAACCATGTCATACCAAAGTTCTACACATTTGATGACTTACAAACAGTAAGCAACCCTGTTAG
+AACACAAGCAACAGGGCCTAAAGGTGAGCCTTTTGGACTTAACTTTACTGGAAGTAACAATCAAGTGAA
+TGTCTCATCTGGTTCTGTTGTTACAAACATTTATAATGCTATTAGAAAAGACCCCCCATTGGCTGTTTT
+TCAATTAGACAAAGTTTTAGTACCTTCTCAGTTTACTGATCCATCTAGTGATGATGATGCCCCTGCACC
+TACTAAACCCAAGAATGGTACTAGTAATGATAAAACAACAGCTGATGAGCCATCACCAGCAAGTAACAC
+TAAGCCAAATGATGCTAAAAGGATCAGTGGTGGGATTCTTGGATTGGTTTGTGGTGTTTTCTTGATGGC
+AACACTATCTTGAAGGGGGCTACAGAGTTGTTAACTTTATGATCTTTTGCTTATACTAAGCCATTTTGT
+ATTACATTGTTTTCTTCAAGATTGATTGTTTTTGTTCAAAAAAGAAGGGGGGGGGGGAAAAAAAAACCC
+CCCTGCGGAAAAGAGCGGGGAAAGCACCAAAAAGCCACCGACCAAAAGCACCAACTCACAAAAGGTGCG
+CAGACGCGGAAAGGGGAAAAGGAAAAAATGTGAAAGCTTGTTATAGTTTG

Test 3: AAACTGTAATTAGACTTCTCTGCTAAGTTTCTGCTGTATTTGGATTCTCCGGCGAACATTAA
+TATCTAACCATGACCGGCGGTGGAGGCGATGCCGCATCGCCGCCTCTATCCTCACAGTCAACTCCATCC
+AACGGTGGGGAATTCCTTCTTCAATTGCTTCAGAATCATCCGCATCAACTTCACTCTCAGCCTCAACCG
+CCACTGCGGCCGGAGTTGCAGAATCTGCCGCATGATCCAGCAGTTGCAGCAGTAGGTCCTAGTATGCCC
+TACCCGCCATTGTTCCATACTCCTACAAACCCTTCTGTTTTGCCCTATTCTCACTCTCCTCCTCTGTTT
+GTACCTCATAACTTCTTCATTCGAGGGTTTCTCCAAAACCCTAATTCTGGCCATACCACTAACCCCAAT
+TACTCATCTCCGCCTGCCCCAAGTGGGTTCAGTCAATATCACCATGCGAGTCCACTTGGATTTGGATCA
+GTCGGAGAAAACATGGGCAATTTGGGGATTTTCGGTGCCAATGCTAAGGCGAG

Test 4: CATGTAATAGCATAGCATCCCCAATTTCACCCTCTCATGGCCATGTCCACGCTCCTCTCCCT
+GTCCGTGTCTATCCACCCACCAAAACCTTTGCAAAAACCCAATTCAATGTGTACCCAACCTAACTCTAT
+TTCGAGAAGACAAGTGTTTTTCACTGGTTCTAATTTATTGCTCTCTCAATTAATTCCAAAATCCGACGC
+CCAAACCAATTCCAATAGTTTTCTTTCAGGTATTGCCAATACTAAGTCTTGGTTCCAATTCTATGGCGA
+CGGCTTTTCTATTCGTGTTCCACCGGAATTTCAGGACCTCACTGAGCCGGAGGATTATAATGCTGGCCT
+ATCACTATATGGAGATAAGGCTAAGCCCAAAAAATTTGCAGCACGTTTTGCTTCTTCTGATGGATCCGA
+AGTTTTAAGTGTCATAATTCGTCCATCCAATCAGCTGAAGATCACTTTCTTAGAGGCTAAAGATATTAC
+TGATTTAGGTTCACTTAAGGAGGCAGCAAAAATATTTGTTCCAGCTGGCTCAACACTATATTCTGTCCG
+CACAATAAAAATTAAAGAAGATGAGGGTTTCAGGACATACTATTTTTATGAATTTGTGAGAAATGAGCA
+ACACGTTGCATTAGTGGCTGGTGTTAACAGTGGAAAGGCCGTCATTGCTGGTGCCACGGCCCCCGAAAG
+CAAATGGGCCGAGGATGGTTTGAAGCTCCGATCTGCTGCAGTATCAATGACAATTCTATAAGCAGAATG
+TGAGTATATATATAGGTTCTATTTCAATGATGATGAATTTATATACAAATATTGAGGATCAAAGTTTTC
+TTATTATCATCTAATCTCAGCCAAGGATTAACAAT CTCCATCATCCATTCAATAGCAATGTTTCTGCTGTTTTGC
```
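If the sequence under each header is wrapped over several lines (common in FASTA files), the `([A-Z]+)` capture only grabs the first line of it. A minimal sketch of a line-by-line reader that avoids that, assuming each record starts with a header line such as `>Contig1` and its sequence lines follow until the next `>`; the sub name `read_contigs` and the script name are illustrative only, and uppercasing the sequence is my own choice:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Read FASTA-style records from an open filehandle.
# Assumes each record starts with a header line such as ">Contig1"
# and that its sequence may be wrapped over several following lines.
# Returns a hash ref mapping contig number => full sequence string.
sub read_contigs {
    my ($fh) = @_;
    my %seq;
    my $id;    # number of the record currently being read
    while ( my $line = <$fh> ) {
        $line =~ s/\r?\n\z//;              # strip a Unix or Windows line ending
        next unless length $line;          # skip blank lines
        if ( $line =~ /^>Contig(\d+)/ ) {
            $id = $1;                      # start a new record
            $seq{$id} = '';
        }
        elsif ( $line =~ /^>/ ) {
            undef $id;                     # some other header: ignore its lines
        }
        elsif ( defined $id ) {
            $line =~ s/\s+//g;             # drop stray whitespace inside the sequence
            $seq{$id} .= uc $line;         # append this line to the current record
        }
    }
    return \%seq;
}

# Run as: perl read_contigs.pl contig.txt
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    my $contigs = read_contigs($fh);
    close $fh;
    for my $n ( sort { $a <=> $b } keys %$contigs ) {
        printf "Contig %d: %d bases\n", $n, length $contigs->{$n};
    }
}
```

Because the sequence is built up line by line, it doesn't matter how (or whether) the file wraps its sequences.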
Remember: Ne dederis in spiritu molere illegitimi!

Re^2: how to create output file using perl
by vineetha (Initiate) on Jul 02, 2015 at 04:21 UTC
    First I downloaded all the tomato EST FASTA sequences from NCBI. Then I submitted that sequence file to the online tool EGassembler for sequence cleaning, vector masking, repeat masking, organelle masking and sequence assembly. As a result I got this contig file, which I use for an offline local BLAST. To run blastn, I downloaded the Arabidopsis thaliana genome. Then I ran the BLAST.

      Nice, sounds like a fun project!

      So it sounds like the contig file is produced correctly, and "the blast" runs correctly on the first and the last "contig" in the file, but not on the 2nd or 3rd.

      Sounds like there is a problem passing the individual "contig" sections to the "blaster."

      Have you tried manually splitting the contig file into smaller pieces and passing them to the blaster? If you cut the file in half, with #1 + #2 in one file and #3 + #4 in the other, and both files processed correctly, you might conclude that the error is in how you pass the file. If you do that and #2 and #3 still don't process, you might conclude that there is bad data in those contigs.
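The halving experiment can be scripted so you don't have to cut the file by hand. A minimal sketch, assuming every record starts with a line beginning with `>`; it writes the first half of the records to one file and the rest to another. The output names `part1.fa` and `part2.fa` are hypothetical:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Split the text of a FASTA-style file into records: each record is a
# ">" header line plus the sequence lines that follow it. Anything
# before the first ">" is discarded.
sub split_records {
    my ($text) = @_;
    # Zero-width split just before every line that starts with ">".
    return grep { /^>/ } split /^(?=>)/m, $text;
}

# Divide a list of records into two halves; with an odd count the
# first half gets the extra record.
sub halve {
    my @records = @_;
    my $mid = int( ( @records + 1 ) / 2 );
    return ( [ @records[ 0 .. $mid - 1 ] ], [ @records[ $mid .. $#records ] ] );
}

# Run as: perl halve_contigs.pl contig.txt
if (@ARGV) {
    my $in = $ARGV[0];
    open my $fh, '<', $in or die "Cannot open $in: $!";
    my $text = do { local $/; <$fh> } // '';    # slurp the whole file
    close $fh;

    my ( $first, $second ) = halve( split_records($text) );
    # Hypothetical output names; rename to suit your setup.
    my %out = ( 'part1.fa' => $first, 'part2.fa' => $second );
    for my $name ( sort keys %out ) {
        open my $ofh, '>', $name or die "Cannot write $name: $!";
        print {$ofh} @{ $out{$name} };
        close $ofh or die "Cannot close $name: $!";
        printf "%s: %d record(s)\n", $name, scalar @{ $out{$name} };
    }
}
```

Each record is copied byte for byte, so if one half fails and the other doesn't, you can keep halving the failing file until you reach the contig that causes the trouble.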

      How does the blaster report if it finds an error in the input data?
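One way to find out is to run the blaster yourself on each piece and look at its exit status and error output. A minimal sketch, assuming the NCBI BLAST+ `blastn` program is on your `PATH` and that you built a local database from the Arabidopsis genome; the database name `arabidopsis_db` and all file names are hypothetical. `-query`, `-db` and `-out` are standard BLAST+ options, but check `blastn -help` for your version:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Turn Perl's $? (as set by system) into a readable description.
sub describe_status {
    my ($status) = @_;
    return 'could not start' if $status == -1;
    return sprintf( 'killed by signal %d', $status & 127 ) if $status & 127;
    return sprintf( 'exit code %d', $status >> 8 );
}

# Run as: perl run_blast.pl piece1.fa piece2.fa   (hypothetical file names)
for my $query (@ARGV) {
    my $out = "$query.blast.txt";    # hypothetical output name
    # 'arabidopsis_db' is a hypothetical name for your local database.
    my @cmd = ( 'blastn', '-query', $query, '-db', 'arabidopsis_db', '-out', $out );

    # List form of system() runs blastn without a shell; anything blastn
    # prints to STDERR shows up in the terminal.
    system(@cmd);
    my $status = $?;

    if ( $status == 0 ) {
        print "$query: OK, results in $out\n";
    }
    else {
        warn "$query: blastn failed (", describe_status($status), ")\n";
    }
}
```

If one piece fails, the message blastn printed for it, together with the exit code, is what to post here.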
