in reply to How to avoid using array in concatenating string of multiple lines OR How To Read FASTA

I'd suggest using already written modules or looking at code from people who have already solved this problem. Bio::SeqIO for one. See the code in the next_seq method. The beauty is if you want to change the file format to genbank you replace 'fasta' with 'genbank'.
    use Bio::SeqIO;

    my $in = Bio::SeqIO->new(-format => 'fasta', -fh => \*DATA);
    my $i = 1;
    while ( my $s = $in->next_seq ) {
        # print a running count and the full (concatenated) sequence
        print $i++, " : ", $s->seq(), "\n";
    }
    __DATA__
    > Seq 1 (two lines)
    AAAAAAAAAAAAA
    CCAAAAAAAAAAA
    > Seq 2 (two lines)
    AAAAAAAAAAAAA
    AAAAAAAAAAAAA
    > Seq 3 (one line)
    TTTTTTTTTTTTAACTGAAGATTCGC

Re^2: How to avoid using array in concatenating string of multiple lines
by monkfan (Curate) on Dec 10, 2004 at 03:17 UTC
    Thanks so much for your reply stajich.

    The module is indeed very useful.
    However, I ran into a problem when extending its use. Perhaps you can give some advice.

    Suppose I am taking a file as input (the content of the file is similar to my OP), and wish to process that file in *multiple trials*. So I have the following code.
        #!/usr/bin/perl -w
        use strict;
        use Bio::SeqIO;

        my $file = $ARGV[0];
        open INFILE, "<$file" or die "$0: Can't open file $file: $!";

        for (my $trial = 1; $trial <= 2; $trial++) {

            seek(INFILE,0,0); #This is line 10
            print "Trial $trial\n";
            my $i = 1;
            my $in = Bio::SeqIO->new(-format => 'fasta', -fh => \*INFILE);
            while ( my $seq = $in->next_seq ) {
                print $i++, " : ", $seq->seq(), "\n";
            }
        }
    The code above (specifically the Bio::SeqIO part) produces these warnings when it reaches the second trial:
        Trial 1
        1 : TGCAATCACTAGCAAGCTCTCGCTGCCGTCACTAGCCTGTGG
        2 : GGGGCTAGGGTTAGTTCTGGANNNNNNNNNNNNNNNNNNNNN
        seek() on closed filehandle INFILE at test.pl line 10.
        Trial 2
        readline() on closed filehandle INFILE at /usr/lib/perl5/site_perl/5.8.0/Bio/Root/IO.pm line 440.
    I know that I can avoid these warnings by replacing the seek call with "open INFILE..".
    But I am curious how I can solve this problem if I want to keep the seek call,
    since I find that solution neater. Hope to hear from you again.
    Regards,
    Edward
      Then open the file within the for-loop:

          #!/usr/bin/perl -w
          use strict;
          use Bio::SeqIO;

          my $file = $ARGV[0];

          for (my $trial = 1; $trial <= 2; $trial++) {
              open INFILE, "<$file" or die "$0: Can't open file $file: $!";
              seek(INFILE,0,0); #This is line 10
              print "Trial $trial\n";
              my $i = 1;
              my $in = Bio::SeqIO->new(-format => 'fasta', -fh => \*INFILE);
              while ( my $seq = $in->next_seq ) {
                  print $i++, " : ", $seq->seq(), "\n";
              }
          }
        Thanks so much for the answer reneeb.

        But as I stated in my second posting in this thread, we don't need both. Namely, we don't need the "seek" call at all
        if we put "open INFILE.." within the for-loop. That works, as I already know.

        Forgive me if I sound nitpicky, but I just wonder whether we can
        keep the "open" call outside the for-loop
        while keeping "seek" within it. Would that be more efficient?

        Regards,
        Edward
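
The question above (open once outside the loop, seek inside it) can be tried without any module in the picture. The following is a minimal sketch in core Perl only, assuming a small hand-written FASTA reader; `read_fasta`, `run_trials` and the sample data are hypothetical names and values made up for illustration, not part of Bio::SeqIO. It suggests that rewinding with seek works as long as nothing closes the filehandle between trials.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read FASTA records from an open handle.
# Returns a list of [header, sequence] pairs; multi-line sequences
# are concatenated onto one string as they are read.
sub read_fasta {
    my ($fh) = @_;
    my @records;
    while ( my $line = <$fh> ) {
        chomp $line;
        next if $line =~ /^\s*$/;          # skip blank lines
        if ( $line =~ /^>\s*(.*)/ ) {
            push @records, [ $1, '' ];     # start a new record
        }
        elsif (@records) {
            $records[-1][1] .= $line;      # append to current sequence
        }
    }
    return @records;
}

# Open the file once, then rewind with seek before every trial.
# Returns the number of records seen in each trial.
sub run_trials {
    my ( $path, $trials ) = @_;
    open my $in, '<', $path or die "$0: Can't open file $path: $!";
    my @counts;
    for my $trial ( 1 .. $trials ) {
        seek( $in, 0, 0 ) or die "$0: Can't rewind $path: $!";
        my @records = read_fasta($in);
        push @counts, scalar @records;
    }
    close $in;
    return @counts;
}

# Made-up sample input written to a temporary file.
my ( $tmp, $path ) = tempfile( UNLINK => 1 );
print {$tmp} ">Seq 1\nAAAA\nCCAA\n>Seq 2\nTTTT\n";
close $tmp;

print join( ',', run_trials( $path, 2 ) ), "\n";
```

Whether this is measurably faster than re-opening the file in every trial is not something the sketch shows; it only demonstrates that the seek approach is workable on a plain filehandle.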
      Running the while loop

          while ( my $seq = $in->next_seq ) {
              print $i++, " : ", $seq->seq(), "\n";
          }

      will read until the end of the filehandle. (That is why the loop ended in the first place.) So subsequently calling next_seq on the $in object will give you the error message you are seeing. You can:
      1. Open the SeqIO object outside the trial loop, and put seek(INFILE,0,0) inside the loop. Note that if the SeqIO object gets destroyed (or goes out of scope), you will need to add the -noclose => 1 option when initializing the SeqIO object, or else the filehandle is closed. But if you are going to do this, just move the initialization of the SeqIO object outside the loop.
      2. Re-open the file on each pass through the loop (do the open and the Bio::SeqIO initialization inside the trial loop).
      3. Or read all the sequences in at once and keep them in memory (put them into an array):

          my @seqs;
          while ( my $s = $in->next_seq ) {
              push @seqs, $s;
          }
          # now do your loop of trials

      Option 3 might not work, depending on how many sequences there are and how big they are.
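
The scoping issue behind option 1 can be reproduced without BioPerl. The following is a minimal sketch, assuming a made-up stand-in class: `TinyReader`, its `noclose` argument and `count_completed_trials` are hypothetical and only imitate the behaviour described above (the reader closes its filehandle when it is destroyed unless told not to). It is not Bio::SeqIO's actual implementation.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for a reader object; this is NOT Bio::SeqIO.
package TinyReader;

sub new {
    my ( $class, %args ) = @_;
    return bless { fh => $args{fh}, noclose => $args{noclose} }, $class;
}

# Return the next line without its newline, or undef at end of file.
sub next_line {
    my ($self) = @_;
    my $line = readline( $self->{fh} );
    chomp $line if defined $line;
    return $line;
}

# Called when the object goes out of scope:
# closes the wrapped handle unless noclose was set.
sub DESTROY {
    my ($self) = @_;
    close( $self->{fh} ) unless $self->{noclose};
}

package main;
use File::Temp qw(tempfile);

# Open a file once, create a new reader inside each of two trials,
# and count how many trials could read the whole file.
sub count_completed_trials {
    my ($noclose) = @_;

    # Made-up sample data in a temporary file.
    my ( $tmp, $path ) = tempfile( UNLINK => 1 );
    print {$tmp} ">s1\nAAAA\n>s2\nCCCC\n";
    close $tmp;

    open my $in, '<', $path or die "$0: Can't open file $path: $!";
    my $completed = 0;
    for my $trial ( 1 .. 2 ) {
        last unless defined fileno($in);    # handle already closed
        seek( $in, 0, 0 ) or die "$0: Can't rewind $path: $!";
        my $reader = TinyReader->new( fh => $in, noclose => $noclose );
        my $lines = 0;
        $lines++ while defined( $reader->next_line );
        $completed++ if $lines == 4;
    }    # $reader is destroyed at the end of every iteration
    close $in if defined fileno($in);
    return $completed;
}

print "without noclose: ", count_completed_trials(0), " trial(s) completed\n";
print "with noclose:    ", count_completed_trials(1), " trial(s) completed\n";
```

In this sketch the reader created inside the loop is destroyed at the end of each iteration, so without the `noclose` flag the second trial finds the handle already closed.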
        stajich: you will need to add the -noclose => 1 option when initializing the SeqIO object, or else the filehandle is closed. But if you are going to do this, just move the initialization of the SeqIO object outside the loop.

        Thanks stajich, it works OK now. But it also works if I just add the *-noclose => 1* flag, so this runs fine:

            #!/usr/bin/perl -w
            use strict;
            use Bio::SeqIO;

            my $file = $ARGV[0];
            open INFILE, "<$file" or die "$0: Can't open file $file: $!";

            for (my $trial = 1; $trial <= 2; $trial++) {
                seek(INFILE,0,0);
                print "Trial $trial\n";
                my $i = 1;
                my $in = Bio::SeqIO->new(-format => 'fasta', -noclose => 1, -fh => \*INFILE);
                while ( my $seq = $in->next_seq() ) {
                    print $i++, " : ", $seq->seq(), "\n";
                }
            }

        So I don't understand why we would still need to move the initialization of the SeqIO object outside the loop.



        Regards,
        Edward

        PS: BTW, I can't find any documentation of the -noclose flag in the Bio::SeqIO perldoc. Where can I find it?