in reply to Re: How to avoid using array in concatenating string of multiple lines
in thread How to avoid using array in concatenating string of multiple lines OR How To Read FASTA

Thanks so much for your reply stajich.

Indeed, the module is very useful.
However, I ran into a problem while extending its usage. Perhaps you can give some advice.

Suppose I am taking a file as input (the content of the file is similar to my OP), and wish to process that file in *multiple trials*. So I have the following code.
    #!/usr/bin/perl -w
    use strict;
    use Bio::SeqIO;

    my $file = $ARGV[0];
    open INFILE, "<$file" or die "$0: Can't open file $file: $!";

    for (my $trial = 1; $trial <= 2; $trial++) {

        seek(INFILE,0,0); #This is line 10
        print "Trial $trial\n";
        my $i = 1;
        my $in = Bio::SeqIO->new(-format => 'fasta', -fh => \*INFILE);
        while( my $seq = $in->next_seq ) {
            print $i++, " : ", $seq->seq(), "\n";
        }
    }
The code above (specifically the Bio::SeqIO part) produces these warnings when it reaches the second trial:
    Trial 1
    1 : TGCAATCACTAGCAAGCTCTCGCTGCCGTCACTAGCCTGTGG
    2 : GGGGCTAGGGTTAGTTCTGGANNNNNNNNNNNNNNNNNNNNN
    seek() on closed filehandle INFILE at test.pl line 10.
    Trial 2
    readline() on closed filehandle INFILE at /usr/lib/perl5/site_perl/5.8.0/Bio/Root/IO.pm line 440.
I know that I can avoid these warnings by replacing the seek call with "open INFILE..".
But I am curious how I can solve this problem if I intend to keep the seek call,
since I find the solution neater that way. Hope to hear from you again.
Regards,
Edward

Re^3: How to avoid using array in concatenating string of multiple lines
by stajich (Chaplain) on Dec 10, 2004 at 21:11 UTC
    Running the while loop
        while( my $seq = $in->next_seq ) {
            print $i++, " : ", $seq->seq(), "\n";
        }
    will read until the end of the filehandle (that is why the loop ended in the first place). So subsequently calling next_seq on the $in object will give you the error message you are seeing. You can:
    1. Open the SeqIO object outside of the trial loop, and put seek(INFILE,0,0) inside the loop. Note that if the SeqIO object gets destroyed (or goes out of scope), you will need to add the -noclose => 1 option when initializing the SeqIO object, or else the filehandle is closed. But if you are going to do this, just move the initialization of the SeqIO object outside the loop.
    2. Re-open the file each time in the loop (move the Bio::SeqIO initialization into the trial loop).
    3. Or read all the sequences in at once and keep them in memory. (put them into an array).
      my @seqs;
      while( my $s = $in->next_seq ) {
          push @seqs, $s;
      }
      # now do your loop of trials
    Option 3 might not be feasible, depending on how many sequences there are and how big they are, since all of them are held in memory.
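The closed-filehandle behaviour described in option 1 can be reproduced without BioPerl. The following is a minimal sketch in core Perl, not BioPerl's actual code: `TinyReader` and its `noclose` argument are hypothetical stand-ins, and the only assumption carried over from the post is that the reader object closes its filehandle when it is destroyed unless told otherwise.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# TinyReader is a hypothetical stand-in for a reader object that is handed
# a filehandle. It is NOT BioPerl code; it only mimics "close on destroy".
package TinyReader;

sub new {
    my ($class, %args) = @_;
    return bless { fh => $args{fh}, noclose => $args{noclose} }, $class;
}

# Return the next line, or undef at end of file (or on a closed handle).
sub next_line {
    my ($self) = @_;
    return scalar readline($self->{fh});
}

# Close the handle on destruction unless the caller asked us not to.
sub DESTROY {
    my ($self) = @_;
    close($self->{fh}) unless $self->{noclose};
}

package main;

# Run two trials over one handle; return the number of lines read per trial.
sub run_trials {
    my ($noclose) = @_;
    my $data = ">seq1\nACGT\n>seq2\nTTGA\n";
    open(my $fh, '<', \$data) or die "Can't open in-memory file: $!";

    my @counts;
    for my $trial (1 .. 2) {
        no warnings 'io';    # silence the expected closed-filehandle warnings
        seek($fh, 0, 0);     # fails quietly in trial 2 if the handle was closed
        my $reader = TinyReader->new(fh => $fh, noclose => $noclose);
        my $n = 0;
        $n++ while defined $reader->next_line;
        push @counts, $n;
        # $reader goes out of scope here, so DESTROY runs
    }
    return @counts;
}

print "without noclose: @{[ run_trials(0) ]}\n";
print "with noclose:    @{[ run_trials(1) ]}\n";
```

Without the flag, the reader created in trial 1 closes the shared handle when it goes out of scope, so trial 2 reads nothing; with the flag, both trials read the same four lines.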
      stajich: you will need to add the -noclose => 1 option when initializing the SeqIO object, or else the filehandle is closed. But if you are going to do this, just move the initialization of the SeqIO object outside the loop.

      Thanks stajich, it works now. In fact, adding just the *-noclose => 1* flag is enough, so this works fine:

          #!/usr/bin/perl -w
          use strict;
          use Bio::SeqIO;

          my $file = $ARGV[0];
          open INFILE, "<$file" or die "$0: Can't open file $file: $!";

          for (my $trial = 1; $trial <= 2; $trial++) {
              seek(INFILE,0,0);
              print "Trial $trial\n";
              my $i = 1;
              my $in = Bio::SeqIO->new(-format  => 'fasta',
                                       -noclose => 1,
                                       -fh      => \*INFILE);
              while( my $seq = $in->next_seq() ) {
                  print $i++, " : ", $seq->seq(), "\n";
              }
          }
      I don't understand why we still need to move the initialization of the seqIO outside the loop?



      Regards,
      Edward

      PS: BTW, I can't find any documentation of the -noclose flag in Bio::SeqIO perldoc. Where can I find that?
          I don't understand why we still need to move the initialization of the seqIO outside the loop?

        Well, did you try it? You can write it this way, since there is no need to initialize the SeqIO object each time when you are resetting the filehandle. I expect the performance difference is minuscule.

            #!/usr/bin/perl -w
            use strict;
            use Bio::SeqIO;

            my $file = $ARGV[0];
            open INFILE, "<$file" or die "$0: Can't open file $file: $!";
            my $in = Bio::SeqIO->new(-format  => 'fasta',
                                     -noclose => 1,
                                     -fh      => \*INFILE);

            for (my $trial = 1; $trial <= 2; $trial++) {
                seek(INFILE,0,0);
                print "Trial $trial\n";
                my $i = 1;
                while( my $seq = $in->next_seq() ) {
                    print $i++, " : ", $seq->seq(), "\n";
                }
            }
        As for the -noclose option: Bio::SeqIO ISA Bio::Root::IO, so see the documentation for that module. Unfortunately, perldoc does not pull in documentation from inherited modules, so you have to read around to get the full list. The Pdoc-generated documentation at our site, doc.bioperl.org, does have links up the inheritance hierarchy, so you can read the docs for the superclasses. Anything you can do with a Bio::Root::IO object you can do with a Bio::SeqIO object. All the other Bio::XXIO modules also inherit from Bio::Root::IO.
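Since perldoc shows only one module at a time, it can help to list the classes a module inherits from and read each in turn. Below is a small sketch using the core `mro` module. The `My::SeqIO` and `My::Root::IO` packages are hypothetical placeholders so that the example runs without BioPerl installed, and `classes_to_read` is an illustrative helper, not part of any library.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use mro;    # core since Perl 5.10; provides mro::get_linear_isa

# Toy hierarchy standing in for a subclass and its IO base class.
# These package names are hypothetical, chosen only for illustration.
package My::Root::IO;
sub new { my ($class, %args) = @_; return bless {%args}, $class }

package My::SeqIO;
our @ISA = ('My::Root::IO');

package main;

# Return the class itself followed by its superclasses, in method
# resolution order. Each may document options the subclass accepts.
sub classes_to_read {
    my ($class) = @_;
    return @{ mro::get_linear_isa($class) };
}

print "perldoc $_\n" for classes_to_read('My::SeqIO');
```

With BioPerl installed, one would presumably load Bio::SeqIO first and pass 'Bio::SeqIO' instead; according to the reply above, Bio::Root::IO should then appear in the list.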
Re^3: How to avoid using array in concatenating string of multiple lines
by reneeb (Chaplain) on Dec 10, 2004 at 07:28 UTC
    Then open the file within the for-loop:

        #!/usr/bin/perl -w
        use strict;
        use Bio::SeqIO;

        my $file = $ARGV[0];

        for (my $trial = 1; $trial <= 2; $trial++) {
            open INFILE, "<$file"
                or die "$0: Can't open file $file: $!";
            seek(INFILE,0,0); #This is line 10
            print "Trial $trial\n";
            my $i = 1;
            my $in = Bio::SeqIO->new(-format => 'fasta', -fh => \*INFILE);
            while( my $seq = $in->next_seq ) {
                print $i++, " : ", $seq->seq(), "\n";
            }
        }
      Thanks so much for the answer reneeb.

      But as I stated in my 2nd posting in this thread, we don't need both. Namely, we don't need the "seek" function at all
      if we put "open INFILE.." within the for-loop. That works, as I already know.

      Forgive me if this sounds like nitpicking, but I wonder whether we can
      keep the "open" call outside the for-loop
      while keeping "seek" inside it. Would that be more efficient?

      Regards,
      Edward
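One way to approach the efficiency question is to measure it. The sketch below uses the core Benchmark and File::Temp modules to time re-opening a file for each trial against opening it once and rewinding with seek. It reads plain lines rather than going through Bio::SeqIO, the temporary file contents are made up, and the sub names are illustrative, so treat any numbers as a rough indication only.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(timethese);
use File::Temp qw(tempfile);

# Write a small throwaway FASTA-like file to time against (made-up data).
my ($tmp_fh, $file) = tempfile(UNLINK => 1);
print {$tmp_fh} ">seq$_\nACGTACGTACGT\n" for 1 .. 100;
close($tmp_fh) or die "Can't close $file: $!";

# Count lines from the current position to end of file.
sub count_lines {
    my ($fh) = @_;
    my $n = 0;
    $n++ while defined readline($fh);
    return $n;
}

# Strategy A: open the file afresh for every trial.
sub reopen_each_trial {
    open(my $fh, '<', $file) or die "Can't open $file: $!";
    my $n = count_lines($fh);
    close($fh);
    return $n;
}

# Strategy B: open once, then rewind with seek for every trial.
open(my $shared_fh, '<', $file) or die "Can't open $file: $!";
sub seek_each_trial {
    seek($shared_fh, 0, 0) or die "Can't seek in $file: $!";
    return count_lines($shared_fh);
}

# Run each strategy 2000 times and print the timings.
timethese(2000, {
    reopen => \&reopen_each_trial,
    seek   => \&seek_each_trial,
});
```

For real data, the cost of parsing the sequences is likely to matter more than the choice between open and seek, so it is worth timing the actual Bio::SeqIO loop before optimising either way.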