diyaz has asked for the wisdom of the Perl Monks concerning the following question:

I think I've discovered something strange. I am trying to iterate through a text file in FASTQ format, where each record looks basically like this:
@Name
AGCATATA...whatever nucleotide sequence
+
KKKKKKKK...quality score
I usually don't have a problem with this because I buffer the whole file into a hash and sort through it afterwards. This time I wanted to make if/then decisions on the fly.
For (<INFILE>) { if (/^@(\S+)/) { print $1; my $seq = <INFILE>; } }
Why does this throw an error? The strange thing is that if, instead of assigning from the filehandle, I just have it read once with an "or die", it tells me it died at the END of the file instead of at the next line.

Re: Iterating single line within a For Loop
by BrowserUk (Patriarch) on Feb 25, 2015 at 17:20 UTC

    for(<INFILE>) (note: no capital F) slurps the whole file before starting to iterate. You need to use while(<INFILE>).


Re: Iterating single line within a For Loop
by kennethk (Abbot) on Feb 25, 2015 at 17:47 UTC
    As BrowserUk says, for(<INFILE>) slurps the whole file before starting to iterate. This means that you've already run out of input by the time you get to my $seq = <INFILE>; so $seq ends up undefined. If you say
    while (<INFILE>) { if (/^@(\S+)/) { print "$1\n"; my $seq = <INFILE>; } }
    then my $seq = <INFILE>; will grab the line after a line starting with @ because the while test is run once per iteration. A for loop constructs the entire list first, which means it pulls in all lines before the first iteration.
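To make the difference concrete, here is a minimal sketch that runs both loops over the same data. The sample records and the in-memory filehandle are assumptions added only so the example is self-contained; they are not from the original post.

```perl
use strict;
use warnings;

# Hypothetical two-record FASTQ sample, kept in a string for illustration.
my $fastq = "\@read1\nAGCA\n+\nKKKK\n\@read2\nTTGA\n+\nIIII\n";

# for: <$fh> is evaluated in list context, so every line is read up front.
open my $fh, '<', \$fastq or die "Cannot open in-memory file: $!";
my @for_seqs;
for (<$fh>) {
    if (/^\@(\S+)/) {
        my $seq = <$fh>;    # the filehandle is already at EOF here
        push @for_seqs, defined $seq ? $seq : 'undef';
    }
}
close $fh;

# while: one line is read per iteration, so the next read gets the next line.
open $fh, '<', \$fastq or die "Cannot open in-memory file: $!";
my @while_seqs;
while (my $line = <$fh>) {
    if ($line =~ /^\@(\S+)/) {
        my $seq = <$fh>;    # the sequence line that follows the header
        chomp $seq;
        push @while_seqs, "$1=$seq";
    }
}
close $fh;

print "for:   @for_seqs\n";      # for:   undef undef
print "while: @while_seqs\n";    # while: read1=AGCA read2=TTGA
```

Note that this sketch, like the original loop, treats any line starting with @ as a header, and a quality line in real data may also start with @.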

    Also note that, while you are fine here, an unescaped @ followed by a name in a regular expression (for example /@name/) is interpolated as an array in Perl; it's probably a good idea to get in the habit of escaping it as \@.
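A small sketch of the escaping point, assuming a made-up header line and an array named @name that exists purely to show the interpolation:

```perl
use strict;
use warnings;

my $header = '@read1 length=8';    # hypothetical FASTQ header line
my @name   = ('a', 'b');           # hypothetical array, for illustration only

# Escaped: \@ always matches a literal at-sign.
my ($id) = $header =~ /^\@(\S+)/;

# Unescaped and followed by a name: @name is interpolated, with its elements
# joined by spaces, so this pattern is really /^a b/.
my $interpolated = qr/^@name/;
my $literal      = qr/^\@name/;

print "id: $id\n";
print "interpolated matches 'a b': ",    ('a b'   =~ $interpolated ? 'yes' : 'no'), "\n";
print "interpolated matches '\@name': ", ('@name' =~ $interpolated ? 'yes' : 'no'), "\n";
print "literal matches '\@name': ",      ('@name' =~ $literal      ? 'yes' : 'no'), "\n";
```

With no @name in scope, the unescaped form would fail to compile under use strict instead of silently matching the wrong thing.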



      Thank you both! That is very helpful to know!
Re: Iterating single line within a For Loop
by pvaldes (Chaplain) on Jun 23, 2015 at 14:33 UTC

    Untested

    use Bio::SeqIO;

    # Read FASTQ records from the file named on the command line
    my $infile  = Bio::SeqIO->new(-file => $ARGV[0],      -format => 'Fastq');
    # Write to "myfilename" in GenBank format
    my $outfile = Bio::SeqIO->new(-file => ">myfilename", -format => 'genbank');

    while ( my $seq = $infile->next_seq() ) {
        # ...do something...

        # finally you could print to $outfile in genbank format
        # (for example... or in embl, nexml, bsml, fasta, seqxml...)
        $outfile->write_seq($seq);
    }
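If BioPerl is not an option, one possible plain-Perl alternative is to read four lines per record inside the while loop. This is only a sketch: it assumes every record is exactly four lines (no wrapped sequences), and the sample data and the @records structure are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sample; the second quality line deliberately starts with '@'.
my $fastq = "\@read1\nAGCA\n+\nKKKK\n\@read2\nTTGA\n+\n\@III\n";
open my $fh, '<', \$fastq or die "Cannot open in-memory file: $!";

my @records;
while (defined(my $header = <$fh>)) {
    # Pull the remaining three lines of the record from the same filehandle.
    my $seq  = <$fh>;
    my $plus = <$fh>;
    my $qual = <$fh>;
    die "Truncated record at line $.\n" unless defined $qual;
    chomp($header, $seq, $plus, $qual);

    die "Bad header '$header'\n" unless $header =~ /^\@(\S+)/;
    push @records, { name => $1, seq => $seq, qual => $qual };
}
close $fh;

print "$_->{name}\t$_->{seq}\t$_->{qual}\n" for @records;
```

Because the record is consumed by position, a quality line that happens to begin with @ is not mistaken for a header.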