Getting Lines 2,3, and 4 from FASTQ file's Chunk

neversaint has asked for the wisdom of the Perl Monks concerning the following question:

Dear Masters,
I have a data that looks like this (fastq format). Note that each entry consist of chunk of four lines. Each chunk begins with an entry with "@"sign and ends with octal values (e.g. (IIII... ).

@SRR016086.1 308GJAAXX:2:1:911:957 length=36
CCCCCCCCCCACCCCCCCACACCACAAACACACCAC
+SRR016086.1 308GJAAXX:2:1:911:957 length=36
(IIIIIIII:&H68;I-3%/)*-/4$$')(0&$&,$
@SRR016086.2 308GJAAXX:2:1:916:735 length=36
AGATCAGTTTTGCACTCCACTTAAGTTTGTTCATAT
+SRR016086.2 308GJAAXX:2:1:916:735 length=36
I*IIIIIIIIIII%%CI()I2E6.I5;>+"=C+#%I
@SRR016086.3 308GJAAXX:2:1:908:956 length=36
CCCCCCCCCAACACAAAAACAAAATCCAAAACAATA
+SRR016086.3 308GJAAXX:2:1:908:956 length=36
II@III50I*9I+<00%.<9/,%-"$*2('&"&)""
[download]

What I want to do is to extract line 2, 3 and 4 of each chunk from the above file. I've tried Bioperl with this script but fail:

use Bio::SeqIO;
my $file = $ARGV[0] || "test.fastq"; 

my $in  = Bio::SeqIO->new(-file => $file , '-format' => 'fastq');

while ( my $seq = $in->next_seq() ) {
    print "$seq->id\n";

}
[download]

Nor with my attempted homebrew method, in which I'm stuck with the logic.

my  $INFILE_file_name = $file;      # input file name

open ( INFILE, '<', $INFILE_file_name )
    or croak "$0 : failed to open input $INFILE_file_name : $!\n";

my $count = 0;
my $seqlinec = 2;
while ( <INFILE> ) {
    chomp;

     # Trying to catch line 2 of each chunk..
    if ( $seqlinc eq $count ) {
        print "$_\n";
    }

    $seqlinec += 4;

}               


close ( INFILE );           # close input file
[download]

What's the right way to do it?

---
neversaint and everlastingly indebted.......

Comment on Getting Lines 2,3, and 4 from FASTQ file's Chunk Select or Download Code

Replies are listed 'Best First'.
Re: Getting Lines 2,3, and 4 from FASTQ file's Chunk by BrowserUk (Patriarch) on Apr 16, 2010 at 08:01 UTC
To skip the first of every 4 lines: `perl -wlne"($.%4)-1 and print" yourFile` [download] Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. RIP an inspiration; A true Folk's Guy	[reply] [d/l]
Re: Getting Lines 2,3, and 4 from FASTQ file's Chunk by Ratazong (Monsignor) on Apr 16, 2010 at 07:17 UTC
`my $count = 0; my $seqlinec = 2; while ( <INFILE> ) { if ( $seqlinc eq $count ) { print "$_\n"; } $seqlinec += 4; }` [download] I assume you want to somehow count the line you are reading. In that case, you should increment your counter somewhere. Maybe do something like the following (untested!) code `my $count = 0; while ( <INFILE> ) { next if ( !$count++ ); # ignore line zero, increase linec +ount print $_; $count = 0 if ($count >= 3); # reset the counter once we printe +d line 3 }` [download] HTH, Rata note: of course the code could be easily optimized, e.g. by using `($count++ % 4)`; see Boolean counter? for some ways (you need to adapt them to count modulo 4)	[reply] [d/l] [select]
Re: Getting Lines 2,3, and 4 from FASTQ file's Chunk by Sinistral (Monsignor) on Apr 16, 2010 at 12:20 UTC
Although the other monks have given excellent answers, I wanted to make sure you've made note of the many resources at the BioPerl site. Specifically, they have information about your FASTQ sequence format that may have links to tools to help you efficiently process the data (perhaps their information on Bio:SeqIO::fastq also available at Bio::SeqIO::fastq).	[reply]
Re: Getting Lines 2,3, and 4 from FASTQ file's Chunk by umasuresh (Hermit) on Apr 16, 2010 at 14:22 UTC
Another simple solution: use strict; use warnings; while(<DATA>) { chomp; my $line = $_; # skip the first line of every chunk if ($line =~/^@/) { } # print the remaining lines else { print "$line\n"; } } __DATA__ @SRR016086.1 308GJAAXX:2:1:911:957 length=36 CCCCCCCCCCACCCCCCCACACCACAAACACACCAC +SRR016086.1 308GJAAXX:2:1:911:957 length=36 (IIIIIIII:&H68;I-3%/)-/4$$')(0&$&,$ @SRR016086.2 308GJAAXX:2:1:916:735 length=36 AGATCAGTTTTGCACTCCACTTAAGTTTGTTCATAT +SRR016086.2 308GJAAXX:2:1:916:735 length=36 IIIIIIIIIIII%%CI()I2E6.I5;>+"=C+#%I @SRR016086.3 308GJAAXX:2:1:908:956 length=36 CCCCCCCCCAACACAAAAACAAAATCCAAAACAATA +SRR016086.3 308GJAAXX:2:1:908:956 length=36 II@III50I9I+<00%.<9/,%-"$2('&"&)"" [download]	[reply] [d/l]
Re^2: Getting Lines 2,3, and 4 from FASTQ file's Chunk by johngg (Canon) on Apr 16, 2010 at 20:15 UTC
It is a simple solution but it would be even simpler if you avoided the un-necessary chomp and assignment to `$line`, and used an `unless` statement modifier. I don't know if it is safe to assume that every record begins with an "@" symbol. `while ( <DATA> ) { print unless m{^\@}; }` [download] I hope this is of interest. Cheers, JohnGG	[reply] [d/l] [select]
Re^2: Getting Lines 2,3, and 4 from FASTQ file's Chunk by Anonymous Monk on Mar 22, 2011 at 15:20 UTC
Skipping lines based on '@' at the start is dangerous as quality values (line 4) can be '@'!	[reply]


Just another Perl shrine
	PerlMonks