adansonia has asked for the wisdom of the Perl Monks concerning the following question:

Hi! I have a huge text file (around 10GB) of DNA sequences in the following format which is composed by four lines (the DNA sequence is always on the line before the "+" sign):

@HWI-ST591:68:D0DBPABXX:5:1101:1197:2084 1:N:0:

GGTAGTTCGACCGTGGAT

+

B@@FFEFFHDHHFHIJJE

@HWI-ST591:68:D0DBPABXX:5:1101:1086:2085 1:N:0:

GCTGGAACTTGGCAAAGAAGAGAG

+

@@@FFEFFGHHHH@@FEHBEHJGG

I need to write a Perl script to read the sequence lines and print their lengths, which can vary between 0 and 100 positions. My main problem is that I don't know how to write the Perl code to read the file four lines at a time and calculate the length of the second line of each group. I would be very grateful if someone could help me! Thanks!

Replies are listed 'Best First'.
Re: read nth lines in a text file
by toolic (Bishop) on Aug 30, 2011 at 21:05 UTC
    Judging by your data, it looks like you only need to keep track of one line (not 4): the line above the +.
    use warnings;
    use strict;

    my $prev;
    while (<DATA>) {
        chomp;
        # Match only a line that is exactly "+", so that a "+" inside a
        # quality string does not trigger a print.
        if (/^\+$/) {
            print length($prev), "\n";
        }
        $prev = $_;    # remember this line for the next iteration
    }
    __DATA__
    @HWI-ST591:68:D0DBPABXX:5:1101:1197:2084 1:N:0:
    GGTAGTTCGACCGTGGAT
    +
    B@@FFEFFHDHHFHIJJE
    @HWI-ST591:68:D0DBPABXX:5:1101:1086:2085 1:N:0:
    GCTGGAACTTGGCAAAGAAGAGAG
    +
    @@@FFEFFGHHHH@@FEHBEHJGG
    prints...
    18
    24
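    For comparison, here is a minimal sketch of the approach the question literally asks for: counting lines and taking the second line of every group of four. It assumes the real file has exactly four lines per record with no blank lines in between. The sub name each_sequence_length and the callback argument are illustrative, not from the thread. A callback is used instead of returning a list so that a 10GB file never has to be held in memory.

```perl
use strict;
use warnings;

# Call $callback with the length of line 2 of every 4-line record.
# Assumption: records are exactly four lines, with no blank lines between them.
sub each_sequence_length {
    my ($fh, $callback) = @_;
    my $n = 0;                        # running line count
    while (my $line = <$fh>) {
        $n++;
        next unless $n % 4 == 2;      # only the sequence line of each record
        chomp $line;
        $callback->(length $line);
    }
    return;
}

# Sample records copied from the question, read through an in-memory filehandle.
my $sample = <<'END';
@HWI-ST591:68:D0DBPABXX:5:1101:1197:2084 1:N:0:
GGTAGTTCGACCGTGGAT
+
B@@FFEFFHDHHFHIJJE
@HWI-ST591:68:D0DBPABXX:5:1101:1086:2085 1:N:0:
GCTGGAACTTGGCAAAGAAGAGAG
+
@@@FFEFFGHHHH@@FEHBEHJGG
END

open my $fh, '<', \$sample or die "Cannot open in-memory sample: $!";
each_sequence_length($fh, sub { print "$_[0]\n" });
close $fh;
```

    Because this version never looks at the content of the "+" line, it is not confused by quality strings that happen to contain or consist of a "+". On the other hand, it silently gives wrong answers if a record is ever not exactly four lines long.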

      Thanks so much! That was just what I wanted!
Re: read nth lines in a text file
by BrowserUk (Patriarch) on Aug 30, 2011 at 21:01 UTC
    format which is composed by four lines

    In your example, the first group consists of only 3 lines, the second 4, and the last 1. So which is it?

    I commend you for posting a sample, but you need to make it sufficient (and accurate enough) that we can get an idea of the reality. So, how about you post a couple more (complete sets of) records. And this time please use <code></code> tags.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: read nth lines in a text file
by JavaFan (Canon) on Aug 30, 2011 at 20:59 UTC
    Untested:
    perl -nlwe 'print $l if $_ eq "+"; $l = length' your-file
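    For readers unfamiliar with the switches, the one-liner can be unpacked into an ordinary script. This is a rough sketch of what -n, -l and -w do, not the exact code perl generates. The sub name lengths_before_plus and the explicit filehandle arguments are illustrative additions so the example runs on its own.

```perl
use strict;
use warnings;   # -w turns warnings on globally; this is the lexical equivalent

# Print the length of the line preceding each line that is exactly "+".
sub lengths_before_plus {
    my ($in, $out) = @_;
    local $\ = "\n";    # -l: every print gets a trailing newline
    local $_;
    my $l;              # length of the previous line
    while (<$in>) {     # -n: loop over every input line
        chomp;          # -l: strip the newline from the line just read
        print {$out} $l if $_ eq '+';
        $l = length;    # length of the current line, kept for the next pass
    }
    return;
}

# First record from the question, read through an in-memory filehandle.
my $sample = <<'END';
@HWI-ST591:68:D0DBPABXX:5:1101:1197:2084 1:N:0:
GGTAGTTCGACCGTGGAT
+
B@@FFEFFHDHHFHIJJE
END

open my $in, '<', \$sample or die "Cannot open in-memory sample: $!";
lengths_before_plus($in, \*STDOUT);
close $in;
```

    Like the one-liner, this compares the whole line against "+" rather than using a regex, so a "+" in the middle of a quality string is ignored.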