Re^2: Parsing help

Very elegant solution, but it has the drawback of loading the whole file into memory and traversing the whole list three times (once to read it, once to fill in field #5, and once to print it).

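Since genomic files of this kind can be large, here is a somewhat more resource-friendly version. It keeps in memory only the records seen since the last gene name, and prints them as soon as the next gene name appears: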

use strict;
use warnings;

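my @acc = ();    # records still waiting for a gene name
while (<DATA>) {
  my @recs = split;
  push @acc, [@recs];
  # Column 6 (index 5) holds the gene name, e.g. "gene=LOC644591"
  if (my $geneName = $recs[5]) {
    # Give every buffered record this gene name, then empty the buffer
    print join ("\t", @{$_}[0 .. 4], $geneName), "\n" for @acc;
    @acc = ();
  }
}
# Note: records after the last gene= line stay in @acc and are not printed.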


__DATA__
NT_113797    CDS    122829    123323    -  gene=LOC644591  ProteinID=XP_932799.1
NT_113798    CDS    4457    4636    -
NT_077932    CDS    9894    9928    -
NT_077932    CDS    65297    65828    +
NT_077932    CDS    89196    89690    -  gene=LOC653505  ProteinID=BJDND993

This outputs the desired result:

NT_113797       CDS     122829  123323  -       gene=LOC644591
NT_113798       CDS     4457    4636    -       gene=LOC653505
NT_077932       CDS     9894    9928    -       gene=LOC653505
NT_077932       CDS     65297   65828   +       gene=LOC653505
NT_077932       CDS     89196   89690   -       gene=LOC653505
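Real genomic files will not be embedded in the script under __DATA__, so here is a sketch of the same streaming logic wrapped in a subroutine that reads from any filehandle. The name `fill_gene_names`, the in-memory handles used for the demo, and the warning for trailing records are illustrative assumptions, not part of the code above.

```perl
use strict;
use warnings;

# Hypothetical helper: read whitespace-separated records from $in and write
# them to $out, giving each record the gene name (column 6) of the next
# record that has one. Only records since the last gene name are buffered.
sub fill_gene_names {
    my ($in, $out) = @_;
    my @acc;    # records still waiting for a gene name
    while (my $line = <$in>) {
        my @recs = split ' ', $line;
        next unless @recs;    # skip blank lines
        push @acc, \@recs;    # @recs is a fresh array each iteration, so a reference is safe
        if (defined(my $gene_name = $recs[5])) {
            print {$out} join("\t", @{$_}[0 .. 4], $gene_name), "\n" for @acc;
            @acc = ();
        }
    }
    # Assumption: records after the last gene name have nothing to inherit,
    # so they are reported on STDERR instead of being dropped silently.
    warn scalar(@acc), " trailing record(s) without a gene name\n" if @acc;
    return;
}

# Demo on the sample data, using in-memory filehandles
my $input = <<'TXT';
NT_113797    CDS    122829    123323    -  gene=LOC644591  ProteinID=XP_932799.1
NT_113798    CDS    4457    4636    -
NT_077932    CDS    9894    9928    -
NT_077932    CDS    65297    65828    +
NT_077932    CDS    89196    89690    -  gene=LOC653505  ProteinID=BJDND993
TXT

open my $in, '<', \$input or die "Cannot open in-memory input: $!";
my $output = '';
open my $out, '>', \$output or die "Cannot open in-memory output: $!";
fill_gene_names($in, $out);
close $out;
print $output;
```

To process a real file, pass a handle from `open my $fh, '<', $path` (or `\*STDIN` and `\*STDOUT`) instead of the in-memory strings; memory use then stays bounded by the longest run of records without a gene name.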

Hope this helps

citromatik
