in reply to Re: Parsing help
in thread Parsing help

Very elegant solution, but it may have the drawback of loading the whole file into memory and traversing the whole list three times (once for reading, once for filling in field #5 and once for printing).

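Since genomic files of this kind can get very large, here is another version that is a bit more resource-friendly. It reads the input line by line and buffers only the records seen since the last gene name. As soon as a line with a gene name turns up, it prints the buffered records with that name and empties the buffer: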

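use strict;
use warnings;

my @acc = ();    # records still waiting for a gene name
while (<DATA>) {
    my @recs = split;
    push @acc, [@recs];
    # A gene name in field #5 closes the current group: print every
    # buffered record with that name, then start a new group.
    if (my $geneName = $recs[5]) {
        # Join only the fields, then add the newline, so that no
        # stray tab is left at the end of each line.
        print join("\t", @{$_}[0 .. 4], $geneName), "\n" for @acc;
        @acc = ();
    }
}

__DATA__
NT_113797 CDS 122829 123323 - gene=LOC644591 ProteinID=XP_932799.1
NT_113798 CDS 4457 4636 -
NT_077932 CDS 9894 9928 -
NT_077932 CDS 65297 65828 +
NT_077932 CDS 89196 89690 - gene=LOC653505 ProteinID=BJDND993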

Outputs the desired result:

NT_113797	CDS	122829	123323	-	gene=LOC644591
NT_113798	CDS	4457	4636	-	gene=LOC653505
NT_077932	CDS	9894	9928	-	gene=LOC653505
NT_077932	CDS	65297	65828	+	gene=LOC653505
NT_077932	CDS	89196	89690	-	gene=LOC653505
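One thing to watch out for: records that come after the last gene name never get printed, because nothing ever flushes the buffer. If you want to reuse the idea on real files, here is a minimal sketch (not tested on real genomic data) that wraps the same loop in a subroutine and warns about such leftovers instead of dropping them silently. The name `fill_gene_names` is made up for illustration. It assumes whitespace-separated columns with the gene name in field #5, and it uses in-memory filehandles so it runs as-is; pass real filehandles (e.g. `\*STDIN`, `\*STDOUT`) for actual files.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original reply): copies the gene
# name in field #5 of each annotated record onto the unannotated records
# that precede it. Reads from $in, writes tab-separated lines to $out and
# returns the number of trailing records left without a gene name.
sub fill_gene_names {
    my ($in, $out) = @_;
    my @pending;    # only the records since the last gene name are kept
    while (my $line = <$in>) {
        my @fields = split ' ', $line;
        next unless @fields;                 # skip blank lines
        push @pending, \@fields;
        my $gene = $fields[5];
        next unless defined $gene;
        print {$out} join("\t", @{$_}[0 .. 4], $gene), "\n" for @pending;
        @pending = ();
    }
    # Nothing follows these records, so there is no name to inherit.
    warn scalar(@pending), " trailing record(s) without a gene name\n"
        if @pending;
    return scalar @pending;
}

# Usage with in-memory filehandles.
my $input = <<'END';
NT_113797 CDS 122829 123323 - gene=LOC644591 ProteinID=XP_932799.1
NT_113798 CDS 4457 4636 -
NT_077932 CDS 89196 89690 - gene=LOC653505 ProteinID=BJDND993
END
open my $in,  '<', \$input  or die $!;
my $output = '';
open my $out, '>', \$output or die $!;
fill_gene_names($in, $out);
close $out;
print $output;
```

Keeping the reading and writing handles as parameters makes the subroutine easy to test on small strings like this before running it on a large file.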

Hope this helps

citromatik