in reply to Re: Parsing help
in thread Parsing help
Very elegant solution, but it may have the drawback of having to load the whole file into memory and traverse the whole list 3 times (1 - reading, 2 - filling field #5 and 3 - printing).
Since the size of this kind of genomic file may be an issue, here is another version that is a bit more resource-friendly:
use strict;
use warnings;

my @acc = ();    # records still waiting for a gene name

while (<DATA>) {
    my @recs = split;
    push @acc, [@recs];
    # A sixth field means this record carries the gene name for the buffered batch
    if (my $geneName = $recs[5]) {
        # "\n" sits outside join() so no trailing tab is printed
        print join("\t", @{$_}[0 .. 4], $geneName), "\n" for @acc;
        @acc = ();
    }
}

__DATA__
NT_113797 CDS 122829 123323 - gene=LOC644591 ProteinID=XP_932799.1
NT_113798 CDS 4457 4636 -
NT_077932 CDS 9894 9928 -
NT_077932 CDS 65297 65828 +
NT_077932 CDS 89196 89690 - gene=LOC653505 ProteinID=BJDND993
Outputs the desired result:
NT_113797 CDS 122829 123323 - gene=LOC644591
NT_113798 CDS 4457 4636 - gene=LOC653505
NT_077932 CDS 9894 9928 - gene=LOC653505
NT_077932 CDS 65297 65828 + gene=LOC653505
NT_077932 CDS 89196 89690 - gene=LOC653505
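One caveat with this buffering approach: records at the end of the input that are never followed by a gene-bearing record stay in the buffer and are not printed. Below is a rough sketch of the same idea wrapped in a sub that reads from any filehandle and flushes such leftovers unchanged. The sub name fill_gene_names and the decision to emit leftovers as-is are my own assumptions, not something the original question asked for, so adapt as needed.

```perl
use strict;
use warnings;

# Hypothetical helper (name is an assumption): buffers records until one
# with a sixth field (the gene name) shows up, then flushes the buffer.
sub fill_gene_names {
    my ($fh) = @_;
    my (@acc, @out);
    while (my $line = <$fh>) {
        my @recs = split ' ', $line;
        next unless @recs;                      # skip blank lines
        push @acc, [@recs];
        if (defined(my $geneName = $recs[5])) {
            push @out, join("\t", @{$_}[0 .. 4], $geneName) for @acc;
            @acc = ();
        }
    }
    # Assumption: leftover records with no gene after them are kept as-is
    push @out, join("\t", @{$_}) for @acc;
    return @out;
}

# Usage with an in-memory filehandle; a real file opened with
# open(my $in, '<', $path) would work the same way.
my $sample = <<'END';
NT_113798 CDS 4457 4636 -
NT_077932 CDS 89196 89690 - gene=LOC653505 ProteinID=BJDND993
NT_077933 CDS 100 200 +
END
open my $in, '<', \$sample or die "Cannot open in-memory handle: $!";
print "$_\n" for fill_gene_names($in);
close $in;
```

Only the records between two gene-bearing lines are held in memory at any time, so the footprint stays small as long as the batches are short.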
Hope this helps
citromatik