in reply to Parsing help

Since it's not clear from your sample data where the tabs ought to be (and it doesn't matter for demonstration purposes anyway), I've changed the sample code to use spaces instead and to read its input from the __DATA__ section:

use strict;
use warnings;

# Read every record into memory as an array of whitespace-separated fields
my @partsList;
push @partsList, [split] while <DATA>;

# Walk the list backwards so each record missing a gene name
# inherits it from the next record below that has one
my $geneName = '';
$geneName = $_->[5] ||= $geneName for reverse @partsList;

print join ("\t", @{$_}[0 .. 5]), "\n" for @partsList;

__DATA__
NT_113797 CDS 122829 123323 - gene=LOC644591 ProteinID=XP_932799.1
NT_113798 CDS 4457 4636 -
NT_077932 CDS 9894 9928 -
NT_077932 CDS 65297 65828 +
NT_077932 CDS 89196 89690 - gene=LOC653505 ProteinID=BJDND993

Prints:

NT_113797 CDS 122829 123323 - gene=LOC644591
NT_113798 CDS 4457 4636 - gene=LOC653505
NT_077932 CDS 9894 9928 - gene=LOC653505
NT_077932 CDS 65297 65828 + gene=LOC653505
NT_077932 CDS 89196 89690 - gene=LOC653505
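
In case the backward-fill one-liner is hard to read, here is a longhand sketch of the same idea. The sub name fill_gene_backward and the two sample records are made up for illustration. Like the ||= version, it treats any false value in field 5 as "missing".

```perl
use strict;
use warnings;

# Hypothetical helper: give every record without a gene name (field 5)
# the gene name of the nearest later record that has one.
sub fill_gene_backward {
    my @records  = @_;    # array refs, one per input line, modified in place
    my $geneName = '';    # stays empty until a gene name has been seen

    for my $rec (reverse @records) {
        if ($rec->[5]) {
            $geneName = $rec->[5];    # remember the most recent gene name
        }
        else {
            $rec->[5] = $geneName;    # inherit it from the record below
        }
    }
    return @records;
}

my @partsList = fill_gene_backward(
    [qw(NT_113798 CDS 4457 4636 -)],
    [qw(NT_077932 CDS 89196 89690 - gene=LOC653505)],
);
print join("\t", @{$_}[0 .. 5]), "\n" for @partsList;
```

Records after the last gene name end up with an empty string in field 5, which matches what the one-liner does.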

True laziness is hard work

Re^2: Parsing help
by citromatik (Curate) on Apr 01, 2009 at 09:08 UTC

    Very elegant solution, but it has the drawback of loading the whole file into memory and traversing the whole list three times (1: reading, 2: filling field #5, 3: printing).

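    Since the size of genomic files like this can be an issue, here is another version that is a bit more resource-friendly. It keeps only the records that are still waiting for a gene name and prints them as soon as one turns up: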

    use strict;
    use warnings;

    my @acc = ();    # records still waiting for a gene name
    while (<DATA>) {
        my @recs = split;
        push @acc, [@recs];
        if (my $geneName = $recs[5]) {
            # "\n" goes outside join() so no stray tab ends the line
            print join ("\t", @{$_}[0 .. 4], $geneName), "\n" for @acc;
            @acc = ();
        }
    }
    # Flush trailing records that never received a gene name
    print join ("\t", @{$_}), "\n" for @acc;

    __DATA__
    NT_113797 CDS 122829 123323 - gene=LOC644591 ProteinID=XP_932799.1
    NT_113798 CDS 4457 4636 -
    NT_077932 CDS 9894 9928 -
    NT_077932 CDS 65297 65828 +
    NT_077932 CDS 89196 89690 - gene=LOC653505 ProteinID=BJDND993

    Outputs the desired result:

    NT_113797 CDS 122829 123323 - gene=LOC644591
    NT_113798 CDS 4457 4636 - gene=LOC653505
    NT_077932 CDS 9894 9928 - gene=LOC653505
    NT_077932 CDS 65297 65828 + gene=LOC653505
    NT_077932 CDS 89196 89690 - gene=LOC653505
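
    For a real file rather than __DATA__, the same buffering idea might be wrapped up roughly as below. This is only a sketch: the sub name fill_genes_streaming is hypothetical, the sample text is cut down from the data above, and printing trailing records without a gene name unchanged is my assumption about what you would want.

```perl
use strict;
use warnings;

# Hypothetical helper: read records from $in and write filled records to $out.
# Only the records still waiting for a gene name are held in memory.
sub fill_genes_streaming {
    my ($in, $out) = @_;
    my @pending;

    while (my $line = <$in>) {
        my @fields = split ' ', $line;
        next unless @fields;       # skip blank lines
        push @pending, \@fields;
        next unless $fields[5];    # no gene name yet, keep buffering

        # A gene name arrived: flush everything that was waiting for it
        print {$out} join("\t", @{$_}[0 .. 4], $fields[5]), "\n" for @pending;
        @pending = ();
    }

    # Assumption: trailing records with no gene name are printed as they are
    print {$out} join("\t", @{$_}), "\n" for @pending;
    return;
}

# Demonstration on an in-memory filehandle
my $sample = "NT_113798 CDS 4457 4636 -\n"
           . "NT_077932 CDS 89196 89690 - gene=LOC653505 ProteinID=BJDND993\n";
open my $in, '<', \$sample or die "Cannot open in-memory input: $!";
fill_genes_streaming($in, \*STDOUT);
close $in;
```

    Passing \*STDIN as the first argument instead lets the script be used as a filter, so memory use depends on the longest run of records without a gene name rather than on the size of the file.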

    Hope this helps

    citromatik