cowboyrocks has asked for the wisdom of the Perl Monks concerning the following question:
My input looks like this:-

    NT_113797 CDS 122829 123323 - gene=LOC644591 ProteinID=XP_932799.1
    NT_113798 CDS 4457 4636 -
    NT_077932 CDS 9894 9928 -
    NT_077932 CDS 65297 65828 +
    NT_077932 CDS 89196 89690 - gene=LOC653505 ProteinID=BJDND993

My output looks like this:-

    NT_113797 CDS 122829 123323 -
    NT_113798 CDS 4457 4636 - gene=LOC644591
    NT_077932 CDS 9894 9928 - gene=LOC644591
    NT_077932 CDS 65297 65828 + gene=LOC644591
    NT_077932 CDS 89196 89690 - gene=LOC644591

I want it to be like this:-

    NT_113797 CDS 122829 123323 - gene=LOC644591
    NT_113798 CDS 4457 4636 - gene=LOC653505
    NT_077932 CDS 9894 9928 - gene=LOC653505
    NT_077932 CDS 65297 65828 + gene=LOC653505
    NT_077932 CDS 89196 89690 - gene=LOC653505

My code looks something like this:-

    #!/usr/bin/perl
    use warnings;
    use strict;

    my $fn = $ARGV[0];
    open(FH, "$fn") || die("cannot open:$!");
    {
        my $geneName = "";
        while(<FH>) {
            if($_ =~ /\A(\S+)\t(\S+)\t(\d+)\t(\d+)\t(\S)\s+$/) {
                print "\n$_ $geneName";
            }
            if($_ =~ /\A(\S+)\t(\S+)\t(\d+)\t(\d+)\t(\S)\s+(\S+)\s+(\S+)\s+/) {
                $geneName = $6;
            }
        }
    }

Thanks in advance
cowboy :-)
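
One possible reading of the question is that the target is the listing in which NT_113798 and the three NT_077932 rows all carry gene=LOC653505: a row without a gene= field takes the gene from the next row that has one, and ProteinID is dropped. Under that assumption, a minimal sketch could hold back the bare rows until a gene shows up. The helper name `backfill_genes` is hypothetical, and the tab-separated output is also an assumption (the posted regexes match on tabs).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not from the thread. Takes input lines and returns one
# output line per record: "seqid type start end strand [gene=...]", joined by
# tabs. Records without a gene= field are held in @pending until the next
# record that has one, then emitted with that gene (a back-fill).
sub backfill_genes {
    my @lines = @_;
    my (@out, @pending);
    for my $line (@lines) {
        chomp(my $copy = $line);
        my ($id, $type, $start, $end, $strand, @attrs) = split ' ', $copy;
        next unless defined $strand;    # skip blank or short lines
        my $record = join "\t", $id, $type, $start, $end, $strand;
        my ($gene) = grep { /\Agene=/ } @attrs;
        if (defined $gene) {
            # Flush the held records, then this one, all with the same gene.
            push @out, "$_\t$gene" for @pending, $record;
            @pending = ();
        }
        else {
            push @pending, $record;
        }
    }
    push @out, @pending;    # trailing records with no later gene stay bare
    return @out;
}

# Sample rows taken from the question, with fields separated by tabs.
my @input = (
    "NT_113797\tCDS\t122829\t123323\t-\tgene=LOC644591\tProteinID=XP_932799.1\n",
    "NT_113798\tCDS\t4457\t4636\t-\n",
    "NT_077932\tCDS\t9894\t9928\t-\n",
    "NT_077932\tCDS\t65297\t65828\t+\n",
    "NT_077932\tCDS\t89196\t89690\t-\tgene=LOC653505\tProteinID=BJDND993\n",
);
print "$_\n" for backfill_genes(@input);
```

To run it on a real file, the `@input` list could be replaced with the lines read from a lexical filehandle opened with the three-argument form of `open`. If the intent is instead to carry the previous gene forward, the same loop works with the held-back list replaced by a single remembered `$gene`.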
Re: Parsing help
by GrandFather (Saint) on Apr 01, 2009 at 04:04 UTC
by citromatik (Curate) on Apr 01, 2009 at 09:08 UTC
Re: Parsing help
by Anonymous Monk on Apr 01, 2009 at 03:39 UTC