newtoperlprog has asked for the wisdom of the Perl Monks concerning the following question:
Dear All,
I am trying to parse a file line by line in a while loop and print the values captured by some regular expressions.
Below are my code and my data file.
    use strict;
    use warnings;

    my $filename = "test.summary";
    my ($gene_length, $definition, $accession, $gi_number, $gene_id);

    open(my $in, "<", $filename) or die "Check the summary file. $!\n";
    while (my $line = <$in>) {
        chomp $line;
        # Each test sees a single line, so a wrapped value is cut at the line end.
        if ($line =~ /^LOCUS\s+\w+\d+\s+(\d+)\sbp/)       { $gene_length = $1; }
        if ($line =~ /^DEFINITION\s+(.*)/s)               { $definition  = $1; }
        if ($line =~ /^ACCESSION\s+(.*?)\s+/)             { $accession   = $1; }
        if ($line =~ /\s+\/db_xref="GI\:(\d+)\"/)         { $gi_number   = $1; }
        if ($line =~ /\s+\/db_xref=\"GeneID\:(\d+)\"/)    { $gene_id     = $1; }
    }
    close $in;
Data file:

LOCUS       NM_001098209            3415 bp    mRNA    linear   PRI 27-APR-2014
DEFINITION  Homo sapiens catenin (cadherin-associated protein), beta 1, 88kDa
            (CTNNB1), transcript variant 2, mRNA.
ACCESSION   NM_001098209 XM_001133660 XM_001133664 XM_001133673 XM_001133675
VERSION     NM_001098209.1  GI:148233337
KEYWORDS    RefSeq.
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
            Catarrhini; Hominidae; Homo.
     CDS             269..2614
                     /gene="CTNNB1"
                     /gene_synonym="armadillo; CTNNB; MRD19"
                     /codon_start=1
                     /product="catenin beta-1"
                     /protein_id="NP_001091679.1"
                     /db_xref="GI:148233338"
                     /db_xref="CCDS:CCDS2694.1"
                     /db_xref="GeneID:1499"
                     /db_xref="HGNC:HGNC:2514"
                     /db_xref="MIM:116806"
                     /translation="MATQADLMELDMAMEPDRKAAVSHWQQQSYLDSGIHSGATTTAP
                     SLSGKGNPEEEDVDTSQVLYEWEQGFSQSFTQEQVADIDGQYAMTRAQRVRAAMFPET
                     LDEGMQIPSTQFDAAHPTNVQRLAEPSQMLKHAVVNLINYQDDAELATRAIPELTKLL
//
My questions:
a) How can I parse the multiline DEFINITION inside the while loop? The regular expression captures only the first line.
b) Could I get some help capturing the content of the CDS block and then parsing its individual entries one by one (GI, GeneID, etc.)?
I am trying to learn plain Perl, so I am not using the BioPerl module for this.
Regards
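
One possible way to approach both questions, offered as a minimal sketch rather than a definitive parser: remember which keyword or qualifier was seen last, and append any indented continuation line to it. The sketch assumes the usual GenBank flat-file layout (top-level keywords in column 1, feature keys such as CDS in column 6, everything else indented further). The names `parse_record`, `$sample`, and `$rec` are hypothetical, and the sample record is abbreviated from the data file above.

```perl
use strict;
use warnings;

# Parse one GenBank-style record held in a string.
# Assumed layout: top-level keywords start in column 1, feature keys such as
# CDS start in column 6, and wrapped or qualifier lines are indented further.
sub parse_record {
    my ($text) = @_;
    my (%field, %cds, $key, $in_cds);

    for my $line (split /\n/, $text) {
        last if $line =~ m{^//};                       # end-of-record marker

        if ($line =~ /^([A-Z]+)\s*(.*)/) {             # new top-level keyword
            ($key, $in_cds) = ($1, 0);
            $field{$key} = $2;
        }
        elsif ($line =~ /^ {5}(\S+)\s+(\S+)/) {        # new feature key
            $key    = undef;
            $in_cds = $1 eq 'CDS';
            $cds{location} = $2 if $in_cds;
        }
        elsif ($in_cds && $line =~ m{^\s+/(\w+)=(.*)}) {
            push @{ $cds{qualifiers} }, [ $1, $2 ];    # new /name=value qualifier
        }
        elsif ($in_cds && $cds{qualifiers} && $line =~ /^\s+(\S.*)/) {
            $cds{qualifiers}[-1][1] .= " $1";          # wrapped qualifier value
        }
        elsif (defined $key && $line =~ /^\s+(\S.*)/) {
            $field{$key} .= " $1";                     # wrapped keyword value
        }
    }

    for my $q (@{ $cds{qualifiers} || [] }) {
        my ($name, $value) = @$q;
        $value =~ s/^"|"$//g;                          # drop surrounding quotes
        $value =~ s/\s+//g if $name eq 'translation';  # rejoin wrapped sequence
        $q->[1] = $value;
        if ($name eq 'db_xref') {                      # e.g. GeneID:1499
            my ($db, $id) = split /:/, $value, 2;
            $cds{db_xref}{$db} = $id;
        }
    }
    return { field => \%field, cds => \%cds };
}

# Abbreviated sample record (hypothetical excerpt of the data file above).
my $sample = <<'END';
LOCUS       NM_001098209            3415 bp    mRNA    linear   PRI 27-APR-2014
DEFINITION  Homo sapiens catenin (cadherin-associated protein), beta 1, 88kDa
            (CTNNB1), transcript variant 2, mRNA.
ACCESSION   NM_001098209 XM_001133660
     CDS             269..2614
                     /gene="CTNNB1"
                     /db_xref="GI:148233338"
                     /db_xref="GeneID:1499"
                     /translation="MATQADLMELDMAMEPDRKAAVSHWQQQSYLDSGIHSGATTTAP
                     SLSGKGNPEEEDVDTSQVLYEWEQGFSQSFTQEQVADIDGQYAMTRAQRVRAAMFPET
//
END

my $rec = parse_record($sample);
print "Definition: $rec->{field}{DEFINITION}\n";
print "CDS at $rec->{cds}{location}\n";
print "GI: $rec->{cds}{db_xref}{GI}, GeneID: $rec->{cds}{db_xref}{GeneID}\n";
```

To run this against a real file, the record text could be read in one go (for example with `local $/;` before reading the filehandle) and passed to `parse_record`. Note that in this sketch the indented ORGANISM lines are simply folded into the SOURCE value, and only the CDS feature is kept; other features are skipped.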
Re: multiline in while loop and regular expression
by GrandFather (Saint) on Nov 24, 2014 at 22:21 UTC
by newtoperlprog (Sexton) on Nov 24, 2014 at 22:33 UTC
by GrandFather (Saint) on Nov 24, 2014 at 23:58 UTC

Re: multiline in while loop and regular expression
by ww (Archbishop) on Nov 24, 2014 at 23:14 UTC

Re: multiline in while loop and regular expression
by Anonymous Monk on Nov 24, 2014 at 22:32 UTC