Re: Read, match string and print

Uff, there is an elegant mistake. In the data:

     mRNA            join(68351..68408,76646..77296)
                     /gene="DEFB125"
                     /product="defensin, beta 125"
                     /note="Derived by automated computational analysis using
                     gene prediction method: BestRefseq."
                     /transcript_id="NM_153325.2"
                     /db_xref="GI:76563935"
                     /db_xref="GeneID:245938"
                     /db_xref="HGNC:18105"
     CDS             join(68351..68408,76646..77058)
                     /gene="DEFB125"
                     /note="Derived by automated computational analysis using
                     gene prediction method: BestRefseq."
                     /codon_start=1
                     /product="defensin, beta 125 preproprotein preproprotein"
                     /protein_id="NP_697020.2"
                     /db_xref="GI:76563936"
                     /db_xref="CCDS:CCDS12989.2"
                     /db_xref="GeneID:245938"
                     /db_xref="HGNC:18105"
     gene            123252..126392
                     /gene="DEFB126"
                     /note="Derived by automated computational analysis using
                     gene prediction method: BestRefseq."
                     /db_xref="GeneID:81623"
                     /db_xref="HGNC:15900"
     mRNA            join(123252..123327,126056..126392)
                     /gene="DEFB126"
                     /product="defensin, beta 126"
                     /note="Derived by automated computational analysis using
                     gene prediction method: BestRefseq."
                     /transcript_id="NM_030931.2"
                     /db_xref="GI:30061484"
                     /db_xref="GeneID:81623"
                     /db_xref="HGNC:15900"
     CDS             join(123270..123327,126056..126333)
                     /gene="DEFB126"
                     /note="Derived by automated computational analysis using
                     gene prediction method: BestRefseq."
                     /codon_start=1
                     /product="defensin, beta 126 preproprotein preproprotein"
                     /protein_id="NP_112193.1"
                     /db_xref="GI:13624333"
                     /db_xref="CCDS:CCDS12990.1"
                     /db_xref="GeneID:81623"
                     /db_xref="HGNC:15900"

The code above does not care whether db_xref="GI comes from an mRNA or a CDS feature. But I need the GI value for the CDS, not the mRNA! I thought the following (a slight modification of the matching pattern for CDS => GI) would fix the problem, but it did not. Any suggestions are welcome.

#!/usr/bin/perl
use strict;
use warnings;

my $data = '/DATA/GenBankFile.gb'; # GenBank file is located at C:\DATA
open INFILE, '<', $data or die "Please insert a new coin!\n";

my ( $cds, $gi, $version ); # Declaration of variables

while ( <INFILE> ) {
    last if m!//$!; # Mark the end-of-file
    if ( /^VERSION.*\w:(\d+)/ ) { # Extract GI of the entire file
        $version = $1;
        }
    elsif ( /^\s*CDS\(s+\S+\n+\)*\Sdb_xref="GI:(\d+)/ ) { # Extract the protein gi
        $gi = $1;
        }
    elsif ( /^\s*CDS\s*(\S+)/ ) { # Extract the annotation 
        $cds = $1;
        }
    if ( defined $cds && defined $gi && defined $version ) { # Print only when all variables are defined
        print "$gi\t$version\t$cds\n";
        $gi = $cds = undef; # Get ready for the next loop
        }
    }

close INFILE;

Re^2: Read, match string and print by BioLion (Curate) on Feb 08, 2010 at 14:58 UTC
This looks like a standard format (GenBank?). Is there a parser available from BioPerl or CPAN? Sorry if someone has already asked this!
Re^3: Read, match string and print by sophix (Sexton) on Feb 10, 2010 at 14:21 UTC
Hi, I think I stumbled on a GenBank-oriented parser (especially useful when you are reading a gene sequence) while reading Perl for Bioinformatics. I'll check it out and let you know. Nevertheless, I would like to learn how to use flags to mark the beginning and end of a block to be processed. That is, for the example in this thread, I want the code to attempt the matches only inside a CDS block (which starts with \sCDS and ends with \sgene), not inside an mRNA block. I still have not managed to do it, though. I understand how it works but could not figure out how to implement it. :)
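
The flag idea described above could be sketched roughly as follows. This is only a minimal sketch, not a tested solution for full GenBank files: it assumes that feature key lines (mRNA, CDS, gene) are indented by just a few columns while qualifier lines are indented much deeper, as in the excerpt at the top of this thread. The helper name extract_cds_gi and the in-memory sample are made up for illustration. The flag is lowered at any following feature key rather than only at gene, and a location that wraps onto a second line is not handled.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns one [GI, location]
# pair per CDS feature found in a chunk of GenBank feature-table text.
sub extract_cds_gi {
    my ($text) = @_;
    my @records;
    my $in_cds = 0;    # flag: true while we are inside a CDS block
    my $location;      # location string of the current CDS

    # Read the string line by line through an in-memory filehandle.
    open my $fh, '<', \$text or die "Cannot read from string: $!";
    while ( my $line = <$fh> ) {
        last if $line =~ m{^//};    # end-of-record marker

        # Assumption: a feature key line has at most 8 leading spaces;
        # qualifier lines are indented deeper and cannot match here.
        if ( $line =~ /^\s{1,8}(\S+)\s+(\S+)/ ) {
            $in_cds   = ( $1 eq 'CDS' );    # raise or lower the flag
            $location = $in_cds ? $2 : undef;
        }
        # Only look for the GI while the flag is raised.
        elsif ( $in_cds && $line =~ /db_xref="GI:(\d+)"/ ) {
            push @records, [ $1, $location ];
            $in_cds = 0;    # one GI per CDS; skip the rest of the block
        }
    }
    close $fh;
    return @records;
}

# Shortened sample based on the data posted in this thread.
my $sample = <<'END';
     mRNA            join(68351..68408,76646..77296)
                     /gene="DEFB125"
                     /db_xref="GI:76563935"
     CDS             join(68351..68408,76646..77058)
                     /gene="DEFB125"
                     /db_xref="GI:76563936"
     gene            123252..126392
                     /gene="DEFB126"
END

print join( "\t", @$_ ), "\n" for extract_cds_gi($sample);
```

On the shortened sample this prints only the CDS entry (GI 76563936 with its join(...) location); the mRNA GI is skipped because the flag is down while that block is being read. The original single-pattern attempt cannot work in a line-by-line loop, since each line is matched on its own and the CDS key and its db_xref qualifier never appear on the same line.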