wirlleyd has asked for the wisdom of the Perl Monks concerning the following question:

Hi i'm a begginer in Perl and my script don't return a correct value, please help me. Example:The id 16 should return me to the species " Homo sapiens chromosome 16 , GRCh38.p2 Primary Assembly " and nucleotide sequence.

#!/usr/bin/perl use strict; use warnings; use Bio::DB::GenBank; use Bio::SeqIO; my $output = Bio::SeqIO->new(-format=>"fasta",-file=>">output.fasta"); my $gb = Bio::DB::GenBank->new(); while(<STDIN>){ chomp; my $id = $_; my $seq = $gb->get_Seq_by_id($id); $output->write_seq($seq); }

Replies are listed 'Best First'.
Re: BioPerl sequences
by kcott (Archbishop) on Jun 25, 2015 at 04:55 UTC

    G'day wirlleyd,

    Welcome to the Monastery.

    While it is good that you have identified your expected output, it is also important to show your actual output. The things to include with your post, such that we can more readily help you, are described in these guidelines.

    From your description, it would appear $id is 16; however, the documentation for get_Seq_by_id shows this:

    Usage : $seq = $db->get_Seq_by_id('ROA1_HUMAN')

    So, perhaps your $id should be some string (rather than an integer).

    That same piece of documentation also shows:

    Throws : "id does not exist" exception

    I recommend you test for all potential exceptions; especially when the documentation alerts you to their existence.

    -- Ken

Re: BioPerl sequences
by kevbot (Vicar) on Jun 25, 2015 at 05:36 UTC

    Why do you expect the output to be Homo sapiens chromosome 16 , GRCh38.p2 Primary Assembly?

    When I run your script and enter 16 for the id, I get the following in the first line of the output.fasta file.

    Blue Whale common cetacean component DNA.

    If I execute a Nucleotide search for 16 at this Genbank page http://www.ncbi.nlm.nih.gov/nuccore/, I get output that matches that of the output.fasta file (see http://www.ncbi.nlm.nih.gov/nuccore/16).

    It appears that your code is performing a Genbank nucleotide search, when you may want it to perform a gene search. If I go back to http://www.ncbi.nlm.nih.gov/nuccore and change the database in the drop-down to Gene, then I get results that match your desired output (see http://www.ncbi.nlm.nih.gov/gene/?term=16).

    You may need to adjust your code to search the correct/desired Genbank database. I have never used this module before, so I don't know how to do that at the moment. Perhaps the documentation for these modules will be helpful: Bio::DB::Genbank and Bio::DB::Query::GenBank

    UPDATE: Perhaps nucleotide is the correct database. See my other post in this thread Re: BioPerl sequences.

Re: BioPerl sequences
by erix (Prior) on Jun 25, 2015 at 06:38 UTC

    while(<STDIN>){

    I think with this while loop without any wait-stage in it, you might get blocked by NCBI. Make sure to follow their rules and introduce a sufficiently long sleep in each iteration - I think they want 3 seconds. (looks like the -delay parameter of Bio::DB::GenBank would do the same)

Re: BioPerl sequences
by kevbot (Vicar) on Jun 26, 2015 at 04:59 UTC

    I'm going to take a guess, and say that while you may be searching the correct database (i.e. nucleotide), you are likely using the wrong id to perform your search.

    Using an id of NC_000016, results in fetching sequence data with the annotation that you expect.

    The code below will result in a fasta file that contains NC_000016 Homo sapiens chromosome 16, GRCh38.p2 Primary Assembly in the header. It may take a while for the script to finish, since this is a big sequence. The output.fasta file is around 100 MB.

    #!/usr/bin/perl use strict; use warnings; use Bio::DB::GenBank; use Bio::SeqIO; my $output = Bio::SeqIO->new(-format=>"fasta",-file=>">output.fasta"); my $gb = Bio::DB::GenBank->new(); my $id = 'NC_000016'; my $seq = $gb->get_Seq_by_id($id); $output->write_seq($seq); exit;
Re: BioPerl sequences
by wirlleyd (Initiate) on Jun 25, 2015 at 18:18 UTC
    I no have any idea how do i specify for my scrip get from Gene in this site http://www.ncbi.nlm.nih.gov/nuccore. Someone know how i do? Sorry for my english very bad , i'm from Brazil.