dimitris852 has asked for the wisdom of the Perl Monks concerning the following question:

The following code allows you to download a protein sequence from Genbank and store it in a fasta file.

    use Bio::DB::GenBank;
    my $gb = new Bio::DB::GenBank;
    my $seq = $gb->get_Seq_by_id("asdadasda");
    write_sequence(">roar.fa", 'fasta', $seq);

But what if I want Perl to download all the sequences listed in a file of protein IDs and store them in a single FASTA file? I assume I need to read sec.txt into an array @lines, so I tried the following code:

    use Bio::DB::GenBank;
    open my $handle, '<', 'sec.txt';
    my @lines = <$handle>;
    close $handle;
    print @lines;
    my $gb = new Bio::DB::GenBank;
    my $seq = $gb->get_Seq_by_id("@lines");
    write_sequence(">roar.fa", 'fasta', $seq);

The problem is that roar.fa still contains the same result as with the first code: only the first sequence, the one for 'asdadasda'. Below is the sec.txt file:

    asdadasda
    eeeeerrrr
    vfffvvvfv
    raerrrrrr

Thanks!

Re: Get Fasta file with Protein Sequences given a file with Genbank Ids using Perl
by kevbot (Vicar) on Mar 20, 2016 at 22:18 UTC
    Hello dimitris852,

    The documentation for Bio::DB::GenBank states that the get_Seq_by_id method takes only one argument, which is the ID (as a string) of a single sequence. It does not appear that you can supply a list of IDs as the argument.

    So, if you want to write multiple sequences you will need to do something like this:

        foreach my $id (@lines) {
            my $seq = $gb->get_Seq_by_id($id);
            write_sequence( ">>roar.fa", 'fasta', $seq );
        }

    Note that I changed the write mode from > to >> in order to append to the file. So, you should delete any old roar.fa file that may exist before running the script again.
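A variation on the loop above, offered only as a sketch: it opens the output once through Bio::SeqIO instead of calling write_sequence in append mode, so a stale roar.fa is overwritten rather than extended. The helper names read_ids and fetch_to_fasta are made up for this illustration, and the BioPerl modules are loaded with require inside the sub so the ID-reading part can be tried without BioPerl installed. It has not been run against NCBI.

```perl
use strict;
use warnings;

# Hypothetical helper: read one ID per line from an open filehandle,
# trimming surrounding whitespace (including the newline) and
# skipping lines that end up empty.
sub read_ids {
    my ($fh) = @_;
    my @ids;
    while ( my $line = <$fh> ) {
        $line =~ s/^\s+|\s+$//g;
        push @ids, $line if length $line;
    }
    return @ids;
}

# Hypothetical helper: fetch each ID separately and write every record
# that comes back into a single FASTA file.
sub fetch_to_fasta {
    my ( $out_file, @ids ) = @_;

    require Bio::DB::GenBank;
    require Bio::SeqIO;

    my $gb  = Bio::DB::GenBank->new;
    my $out = Bio::SeqIO->new( -file => ">$out_file", -format => 'fasta' );

    for my $id (@ids) {
        # The eval covers a thrown exception; the check covers an undef return.
        my $seq = eval { $gb->get_Seq_by_id($id) };
        if ( !$seq ) {
            warn "Could not retrieve '$id'\n";
            next;
        }
        $out->write_seq($seq);
    }
    return;
}
```

With sec.txt opened for reading as $handle, the call would be fetch_to_fasta( 'roar.fa', read_ids($handle) ).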

      hello 1nickt and kevbot,

      I'm trying this code:
          use strict;
          use warnings;
          use Bio::DB::GenBank;
          use File::Slurp;

          my @lines = read_file('sec.txt');
          chomp @lines;
          print "@lines\n";

          foreach my $id (@lines) {
              my $gb=Bio::DB::GenBank;
              my $seq = $gb->get_Seq_by_id($id);
              write_sequence( ">>roar.fa", 'fasta', $seq );
          }

      Everything seems logical, but I get this error: Bareword "Bio::DB::GenBank" not allowed while "strict subs" in use at .... line 15.

      I have absolutely no idea how to fix it ....

        Line 15 of the code
            my $gb=Bio::DB::GenBank;
        should be
            my $gb = new Bio::DB::GenBank;
        (indirect object notation), or better yet
            my $gb = Bio::DB::GenBank->new;
        or
            my $gb = Bio::DB::GenBank->new();
        which avoid the syntactic ambiguities of indirect object notation.


        Give a man a fish:  <%-{-{-{-<

        Sure you do.

        If something stops working, go back and look at what worked. Your OP had:

        my $gb=new Bio::DB::GenBank;
        Now this is actually discouraged syntax (indirect object notation) and is better written as:
        my $gb = Bio::DB::GenBank->new();
        ... but either style shows the point: you need to call the new() method to get an instance of the class Bio::DB::GenBank, aka, an object, and store it in $gb. Your existing code just assigns the bareword 'Bio::DB::GenBank' to $gb -- or tries to, but barewords are not allowed in your script because you're sensibly using strict. This is a great example of why to use strict: it tells you exactly what the problem is and where to find it.
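To make the object-versus-name distinction concrete, here is a small sketch that runs without BioPerl. My::Fetcher is a made-up stand-in class, not part of any library. Without strict, Perl would have treated the bareword as the plain string 'Bio::DB::GenBank', which is what $name imitates here.

```perl
use strict;
use warnings;

# Hypothetical stand-in class, used only so the example runs without BioPerl.
package My::Fetcher;

sub new {
    my ($class) = @_;
    return bless {}, $class;
}

package main;

my $object = My::Fetcher->new;    # a blessed reference: an instance of the class
my $name   = 'My::Fetcher';       # a plain string: only the name of the class

# ref() returns the class name for an object and an empty string for a plain scalar.
printf "object: %s\n", ref($object);
printf "name:   %s\n", ref($name) ? 'a reference' : 'not a reference';
```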

        Also, don't use File::Slurp; it has had long-standing bugs, notably around encodings. Use File::Slurper or Path::Tiny instead.

        Hope this helps!


        The way forward always starts with a minimal test.
        Change this:
        my $gb=Bio::DB::GenBank;
        to this:
        my $gb = new Bio::DB::GenBank;
        and see if that gets things working for you.
Re: Get Fasta file with Protein Sequences given a file with Genbank Ids using Perl
by 1nickt (Canon) on Mar 20, 2016 at 22:07 UTC

    Hi dimitris852,

    Just a guess: I haven't examined the doc for Bio::DB::GenBank::get_Seq_by_id() as I am sure you must have done, but by its name I would think that it only expects one ID at a time. You're trying to pass a list which wouldn't work. But ...

    ... you're actually passing a single string that contains all the IDs, newlines included, because you're not calling chomp on the lines you read from your filehandle, and because putting the array in double quotes interpolates it into one string with the elements joined by the current value of $", the list separator (a single space by default).
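A quick core-Perl check of what double-quoted array interpolation produces. The variable that governs the join is $" (the list separator, a single space by default); $, is the output field separator and only affects print. The variable names below are illustrative.

```perl
use strict;
use warnings;

# Two lines as they would come straight from <$handle>, newlines still attached.
my @lines = ( "asdadasda\n", "eeeeerrrr\n" );

# Interpolation joins the elements with $" and keeps each trailing newline.
my $unchomped = "@lines";

# After chomp the same interpolation gives a clean, space-separated string.
chomp( my @ids = @lines );
my $spaced = "@ids";

# Changing $" (localised here) changes the separator used by interpolation.
my $comma = do { local $" = ','; "@ids" };

print "unchomped: [$unchomped]\n";
print "spaced:    [$spaced]\n";
print "comma:     [$comma]\n";
```

Either way the result is one string, which is why get_Seq_by_id never sees the IDs individually.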

    Just loop through your lines:

        my $gb = new Bio::DB::GenBank;
        open my $handle, '<', 'sec.txt' or die $!;    # check for success
        while ( my $line = <$handle> ) {              # read one line at a time
            chomp $line;                              # strip new line chars
            print "Doing $line\n";
            my $seq = $gb->get_Seq_by_id( $line );
            write_sequence( ">>roar.fa", 'fasta', $seq );
            #                ^^
            # ( assuming write_sequence() supports appending )
        }
        close $handle or die $!;

    Hope this helps!


    Edit: updated to use while()
    The way forward always starts with a minimal test.
Re: Get Fasta file with Protein Sequences given a file with Genbank Ids using Perl
by Cristoforo (Curate) on Mar 20, 2016 at 23:22 UTC
    Looking at the documentation for Bio::DB::GenBank, it states
        DESCRIPTION
            Allows the dynamic retrieval of Bio::Seq sequence objects from the GenBank database at NCBI, via an Entrez query.

            WARNING: Please do NOT spam the Entrez web server with multiple requests. NCBI offers Batch Entrez for this purpose.
    So I don't know if this applies to your situation because you're only downloading 4 sequences. The page for NCBI has some examples of downloads.
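One way to cut down the number of requests, sketched under assumptions: Bio::DB::GenBank also documents get_Stream_by_id, which takes an array reference of IDs and returns a sequence stream. This is not the Batch Entrez web service itself, just fewer calls from the same module. The helper names batches and fetch_batches, the batch size, and the one-second pause are all arbitrary choices for illustration, and the network part has not been run against NCBI.

```perl
use strict;
use warnings;

# Hypothetical helper: split a list of IDs into array refs of at most $size IDs.
sub batches {
    my ( $size, @ids ) = @_;
    die "batch size must be positive\n" unless $size > 0;
    my @batches;
    push @batches, [ splice @ids, 0, $size ] while @ids;
    return @batches;
}

# Hypothetical helper: make one request per batch and write all records
# to a single FASTA file.
sub fetch_batches {
    my ( $out_file, $size, @ids ) = @_;

    require Bio::DB::GenBank;
    require Bio::SeqIO;

    my $gb  = Bio::DB::GenBank->new;
    my $out = Bio::SeqIO->new( -file => ">$out_file", -format => 'fasta' );

    for my $batch ( batches( $size, @ids ) ) {
        my $stream = $gb->get_Stream_by_id($batch);    # array ref of IDs
        while ( my $seq = $stream->next_seq ) {
            $out->write_seq($seq);
        }
        sleep 1;    # arbitrary pause between requests
    }
    return;
}
```

For the four IDs in sec.txt a single batch is enough, for example fetch_batches( 'roar.fa', 50, @ids ).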