Get Fasta file with Protein Sequences given a file with Genbank Ids using Perl

dimitris852 has asked for the wisdom of the Perl Monks concerning the following question:

The following code allows you to download a protein sequence from GenBank and store it in a FASTA file.

use Bio::DB::GenBank;
use Bio::Perl;    # exports write_sequence()
my $gb=new Bio::DB::GenBank;
my $seq= $gb->get_Seq_by_id ("asdadasda");

write_sequence (">roar.fa", 'fasta',$seq );

But what if I want Perl to download all the sequences listed in a file of protein IDs and store them in one FASTA file? I assume I should read sec.txt into an array @lines, so I tried the following code:

use Bio::DB::GenBank;
use Bio::Perl;    # exports write_sequence()

open my $handle, '<', 'sec.txt' or die "Cannot open sec.txt: $!";
my @lines = <$handle>;
close $handle;
print @lines;


my $gb=new Bio::DB::GenBank;
my $seq= $gb->get_Seq_by_id ("@lines");

write_sequence (">roar.fa", 'fasta',$seq );

The problem is that roar.fa still contains the same result as with the first code: only the sequence for 'asdadasda'. Below is the sec.txt file:

asdadasda
eeeeerrrr
vfffvvvfv
raerrrrrr

Thanks!


Re: Get Fasta file with Protein Sequences given a file with Genbank Ids using Perl by kevbot (Vicar) on Mar 20, 2016 at 22:18 UTC
Hello dimitris852,

The documentation for Bio::DB::GenBank states that the `get_Seq_by_id` method takes only one argument, which is the id (as a string) of a sequence. It does not appear that you can supply a list of sequences as the argument. So, if you want to write multiple sequences you will need to do something like this:

foreach my $id (@lines) {
    chomp $id;    # strip the trailing newline read from sec.txt
    my $seq = $gb->get_Seq_by_id($id);
    write_sequence( ">>roar.fa", 'fasta', $seq );
}

Note that I changed the write mode from `>` to `>>` in order to append to the file. So, you should delete any old roar.fa file that may exist before running the script again.
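To see why the switch from `>` to `>>` matters, here is a small core-Perl sketch that needs neither BioPerl nor network access. The helper `write_record` is hypothetical and only stands in for `write_sequence()`; the IDs and the `MKV` residues are dummy data. It shows that reopening with `>` for every record keeps only the last one, while truncating once and then appending with `>>` keeps them all.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical stand-in for write_sequence(): write one FASTA record
# to $path, opening the file with the given mode ('>' or '>>').
sub write_record {
    my ($mode, $path, $id, $residues) = @_;
    open my $out, $mode, $path or die "Cannot open $path: $!";
    print {$out} ">$id\n$residues\n";
    close $out or die "Cannot close $path: $!";
    return;
}

# Read a whole file into one string.
sub slurp {
    my ($path) = @_;
    open my $in, '<', $path or die "Cannot open $path: $!";
    local $/;    # undef $/ => read the whole file at once
    my $text = <$in>;
    close $in;
    return $text;
}

my (undef, $path) = tempfile(UNLINK => 1);

# '>' truncates the file on every open, so only the last record survives.
write_record('>', $path, $_, 'MKV') for qw(id1 id2);
my $overwritten = slurp($path);

# Truncate once with '>', then append the remaining records with '>>'.
write_record('>',  $path, 'id1', 'MKV');
write_record('>>', $path, 'id2', 'MKV');
my $appended = slurp($path);

print "With '>' each time:\n$overwritten";
print "With '>' then '>>':\n$appended";
```

Truncating once before the loop also removes the need to delete an old roar.fa by hand.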
Re^2: Get Fasta file with Protein Sequences given a file with Genbank Ids using Perl by dimitris852 (Acolyte) on Mar 21, 2016 at 05:09 UTC
Hello 1nickt and kevbot, I'm trying this code:

use strict;
use warnings;
use Bio::DB::GenBank;
use File::Slurp;

my @lines = read_file('sec.txt');
chomp @lines;
print "@lines\n";

foreach my $id (@lines) {
    my $gb=Bio::DB::GenBank;
    my $seq = $gb->get_Seq_by_id($id);
    write_sequence( ">>roar.fa", 'fasta', $seq );
}

Everything seems logical, but I get this error:

Bareword "Bio::DB::GenBank" not allowed while "strict subs" in use at .... line 15.

I have absolutely no idea how to fix it...
Re^3: Get Fasta file with Protein Sequences given a file with Genbank Ids using Perl by AnomalousMonk (Archbishop) on Mar 21, 2016 at 05:20 UTC
Line 15 of the code `my $gb=Bio::DB::GenBank;` should be `my $gb = new Bio::DB::GenBank;` (indirect object notation), or better yet `my $gb = Bio::DB::GenBank->new;` or `my $gb = Bio::DB::GenBank->new();` which avoid the syntactic ambiguities of indirect object notation. Give a man a fish: `<%-{-{-{-<`
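The difference between assigning a bare class name and calling a constructor can be reproduced without BioPerl. In this sketch, `Demo::Fetcher` is a made-up stand-in class, not Bio::DB::GenBank; the string `eval` is there only so the compile-time `strict` error can be caught and printed instead of aborting the script.

```perl
use strict;
use warnings;

# A made-up stand-in class for illustration (NOT Bio::DB::GenBank).
package Demo::Fetcher;

sub new {
    my ($class) = @_;
    return bless {}, $class;    # the blessed hashref is the object
}

package main;

# Arrow syntax calls the constructor and returns an object.
my $obj = Demo::Fetcher->new();
print ref($obj), "\n";

# Assigning the bare class name does not compile under strict subs,
# even though the package exists. The string eval catches that error.
my $ok = eval q{ use strict; my $gb = Demo::Fetcher; 1 };
print "strict error: $@" unless $ok;
```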
Re^4: Get Fasta file with Protein Sequences given a file with Genbank Ids using Perl by dimitris852 (Acolyte) on Mar 21, 2016 at 05:28 UTC
Re^5: Get Fasta file with Protein Sequences given a file with Genbank Ids using Perl by 1nickt (Canon) on Mar 21, 2016 at 05:33 UTC
Some notes below your chosen depth have not been shown here
Re^5: Get Fasta file with Protein Sequences given a file with Genbank Ids using Perl by kevbot (Vicar) on Mar 21, 2016 at 05:39 UTC
Re^5: Get Fasta file with Protein Sequences given a file with Genbank Ids using Perl by AnomalousMonk (Archbishop) on Mar 21, 2016 at 05:36 UTC
Re^3: Get Fasta file with Protein Sequences given a file with Genbank Ids using Perl by 1nickt (Canon) on Mar 21, 2016 at 05:27 UTC
Sure you do. If something stops working, go back and look at what worked. Your OP had:

my $gb=new Bio::DB::GenBank;

Now this is actually discouraged syntax and should be written as:

my $gb = Bio::DB::GenBank->new();

... but either style shows the point: you need to call the new() method to get an instance of the class Bio::DB::GenBank, aka an object, and store it in `$gb`. Your existing code just assigns the bareword 'Bio::DB::GenBank' to `$gb` -- or tries to, but barewords are not allowed in your script because you're sensibly using `strict`. This is a great example of why to use `strict`: it tells you exactly what the problem is and where to find it.

Also, don't use File::Slurp, it's broken. Use File::Slurper or Path::Tiny.

Hope this helps!

The way forward always starts with a minimal test.
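If you would rather not install a slurping module at all, reading one ID per line needs only core Perl. A minimal sketch, assuming sec.txt holds one ID per line; `read_ids` is a hypothetical helper, and trimming whitespace (which also removes a Windows `\r` that `chomp` leaves behind on Unix) and skipping blank lines go beyond the original code. A temporary file stands in for sec.txt in the demo.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: return one ID per non-blank line of $path,
# using only core Perl (no File::Slurp).
sub read_ids {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my @ids;
    while (my $line = <$fh>) {
        $line =~ s/^\s+|\s+\z//g;            # trim both ends, incl. "\n" and "\r\n"
        push @ids, $line if length $line;    # skip blank lines
    }
    close $fh or die "Cannot close $path: $!";
    return @ids;
}

# Demo data: the four IDs from sec.txt, plus a CRLF line ending and a blank line.
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "asdadasda\r\neeeeerrrr\n\nvfffvvvfv\nraerrrrrr\n";
close $tmp or die "Cannot close $path: $!";

my @ids = read_ids($path);
print "ID: $_\n" for @ids;
```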
Re^3: Get Fasta file with Protein Sequences given a file with Genbank Ids using Perl by kevbot (Vicar) on Mar 21, 2016 at 05:26 UTC
Change this:

my $gb=Bio::DB::GenBank;

to this:

my $gb = new Bio::DB::GenBank;

and see if that gets things working for you.
Re: Get Fasta file with Protein Sequences given a file with Genbank Ids using Perl by 1nickt (Canon) on Mar 20, 2016 at 22:07 UTC
Hi dimitris852,

Just a guess: I haven't examined the doc for Bio::DB::GenBank::get_Seq_by_id() as I am sure you must have done, but by its name I would think that it only expects one ID at a time. You're trying to pass a list, which wouldn't work.

But... you're actually passing a single string with multiple IDs separated by newlines, because you're not calling chomp on your lines when you get them from your filehandle, and because you're passing the array in double quotes, which joins it into one string with elements separated by the current value of `$"`, the list separator (a single space by default).

Just loop through your lines:

my $gb=new Bio::DB::GenBank;

open my $handle, '<', 'sec.txt' or die $!;   # check for success

while ( my $line = <$handle> ) {   # read one line at a time
    chomp $line;                   # strip newline chars
    print "Doing $line\n";
    my $seq = $gb->get_Seq_by_id( $line );
    write_sequence (">>roar.fa", 'fasta',$seq );
    #                ^^
    # ( assuming write_sequence() supports appending )
}

close $handle or die $!;

Hope this helps!

Edit: updated to use `while()`

The way forward always starts with a minimal test.
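A quick way to see this effect without BioPerl: the sketch below reads two of the IDs through an in-memory filehandle (standing in for sec.txt) and shows that `"@lines"` is one string whose elements are joined by `$"` and still carry their newlines, and that `chomp` removes the newlines but interpolation still yields a single string. Only a loop hands over one ID at a time.

```perl
use strict;
use warnings;

# An in-memory filehandle stands in for sec.txt (first two IDs only).
my $data = "asdadasda\neeeeerrrr\n";
open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";
my @lines = <$fh>;
close $fh;

# Interpolation joins the elements with $" (a single space by default);
# each element still ends in "\n", so this is "asdadasda\n eeeeerrrr\n".
my $joined = "@lines";
print "Before chomp: [$joined]\n";

# chomp removes the newlines, but "@lines" is still ONE string.
chomp @lines;
my $joined_chomped = "@lines";
print "After chomp:  [$joined_chomped]\n";

# Looping yields one ID per iteration, as the replies recommend.
print "ID: $_\n" for @lines;
```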
Re: Get Fasta file with Protein Sequences given a file with Genbank Ids using Perl by Cristoforo (Curate) on Mar 20, 2016 at 23:22 UTC
Looking at the documentation for Bio::DB::GenBank, it states:

DESCRIPTION
Allows the dynamic retrieval of Bio::Seq sequence objects from the GenBank database at NCBI, via an Entrez query.

WARNING: Please do NOT spam the Entrez web server with multiple requests. NCBI offers Batch Entrez for this purpose.

So I don't know if this applies to your situation because you're only downloading 4 sequences. The page for NCBI has some examples of downloads.
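One way to keep the number of requests down is to send IDs in batches rather than one by one. The sketch below only does the grouping, in core Perl; `batches` is a hypothetical helper and the batch size of 3 is arbitrary. For the fetch itself, the Bio::DB::GenBank documentation describes `get_Stream_by_id`, which takes an array reference of IDs and returns a stream of sequences; treat that as something to confirm against your installed BioPerl version. No request is made here.

```perl
use strict;
use warnings;

# Hypothetical helper: split a list of IDs into batches of at most $size.
sub batches {
    my ($size, @ids) = @_;
    die "Batch size must be positive\n" unless $size > 0;
    my @groups;
    push @groups, [ splice @ids, 0, $size ] while @ids;
    return @groups;
}

my @ids    = qw(asdadasda eeeeerrrr vfffvvvfv raerrrrrr);
my @groups = batches(3, @ids);

for my $i (0 .. $#groups) {
    print "Batch $i: @{ $groups[$i] }\n";
    # One request per batch would go here (not done in this sketch),
    # followed by a pause before the next request, e.g. sleep 1.
}
```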