New with Bioinformatics

yantul has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks, I need general help writing a script that goes to the NCBI site and fetches GenBank/FASTA format files for some proteins I'm interested in. I will get the list of GenBank IDs (e.g. ABK52390.1) from a database site (http://www.cazy.org/GH48_all.html); I haven't done that yet, but I don't think getting the ID list will be a problem. I tried to find a pattern in the URLs of the files I'm interested in but found nothing. Is there any way to take an ID list and extract this data from NCBI? To begin, I need some functions or modules that can do it. Thank you very much, Yantul

Replies are listed 'Best First'.
Re: New with Bioinformatics by planetscape (Chancellor) on Mar 07, 2010 at 08:40 UTC
I recommend you start with the resources here: Perl and Bioinformatics. Please note that not all Monks are BioMonks. Also, your post could use a bit of editing (and perhaps linking). Update: It looks like you could also probably use Web Client Programming. HTH, planetscape
Re: New with Bioinformatics by space_agent (Acolyte) on Mar 07, 2010 at 18:20 UTC
I would suggest looking at the BioPerl HOWTOs; they helped me a lot. Here is some example code you could use to download sequences from GenBank and print them as multi-FASTA output:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::DB::GenBank;

# Connect to GenBank through BioPerl's remote database interface.
my $db_obj = Bio::DB::GenBank->new;

# Fetch each accession and print it as a FASTA record.
for my $acc (qw(AJ866941 DQ445088)) {
    my $seq_obj = $db_obj->get_Seq_by_acc($acc);
    print ">", $seq_obj->display_id, "\n", $seq_obj->seq, "\n";
}
```
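On the question of a URL pattern: NCBI's E-utilities offer an EFetch endpoint that takes the database, the IDs, and the output format as query parameters. The following is only a minimal sketch under that assumption; the helper name `efetch_url` is made up for illustration, and the only accession used is the one quoted in the question. It builds the URLs and prints them without contacting NCBI.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Base address of the NCBI E-utilities EFetch service.
my $EFETCH = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi';

# Hypothetical helper: build an EFetch URL for one or more protein accessions.
# $rettype is 'fasta' for FASTA output or 'gp' for the GenPept flat file.
sub efetch_url {
    my ($rettype, @ids) = @_;
    return sprintf '%s?db=protein&id=%s&rettype=%s&retmode=text',
        $EFETCH, join(',', @ids), $rettype;
}

# Print the URLs for the accession mentioned in the question.
print efetch_url('fasta', 'ABK52390.1'), "\n";
print efetch_url('gp',    'ABK52390.1'), "\n";
```

Passing one of these URLs to `get` from LWP::Simple should return the record as plain text, provided HTTPS support for LWP is installed. If you would rather stay within BioPerl for protein accessions, Bio::DB::GenPept offers the same `get_Seq_by_acc` interface as Bio::DB::GenBank.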