yantul has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks, I need some general help writing a script that goes to the NCBI site and downloads GenBank/FASTA format files for some proteins I'm interested in. I will get the list of GenBank IDs (e.g. ABK52390.1) from a database site (http://www.cazy.org/GH48_all.html). I haven't done that yet, but I don't think getting the ID list will be a problem. I tried to find a pattern in the URLs of the files I'm interested in, but found nothing. Is there any way to take an ID list and extract this data from NCBI? To begin with, I need some functions or modules that can do this. Thank you very much, Yantul

Re: New with Bioinformatics
by planetscape (Chancellor) on Mar 07, 2010 at 08:40 UTC
Re: New with Bioinformatics
by space_agent (Acolyte) on Mar 07, 2010 at 18:20 UTC

    I would suggest looking at the BioPerl HOWTO; it helped me a lot.
    Here is some example code you could use to download sequences from GenBank
    and print them in multi-FASTA format:

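    #!/usr/bin/perl
    use strict;
    use warnings;
    use Bio::DB::GenBank;

    my $db_obj = Bio::DB::GenBank->new;
    for my $acc ('AJ866941', 'DQ445088') {
        # Fetch each record by accession and print it as a FASTA entry
        my $seq_obj = $db_obj->get_Seq_by_acc($acc);
        print ">", $seq_obj->display_id, "\n", $seq_obj->seq, "\n";
    }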
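For the protein case in the question, note that Bio::DB::GenBank queries NCBI's nucleotide database; BioPerl's Bio::DB::GenPept is its protein counterpart. If you would rather not install BioPerl, the "URL pattern" the question asks about is NCBI's E-utilities efetch service, which accepts a comma-separated list of IDs. Below is a minimal sketch using only the core HTTP::Tiny module. The helper name efetch_url, the batch size of 100 and the one-second pause are assumptions chosen for illustration, not values prescribed by NCBI, and HTTPS support in HTTP::Tiny requires IO::Socket::SSL to be installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Tiny;

# Base URL of NCBI's E-utilities efetch service.
my $EFETCH = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi';

# Hypothetical helper: build an efetch URL for a list of protein accessions.
# $rettype is 'fasta' for FASTA or 'gp' for GenPept (GenBank-style protein) records.
sub efetch_url {
    my ($rettype, @ids) = @_;
    return "$EFETCH?db=protein&id=" . join(',', @ids)
         . "&rettype=$rettype&retmode=text";
}

# Only contact NCBI when accessions are given on the command line.
if (@ARGV) {
    my $http = HTTP::Tiny->new;
    my @ids  = @ARGV;
    # Request the IDs in batches (assumed size 100) to keep each URL short.
    while (my @batch = splice(@ids, 0, 100)) {
        my $res = $http->get(efetch_url('fasta', @batch));
        die "efetch failed: $res->{status} $res->{reason}\n"
            unless $res->{success};
        print $res->{content};
        sleep 1;    # pause between requests to respect NCBI's rate limit
    }
}
```

Usage (script name hypothetical): `perl fetch_proteins.pl ABK52390.1 > proteins.fasta`. Passing 'gp' instead of 'fasta' to efetch_url asks for GenPept flat files instead of FASTA.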