Hi Perlmonks,
I am interested in retrieving the coding sequence (CDS) of a gene from its accession number in a Perl script. I searched the web and found a script that retrieves the entire sequence in FASTA format, not the CDS itself. On the record's GenBank page at NCBI, the word CDS is a hyperlink, and clicking it shows the CDS of the gene. Is it possible to get the CDS directly with a Perl script and an internet connection? I welcome suggestions from the Perlmonks. Here is the script:
#!/usr/bin/perl
#########################################################
# Perl code to get complete sequence in FASTA format from GenBank page
#########################################################
use warnings;
use strict;
use Bio::DB::GenBank;
use Bio::SeqIO;

# Connect to GenBank and fetch the record for one accession number
my $gb   = Bio::DB::GenBank->new;
my $acc  = "NM_021817";
my $seq1 = $gb->get_Seq_by_acc($acc);

# Pull out the full nucleotide sequence and the description line
my $sequence    = $seq1->seq();
my $description = $seq1->desc();

print "\n $description $sequence\n";
exit;
The script produces the following output in the cmd window:
C:\Users\Desktop>seq.pl

 Homo sapiens hyaluronan and proteoglycan link protein 2 (HAPLN2), mRNA.
TTGCCATTACCCACAGAATAAAGAAAGGGGCCCTGTTATTCAACAATAGGGGAAAAGACAGAGACAATGGGAAATTGTGC
TTCCGATGGGGTGGGGACTGAGAAGGAAAGGACAGACAGACAGACAGACAGGGGGTTGTACAGAAGAGGTCCGGTTTCTT
GAAGCAGCTGGAAGTCCTGGATAGTTCCCACCTGAAAGTCTGTTTGCAAAGGCAATGCGCACTCAGGCACCAGAGGGCAG
AGGGGCTCAAGTTCCAGGGTTTTAAGGTGCTTGGAACTCCCAGGAGCCTGGCAAACCTTCATCCAGAACCTCTTCCTCAA
GCAAGACAAAAAGCTGCTAAGCACTGCTCCCTCCGTCTCTGTGAAGAGACCAGCTTCTAACAGACGGTGCCGGGCTGACC
CCCCATCATGCCAGGCTGGCTCACCCTCCCCACACTCTGCCGCTTCCTTCTTTGGGCCTTCACCATCTTCCACAAAGCCC
AAGGAGACCCAGCATCCCACCCGGGCCCCCACTACCTCCTGCCCCCCATCCACGAGGTCATTCACTCTCATCGTGGGGCC
ACGGCCACGCTGCCCTGCGTCCTGGGCACCACGCCTCCCAGCTACAAGGTGCGCTGGAGCAAGGTGGAGCCTGGGGAGCT
CCGGGAAACGCTGATCCTCATCACCAACGGACTGCACGCCCGGGGGTATGGGCCCCTGGGAGGGCGCGCCAGGATGCGGA
GGGGGCATCGACTAGACGCCTCCCTGGTCATCGCGGGCGTGCGCCTGGAGGACGAGGGCCGGTACCGCTGCGAGCTCATC
AACGGCATCGAGGACGAGAGCGTGGCGCTGACCTTGAGCTTGGAGGGTGTGGTGTTTCCGTACCAACCCAGCCGGGGCCG
GTACCAGTTCAATTACTACGAGGCGAAGCAGGCGTGCGAGGAGCAGGACGGACGCCTGGCCACCTACTCCCAGCTCTACC
AGGCTTGGACCGAGGGTCTGGACTGGTGTAACGCGGGCTGGCTGCTCGAGGGCTCCGTGCGCTACCCTGTGCTCACCGCA
CGCGCCCCGTGCGGCGGCCGAGGCCGGCCCGGGATCCGCAGCTACGGACCCCGCGACCGGATGCGCGACCGCTACGACGC
CTTCTGCTTCACCTCCGCGCTGGCGGGCCAAGTGTTCTTCGTGCCCGGGCGGCTGACGCTGTCTGAAGCCCACGCGGCGT
GCCGGCGACGCGGCGCCGTGGTGGCCAAGGTTGGGCACCTCTACGCCGCCTGGAAGTTTTCGGGGCTAGACCAGTGCGAC
GGCGGCTGGCTGGCTGACGGCAGTGTGCGCTTCCCAATCACCACGCCGAGGCCGCGCTGCGGGGGGCTCCCGGATCCCGG
AGTGCGCAGTTTCGGCTTCCCCAGGCCCCAACAGGCAGCCTATGGGACCTACTGCTACGCCGAGAATTAGGCGCCCACCG
TGTCCCCTCCAGCGCGCGCGAAGAAGCTTGGGAGTCGTGGCGGGGGTCTCTCGCCACCCCTTTCCGGAGAGCCTCCCCTC
CCTCCAGACCCGGAGCGGCCTCTCCAGACCTGCCTTCCCAGCCGGGGGCTGCGGGCCTCGGACCCCGGCTGGCCCGGCGG
CGGGGAGGGGAGGCGGGGGCGCCTCCGGCGGCGAGATGCAGAGGTGACCCTCGGACCCGCTGCCGTTCGCGAACCCTAGC
AGAGGACTCAGCCACCGCCGGGGGGAGGGTGAGGCGGCCGGGGGCATTAACTGACCTCTGAGTACAGCAATAAAATAACC
TGGGGATCTTTAAAAAAAAAAAAAAAAAAAAAAAA

C:\Users\Desktop>
I would like to get the cds sequence as given below:
>NM_021817.2:408-1430 Homo sapiens hyaluronan and proteoglycan link protein 2 (HAPLN2), mRNA
ATGCCAGGCTGGCTCACCCTCCCCACACTCTGCCGCTTCCTTCTTTGGGCCTTCACCATCTTCCACAAAG
CCCAAGGAGACCCAGCATCCCACCCGGGCCCCCACTACCTCCTGCCCCCCATCCACGAGGTCATTCACTC
TCATCGTGGGGCCACGGCCACGCTGCCCTGCGTCCTGGGCACCACGCCTCCCAGCTACAAGGTGCGCTGG
AGCAAGGTGGAGCCTGGGGAGCTCCGGGAAACGCTGATCCTCATCACCAACGGACTGCACGCCCGGGGGT
ATGGGCCCCTGGGAGGGCGCGCCAGGATGCGGAGGGGGCATCGACTAGACGCCTCCCTGGTCATCGCGGG
CGTGCGCCTGGAGGACGAGGGCCGGTACCGCTGCGAGCTCATCAACGGCATCGAGGACGAGAGCGTGGCG
CTGACCTTGAGCTTGGAGGGTGTGGTGTTTCCGTACCAACCCAGCCGGGGCCGGTACCAGTTCAATTACT
ACGAGGCGAAGCAGGCGTGCGAGGAGCAGGACGGACGCCTGGCCACCTACTCCCAGCTCTACCAGGCTTG
GACCGAGGGTCTGGACTGGTGTAACGCGGGCTGGCTGCTCGAGGGCTCCGTGCGCTACCCTGTGCTCACC
GCACGCGCCCCGTGCGGCGGCCGAGGCCGGCCCGGGATCCGCAGCTACGGACCCCGCGACCGGATGCGCG
ACCGCTACGACGCCTTCTGCTTCACCTCCGCGCTGGCGGGCCAAGTGTTCTTCGTGCCCGGGCGGCTGAC
GCTGTCTGAAGCCCACGCGGCGTGCCGGCGACGCGGCGCCGTGGTGGCCAAGGTTGGGCACCTCTACGCC
GCCTGGAAGTTTTCGGGGCTAGACCAGTGCGACGGCGGCTGGCTGGCTGACGGCAGTGTGCGCTTCCCAA
TCACCACGCCGAGGCCGCGCTGCGGGGGGCTCCCGGATCCCGGAGTGCGCAGTTTCGGCTTCCCCAGGCC
CCAACAGGCAGCCTATGGGACCTACTGCTACGCCGAGAATTAG
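For what it is worth, one possible direction is sketched below, assuming the record that Bio::DB::GenBank returns carries its feature table: loop over the sequence features, keep those whose primary tag is CDS, and take the spliced sequence of each. The helper names format_fasta and cds_fasta_for are made up for illustration, the 70-column wrap is chosen only to match the layout above, and the header omits the version suffix, so the coordinates printed may differ if the record has been revised since. This is untested against the live database.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::DB::GenBank;

# Hypothetical helper: build a FASTA record, wrapping the sequence
# at $width characters per line (70 by default).
sub format_fasta {
    my ($header, $sequence, $width) = @_;
    $width ||= 70;
    my @lines = $sequence =~ /(.{1,$width})/g;
    return join "\n", ">$header", @lines, '';
}

# Hypothetical helper: fetch one GenBank record and return a FASTA
# string for every CDS feature annotated on it.
sub cds_fasta_for {
    my ($acc) = @_;
    my $gb  = Bio::DB::GenBank->new;
    my $seq = $gb->get_Seq_by_acc($acc)
        or die "Could not retrieve $acc\n";

    my @records;
    for my $feat ($seq->get_SeqFeatures) {
        next unless $feat->primary_tag eq 'CDS';

        # spliced_seq joins the sub-locations if the CDS is split
        my $cds    = $feat->spliced_seq;
        my $header = sprintf '%s:%d-%d %s',
            $seq->accession_number, $feat->start, $feat->end, $seq->desc;
        push @records, format_fasta($header, $cds->seq);
    }
    return @records;
}

# Run only when executed as a script, not when loaded from another file
print cds_fasta_for('NM_021817') unless caller;
```

Run as a script, this should print one FASTA record per CDS feature; for an mRNA record such as NM_021817 that would normally be a single record.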
In reply to Is it possible to retrieve the coding sequence of a gene from NCBI GenBank database using perl ? by supriyoch_2008