Getting data from NCBI

Becky has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I've been getting protein sequences from the NCBI for years using code like this:

  
use strict;
use warnings;
use LWP;
use HTTP::Request::Common;

my $ua = LWP::UserAgent->new;

# Old Entrez viewer URL for GI 148261691, asking for FASTA output
my $result = $ua->request(GET
"http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=protein&qty=1&start=1&list_uids=148261691&dopt=fasta");

my $fasta;
if ($result->is_success) {
    $fasta = $result->content;
}

# then parse $fasta to get out just my protein sequence

However, this no longer works, as the NCBI have started using JavaScript to load the sequence. If you open the URL above and view the page source you'll see what I mean: the sequence is no longer visible in the page source. The sequence should start:

>gi|148261691|ref|YP_001235818.1| CheA signal transduction histidine kinase Acidiphilium cryptum JF-5
MTGGGSMDPMAEIRETFFQECEEQLAELESGLMRMEAGETDSETVNAVFRAVHSIKGGAGAFGLEDLVHF

Can anyone tell me how to get my sequences now? Thanks, Becky
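
For the "parse $fasta" step mentioned in the code comment above, here is a minimal sketch of one way to split FASTA text into header and sequence. The function name `parse_fasta` and the record layout (a hash with `header` and `sequence` keys) are made up for illustration and are not from the original post. For anything beyond a quick script, an established parser such as BioPerl's Bio::SeqIO is probably the safer choice.

```perl
use strict;
use warnings;

# Split FASTA text into records. Each record is a hash reference holding
# the header line (without the leading '>') and the sequence with all
# line breaks and other whitespace removed.
sub parse_fasta {
    my ($text) = @_;
    my @records;
    for my $line (split /\r?\n/, $text) {
        next if $line =~ /^\s*$/;              # skip blank lines
        if ($line =~ /^>(.*)/) {
            # A '>' line starts a new record
            push @records, { header => $1, sequence => '' };
        }
        elsif (@records) {
            # Any other line continues the current record's sequence
            (my $residues = $line) =~ s/\s+//g;
            $records[-1]{sequence} .= $residues;
        }
    }
    return @records;
}

# Example input: the header and first sequence line quoted in the question.
my $fasta = <<'END';
>gi|148261691|ref|YP_001235818.1| CheA signal transduction histidine kinase Acidiphilium cryptum JF-5
MTGGGSMDPMAEIRETFFQECEEQLAELESGLMRMEAGETDSETVNAVFRAVHSIKGGAGAFGLEDLVHF
END

my ($record) = parse_fasta($fasta);
print "$record->{header}\n";
print "$record->{sequence}\n";
```

Because the sequence lines are concatenated, the same function should also cope with multi-line records, which is how longer proteins normally come back.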


Replies are listed 'Best First'.
Re: Getting data from NCBI by derby (Abbot) on Apr 30, 2009 at 12:27 UTC
Becky, NCBI switched over to a web service model years ago. Check out their eutils page for more info. If you do a lot of work with their data, I would recommend signing up for their mailing list.

For this particular query I think you would need to make an esearch request and then an efetch (but the folks at NCBI would know better):

`http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=protein&term=148261691&rettype=uilist&usehistory=y`

and then, using the info from esearch (basically WebEnv and query_key):

`http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&WebEnv=<xxx>&query_key=<yyy>&rettype=fasta&retmode=xml&sort=pub+date`

-derby

Update: The NCBI folks may have a better way of directly pulling the data based on IDs. I only query PubMed, and the app normally has to do a search first, so this search/fetch approach has always worked well for me.

Update: Well ... it looks like you can pull the record directly:

`http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&id=148261691&rettype=fasta`

Like I said, I really like what the NCBI folks are doing.
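
To make the two routes above concrete, here is a rough Perl sketch of both: the direct efetch by ID, and the esearch-then-efetch route via WebEnv and query_key. Treat it as a sketch under assumptions rather than NCBI's recommended client. The helper names (`build_url`, `efetch_by_id_url`, `esearch_url`, `parse_history`, `efetch_by_history_url`, `http_get`) are invented for illustration. It assumes the https form of the E-utilities base URL, which needs LWP::Protocol::https installed. It asks for `retmode=text` rather than the `retmode=xml` shown above, since the goal is plain FASTA. It picks WebEnv and QueryKey out of the esearch reply with regexes, where a real XML parser would be more robust.

```perl
use strict;
use warnings;

# Assumption: the https base URL for E-utilities.
my $EUTILS = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils';

# Hypothetical helper: build "base/script?key=value&..." from a flat
# key/value list, percent-encoding anything unsafe in the values.
sub build_url {
    my ($script, @params) = @_;
    my @pairs;
    while (my ($key, $value) = splice(@params, 0, 2)) {
        $value =~ s/([^A-Za-z0-9\-_.~])/sprintf('%%%02X', ord $1)/ge;
        push @pairs, "$key=$value";
    }
    return "$EUTILS/$script?" . join('&', @pairs);
}

# Route 1: fetch directly by GI number or accession.
sub efetch_by_id_url {
    my ($id) = @_;
    return build_url('efetch.fcgi',
        db => 'protein', id => $id, rettype => 'fasta', retmode => 'text');
}

# Route 2, step 1: esearch with usehistory=y so the result set is kept
# on NCBI's history server.
sub esearch_url {
    my ($term) = @_;
    return build_url('esearch.fcgi',
        db => 'protein', term => $term, usehistory => 'y');
}

# Route 2, step 2: pull WebEnv and QueryKey out of the esearch XML reply.
# Returns undef for a value that is not found.
sub parse_history {
    my ($xml) = @_;
    my ($web_env)   = $xml =~ m{<WebEnv>([^<]+)</WebEnv>};
    my ($query_key) = $xml =~ m{<QueryKey>(\d+)</QueryKey>};
    return ($web_env, $query_key);
}

# Route 2, step 3: efetch everything in that stored result set.
sub efetch_by_history_url {
    my ($web_env, $query_key) = @_;
    return build_url('efetch.fcgi',
        db => 'protein', WebEnv => $web_env, query_key => $query_key,
        rettype => 'fasta', retmode => 'text');
}

# Plain GET that dies on failure. LWP is loaded only when a request is
# actually made.
sub http_get {
    my ($url) = @_;
    require LWP::UserAgent;
    my $response = LWP::UserAgent->new(timeout => 30)->get($url);
    die "GET $url failed: ", $response->status_line, "\n"
        unless $response->is_success;
    return $response->decoded_content;
}

# Network access happens only when an ID is given on the command line.
if (my $id = shift @ARGV) {
    print http_get(efetch_by_id_url($id));
}
```

Saved under a name of your choice (say `fetch_fasta.pl`), running it with `148261691` as the only argument should print the FASTA record. For the history route, you would call `http_get(esearch_url($term))`, pass the reply to `parse_history`, and then fetch `efetch_by_history_url($web_env, $query_key)`.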
Re^2: Getting data from NCBI by jpearl (Scribe) on Apr 30, 2009 at 17:34 UTC
I actually had a Java program that did something very similar to this. It was very nice to just have the sequence as a string to read in, no parsing required! You can get the bare FASTA-formatted sequence pretty easily if you use something like this: `my $giNum = 148261691; my $seq = "http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=protein&qty=1&c_start=1&list_uids=" . $giNum . "&uids=&dopt=fasta&dispmax=5&sendto=t&from=begin&to=end";` and then do something similar to what you're doing above. Good luck!
Re^2: Getting data from NCBI by Becky (Beadle) on Apr 30, 2009 at 12:49 UTC
You're a star, thanks! I never knew about that stuff before but will use it from now on! Becky
Re: Getting data from NCBI by frieduck (Hermit) on Apr 30, 2009 at 14:29 UTC
Does this URL work? http://www.ncbi.nlm.nih.gov/sviewer/viewer.fcgi?tool=portal&db=protein&val=148261691&dopt=fasta&sendto=on&log$=seqview&extrafeat=0&maxplex=1 I found it from the "Download" menu on the right side of that page.