Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello fellow monks!
I've been trying to retrieve some data from a certain type of web page, like the following:
http://www.ncbi.nlm.nih.gov/nuccore/298379586?from=489972&to=491012
http://www.ncbi.nlm.nih.gov/nuccore/307551844?from=1037615&to=1038667
http://www.ncbi.nlm.nih.gov/nuccore/309700213?from=1125254&to=1126294
As you can see, they are all of the same kind. What I need to extract is the sequence, namely the last part, from ORIGIN onwards, where each line begins with a number (1, 61, 121, etc.). If I could download the page, I believe I could write a pattern-matching script to get the sequence, but with

 wget "http://www.ncbi.nlm.nih.gov/nuccore/309700213?from=1125254&to=1126294"

for example, I DO NOT get the same page that we see in the browser. The only way I can get the same page is with File->Save Page As, which is obviously not practical for 1000+ pages...
Is there a way of either retrieving the data I need WITHOUT having to download the page OR a way to download the page BUT ensuring that I get EXACTLY the same data inside?
Thanks in advance!
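One way around the HTML problem (presumably the sequence on the nuccore page is filled in by JavaScript after loading, which wget does not run) is to skip the page entirely and ask NCBI's E-utilities efetch service for the plain-text GenBank record of just the region you want, then apply the ORIGIN pattern match the question describes. The sketch below assumes efetch's db, id, seq_start, seq_stop, rettype=gb and retmode=text parameters, and that the record has the usual "ORIGIN ... //" layout. The sub names, script name and command-line layout are hypothetical, and HTTP::Tiny needs IO::Socket::SSL installed for the https URL.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Build an NCBI E-utilities efetch URL that asks for the plain-text
# GenBank record of one region instead of the HTML viewer page.
sub efetch_url {
    my ($gi, $from, $to) = @_;
    return 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi'
         . "?db=nuccore&id=$gi&seq_start=$from&seq_stop=$to"
         . '&rettype=gb&retmode=text';
}

# Collect the bases from the ORIGIN section: skip everything up to the
# "ORIGIN" line, stop at the "//" end-of-record marker, and strip the
# position numbers (1, 61, 121, ...) and the spaces between blocks.
sub extract_origin_sequence {
    my ($record) = @_;
    my ($in_origin, $sequence) = (0, '');
    for my $line (split /\n/, $record) {
        if ($line =~ /^ORIGIN/) { $in_origin = 1; next }
        next unless $in_origin;
        last if $line =~ m{^//};
        (my $bases = $line) =~ s/[\d\s]+//g;
        $sequence .= $bases;
    }
    return $sequence;
}

# Hypothetical usage: perl get_origin.pl 309700213 1125254 1126294
if (@ARGV == 3) {
    my $res = HTTP::Tiny->new->get(efetch_url(@ARGV));
    die "efetch failed: $res->{status} $res->{reason}\n" unless $res->{success};
    print extract_origin_sequence($res->{content}), "\n";
}
```

The three numbers are the GI, from and to values taken straight from the question's URLs. When looping over 1000+ regions, keep NCBI's request-rate guidelines in mind and pause between calls.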

Re: Extract data from web page!
by tobyink (Canon) on May 27, 2012 at 20:20 UTC

    Don't re-invent the wheel. Use Bio::DB::GenBank.

    perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'
      Hm, that's great, thanks! I seem to be having some trouble, though, probably because I'm not good at handling objects...
      I tried using the snippet they give as an example:
      use Bio::DB::GenBank;
      $gb = Bio::DB::GenBank->new();
      $seq = $gb->get_Seq_by_gi('405830'); # GI Number
      print $seq;

      and I got:
      Bio::Seq::RichSeq=HASH(0xa49c0b4)

      as the response... What do I need to do to get the sequence, as shown here?
      http://www.ncbi.nlm.nih.gov/nuccore/405830?report=fasta

        Something like this?

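        use 5.010;
        use strict;
        use warnings;
        use Bio::DB::GenBank;

        my $gb  = Bio::DB::GenBank->new();
        my $seq = $gb->get_Seq_by_gi('405830'); # GI Number

        # $seq is a Bio::Seq::RichSeq object, so call its accessors
        # instead of printing the object itself
        say $seq->display_id;   # record identifier
        say $seq->desc;         # description (DEFINITION line)
        say $seq->seq;          # the bases, as a plain string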
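To cover the 1000+ URLs from the original question, the same get_Seq_by_gi call could be wrapped in a loop that pulls the GI number and the from/to coordinates out of each URL. This is a rough sketch, assuming every URL has the same /nuccore/GI?from=N&to=M shape as the three shown; the sub names, the FASTA header format and the urls.txt file are hypothetical. It downloads each full record and trims it with subseq (1-based, inclusive coordinates), and BioPerl is loaded only when a fetch actually happens.

```perl
use strict;
use warnings;

# Pull the GI number and the from/to coordinates out of a nuccore URL
# like the ones in the question. Returns an empty list if it doesn't match.
sub parse_nuccore_url {
    my ($url) = @_;
    return $url =~ m{/nuccore/(\d+)\?from=(\d+)&to=(\d+)}
        ? ($1, $2, $3)
        : ();
}

# Fetch one record through Bio::DB::GenBank and keep only [from, to].
sub fetch_region {
    my ($gi, $from, $to) = @_;
    require Bio::DB::GenBank;    # loaded at run time, only when needed
    my $gb  = Bio::DB::GenBank->new();
    my $seq = $gb->get_Seq_by_gi($gi);
    return $seq->subseq($from, $to);
}

# Hypothetical usage: perl fetch_regions.pl urls.txt   (one URL per line)
if (@ARGV) {
    while (my $url = <>) {
        chomp $url;
        my ($gi, $from, $to) = parse_nuccore_url($url) or next;
        # Made-up FASTA header: GI number plus the requested range
        print ">gi|${gi}:${from}-${to}\n", fetch_region($gi, $from, $to), "\n";
        sleep 1;    # be gentle with NCBI's servers between requests
    }
}
```

If the records are whole genomes, downloading each one just to keep about 1 kb is wasteful; the Bio::DB::GenBank documentation also describes -seq_start and -seq_stop constructor options for fetching only a region, which may be worth checking against your BioPerl version.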