Extract data from web page!

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello fellow monks!
I've been trying to retrieve some data from a certain type of web page, such as the following:

http://www.ncbi.nlm.nih.gov/nuccore/298379586?from=489972&to=491012
http://www.ncbi.nlm.nih.gov/nuccore/307551844?from=1037615&to=1038667
http://www.ncbi.nlm.nih.gov/nuccore/309700213?from=1125254&to=1126294

As you can see, they are all of the same kind. What I need to extract is the sequence, namely the last part, from ORIGIN onwards, whose lines begin with the numbers 1, 61, 121, etc. If I could download the page, I believe I could write a pattern-matching script to get the sequence, but with

wget "http://www.ncbi.nlm.nih.gov/nuccore/309700213?from=1125254&to=1126294"

for example, I DO NOT get the same page that we see in the browser. The only way I can get the same page is by doing File->Save Page As, which obviously isn't practical for 1000+ pages...
Is there a way of either retrieving the data I need WITHOUT having to download the page OR a way to download the page BUT ensuring that I get EXACTLY the same data inside?
Thanks in advance!
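
The pattern-match idea described above can be sketched roughly as follows. This is a minimal sketch, assuming the record is already available as plain GenBank flat-file text (for example, a page saved from the browser). The `extract_origin` helper and the sample record are hypothetical, made up for illustration, and are not real NCBI data.

```perl
use strict;
use warnings;

# Return the bare sequence from the ORIGIN section of a GenBank
# flat-file record, or nothing if no ORIGIN section is found.
# (extract_origin is a hypothetical helper name.)
sub extract_origin {
    my ($record) = @_;

    # Capture everything between the ORIGIN line and the "//" terminator line.
    my ($block) = $record =~ m{^ORIGIN[^\n]*\n(.*?)^//}ms
        or return;

    # Drop the position numbers (1, 61, 121, ...) and all whitespace.
    $block =~ s/[\d\s]+//g;
    return $block;
}

# Made-up sample record for illustration only.
my $sample = <<'END';
LOCUS       EXAMPLE   70 bp
ORIGIN
        1 atgaccatga ttacgccaag cttgcatgcc tgcaggtcga ctctagagga tccccgggta
       61 ccgagctcga
//
END

print extract_origin($sample), "\n";
```

The regex only helps once the flat-file text is in hand; it does not solve the problem of wget returning a different page.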


Replies are listed 'Best First'.
Re: Extract data from web page! by tobyink (Canon) on May 27, 2012 at 20:20 UTC
Don't re-invent the wheel. Use Bio::DB::GenBank. `perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'`
Re^2: Extract data from web page! by Anonymous Monk on May 27, 2012 at 20:29 UTC
Hm, that's great, thanks! I seem to be having some trouble though; I'm probably not good at handling objects... I tried the snippet they give as an example:

use Bio::DB::GenBank;
$gb = Bio::DB::GenBank->new();
$seq = $gb->get_Seq_by_gi('405830'); # GI Number
print $seq;

and I got:

Bio::Seq::RichSeq=HASH(0xa49c0b4)

as a response... What must I do to get the sequence, like here?

http://www.ncbi.nlm.nih.gov/nuccore/405830?report=fasta
Re^3: Extract data from web page! by tobyink (Canon) on May 27, 2012 at 20:51 UTC
Something like this?

use 5.010;
use strict;
use Bio::DB::GenBank;

my $gb = Bio::DB::GenBank->new();
my $seq = $gb->get_Seq_by_gi('405830'); # GI Number
say $seq->display_id;
say $seq->desc;
say $seq->seq;
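
For the 1000+ URLs in the original question, the same approach might be extended along these lines. This is only a sketch under assumptions: `parse_nuccore_url` and `fetch_range` are hypothetical helper names, and the `-format`, `-seq_start` and `-seq_stop` constructor options are taken from my recollection of the Bio::DB::GenBank documentation, so check them against your installed version. Fetching needs BioPerl and network access; the URL parsing does not.

```perl
use strict;
use warnings;

# Split a nuccore URL of the form shown in the question into
# (GI number, from, to). Returns an empty list if it does not match.
sub parse_nuccore_url {
    my ($url) = @_;
    my ($gi, $from, $to) =
        $url =~ m{/nuccore/(\d+)\?from=(\d+)&to=(\d+)}
        or return;
    return ($gi, $from, $to);
}

# Fetch one sub-sequence as a plain string.
# ASSUMPTION: -seq_start / -seq_stop restrict the retrieved range.
sub fetch_range {
    my ($gi, $from, $to) = @_;
    require Bio::DB::GenBank;    # loaded lazily; needs BioPerl installed
    my $gb = Bio::DB::GenBank->new(
        -format    => 'Fasta',
        -seq_start => $from,
        -seq_stop  => $to,
    );
    my $seq = $gb->get_Seq_by_gi($gi);
    return $seq->seq;
}

# Each command-line argument is treated as one URL.
for my $url (@ARGV) {
    my ($gi, $from, $to) = parse_nuccore_url($url)
        or do { warn "Skipping unrecognised URL: $url\n"; next };
    print ">$gi:$from-$to\n", fetch_range($gi, $from, $to), "\n";
    sleep 1;    # be polite to the NCBI servers
}
```

If the range options turn out not to behave this way, a fallback is to fetch the whole record and call `$seq->subseq($from, $to)`, at the cost of downloading each full sequence.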