zoya has asked for the wisdom of the Perl Monks concerning the following question:
Hi all, I need to fill in forms on multiple web pages, fetch the resulting data, parse the HTML into text, and store everything in a single file. I have the code below, but every web form has different fields to fill in. This is one of the websites; I have three more. Can anybody please tell me how to do this? Suggestions are appreciated, thanks.
use 5.012;
use strict;
use warnings;
use autodie qw/ open close /;
use Win32::IE::Mechanize;
use HTML::TreeBuilder;
use HTML::FormatText;
use Time::HiRes 'sleep';

my $timeout = 40;      # seconds to wait for the results page
my $snp     = "rs111";

# Drive a visible Internet Explorer window.
my $browser = Win32::IE::Mechanize->new(visible => 1);
$browser->get("http://snp-nexus.org/index.html");

# Fill in the query form.
$browser->form_name('snpnexus');
#$browser->field('query', 'dbsnp');
$browser->field('batch_text', "dbsnp $snp");

# Tick every annotation checkbox (each box's value equals its name).
$browser->tick($_, $_) for qw(ensembl refseq ucsc sift polyphen chb chd
                              tfbs consv gwas indel mirbase gad cnp);
$browser->click_button(value => 'RUN');

# Poll every 0.1 s, for at most $timeout seconds, until IE reports
# READYSTATE_COMPLETE (4). ReadyState never exceeds 4, so a test
# of ">= 5" could never succeed and always waited the full time.
for (1 .. $timeout * 10) {
    last if $browser->{agent}->ReadyState == 4;
    sleep 0.1;
}

# Convert the result page from HTML to plain text.
my $html2 = $browser->content;
my $tree  = HTML::TreeBuilder->new;
$tree->parse($html2);
$tree->eof;            # signal end of document before formatting
my $parsed = HTML::FormatText->new->format($tree);
$tree->delete;         # free the parse tree
print $parsed;
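One way to cope with several sites whose forms all have different fields is to describe each form as data and drive every site through the same loop, appending each result to one output file. The sketch below assumes the target pages can be submitted without JavaScript, so plain WWW::Mechanize works (it does not run JavaScript; if a site needs a real browser, Win32::IE::Mechanize offers the same `form_name`/`field`/`tick`/`click_button` calls used in the question). Only the snp-nexus entry comes from the question; the `@SITES` layout, the helper names `fetch_site`, `html_to_text`, `append_report` and `run_all`, and the header format in the output file are hypothetical. If a site first returns a "please wait" page and finishes the job later, an extra polling step would be needed, which this sketch does not include.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;
use HTML::TreeBuilder;
use HTML::FormatText;

# One entry per web form (hypothetical layout). Only the snp-nexus entry
# is taken from the question; add the other sites with their own fields.
my @SITES = (
    {
        name   => 'snp-nexus',
        url    => 'http://snp-nexus.org/index.html',
        form   => 'snpnexus',
        fields => { batch_text => 'dbsnp rs111' },
        ticks  => [qw(ensembl refseq ucsc sift polyphen chb chd
                      tfbs consv gwas indel mirbase gad cnp)],
        button => { value => 'RUN' },
    },
    # { name => 'site2', url => '...', form => '...',
    #   fields => { ... }, ticks => [ ... ], button => { ... } },
);

# Convert an HTML string to plain text.
sub html_to_text {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new;
    $tree->parse($html);
    $tree->eof;                      # tell the parser the document is complete
    my $text = HTML::FormatText->new(leftmargin => 0, rightmargin => 72)
                               ->format($tree);
    $tree->delete;                   # free the tree's circular references
    return $text;
}

# Fill in one site's form, submit it, and return the response HTML.
sub fetch_site {
    my ($mech, $site) = @_;
    $mech->get($site->{url});
    $mech->form_name($site->{form});
    my $fields = $site->{fields} || {};
    $mech->field($_, $fields->{$_}) for keys %$fields;
    # Each checkbox is ticked with a value equal to its name, as in the question.
    $mech->tick($_, $_) for @{ $site->{ticks} || [] };
    # Fall back to the form's first submit button if none is configured.
    $mech->click_button(%{ $site->{button} || { number => 1 } });
    return $mech->content;
}

# Append one site's text to the shared output file under a header line.
sub append_report {
    my ($path, $title, $text) = @_;
    open my $fh, '>>:encoding(UTF-8)', $path or die "Cannot open $path: $!";
    print {$fh} "==== $title ====\n", $text, "\n";
    close $fh or die "Cannot close $path: $!";
    return;
}

# Process every configured site; a failure on one site does not stop the rest.
sub run_all {
    my ($outfile, @sites) = @_;
    my $mech = WWW::Mechanize->new(autocheck => 1, timeout => 40);
    for my $site (@sites) {
        my $html = eval { fetch_site($mech, $site) };
        unless (defined $html) {
            warn "Skipping $site->{name}: $@";
            next;
        }
        append_report($outfile, $site->{name}, html_to_text($html));
    }
    return;
}

if (@ARGV) {
    run_all($ARGV[0], @SITES);
} else {
    warn "usage: $0 OUTFILE\n";
}
```

Run it as, for example, `perl fill_forms.pl results.txt` (the script name is arbitrary). Each site's text is appended under its own `==== name ====` header, so all results land in the one file; adding a fourth site means adding one more hash to `@SITES` rather than writing another script.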
Re: Perl Mechanize
by runrig (Abbot) on Apr 24, 2013 at 21:12 UTC
by zoya (Initiate) on Apr 25, 2013 at 06:50 UTC
by runrig (Abbot) on Apr 25, 2013 at 19:21 UTC

Re: Perl Mechanize
by Anonymous Monk on Apr 25, 2013 at 03:33 UTC
by zoya (Initiate) on Apr 25, 2013 at 06:55 UTC