whakka has asked for the wisdom of the Perl Monks concerning the following question:
Hi all,
I'm using Win32::IE::Mechanize to scrape data from a public database that is served over HTTPS. I'm only guessing that this has something to do with my problem.
There are two search pages, the first of which I'm able to pass through as expected. The second asks whether I want to just perform the search or narrow it first (I want to do the former). In my case this means simply submitting a form whose hidden inputs carry the values from the first search page.
This is where I get tripped up. Every time I try to access the name of the form I want, or any of the inputs on any form, I get an OLE error message from IE::Mech that says "Access is denied." I've tried a number of permutations of form submission to get around this, but nothing seems to work.
I am able to call the forms() method, which shows me 2 forms (there are 3 in the source). I am, however, unable to access any information about these forms, as I get the same "Access is denied" error. I've determined that the first form returned by forms() is indeed the first form in the source, but I'm not sure about the second one, since I can't call any methods to test which one it is (the one that just says "submit" [form 2] or the one asking to narrow the search [form 3]).
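One way to pin down exactly which call raises the error is to wrap each access in its own eval, so a failure on one form does not abort the checks on the others. What follows is only a rough sketch: probe_forms and the MockForm package are hypothetical names made up for illustration, and MockForm merely imitates the "Access is denied" failure so the snippet runs without IE. In the real script the list would presumably come from $bot->forms().

```perl
use strict;
use warnings;

# Hypothetical stand-in for a form object, so the sketch runs without IE.
# A form created with denied => 1 dies on every access, like the OLE error.
package MockForm;
sub new { my ( $class, %args ) = @_; return bless {%args}, $class }

sub name {
    my $self = shift;
    die "Access is denied.\n" if $self->{denied};
    return $self->{name};
}

sub inputs {
    my $self = shift;
    die "Access is denied.\n" if $self->{denied};
    return @{ $self->{inputs} || [] };
}

package main;

# Probe each form separately and return one report line per form.
# Expects objects that offer name() and inputs().
sub probe_forms {
    my @forms = @_;
    my @report;
    for my $i ( 0 .. $#forms ) {
        my $name = eval { $forms[$i]->name() };
        if ($@) {
            ( my $err = $@ ) =~ s/\s+\z//;    # trim trailing newline
            push @report, sprintf( 'form %d: name() failed: %s', $i + 1, $err );
            next;
        }
        my @inputs = eval { $forms[$i]->inputs() };
        if ($@) {
            ( my $err = $@ ) =~ s/\s+\z//;
            push @report, sprintf( 'form %d: inputs() failed: %s', $i + 1, $err );
            next;
        }
        push @report,
            sprintf( 'form %d: name=%s, %d inputs',
                $i + 1, defined $name ? $name : '(none)', scalar @inputs );
    }
    return @report;
}

# Demo with one readable form and one that denies access.
my @report = probe_forms(
    MockForm->new( name => 'gs', inputs => [ 'county', 'subC' ] ),
    MockForm->new( denied => 1 ),
);
print "$_\n" for @report;
```

The report at least shows whether the failure is tied to one particular form or to every form on the second page; it does not explain why IE refuses the access.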
I'd much appreciate any assistance or even just an insight into what is going on.
Here's the code with erroneous attempts commented out:

    #!perl -w
    use strict;
    use Data::Dumper;
    use Win32::IE::Mechanize;
    use HTML::TreeBuilder;

    #I/O
    open (LOG, '>', 'NJ\NJ_get_log.txt') || die "Unable to write to log file\n";
    select LOG; $| = 1; select STDOUT; #Make perl unbuffer log stream

    sub new_bot {
        my $newbot = Win32::IE::Mechanize->new( { visible => 1 } );
        return $newbot;
    }

    sub pass_disclaimer {
        my $bot = shift;
        print LOG "On disclaimer page, accepting.\n";
        print "On disclaimer page, accepting.\n";
        $bot->form_number(2);
        my $submit = ($bot->current_form->inputs())[0];
        $submit->click();
        sleep 2;
        $bot->get( 'https://www6.state.nj.us/LPS_spoff/geographicsearch.jsp' );
        sleep 5;
        until ( $bot->success ) {
            print "Waiting for page to load...\n";
            sleep 5;
        }
        if ( @_ ) {
            my $county = shift;
            $bot = search( $county, $bot );
        }
        return $bot;
    }

    sub search {
        ## Begins search on passed county, returning the bot on the first search page
        my $county = shift;
        my $bot = shift;

        # Load Search
        print "Loading search page.\n";
        eval {
            # Search Page 1
            $bot->get( 'https://www6.state.nj.us/LPS_spoff/geographicsearch.jsp' )
                unless $bot->uri eq 'https://www6.state.nj.us/LPS_spoff/geographicsearch.jsp';
            # $bot->get( 'https://www6.state.nj.us/LPS_spoff/geographicsearch.jsp' );
            sleep 5;
            until ( $bot->success ) {
                print "Waiting for page to load...\n";
                sleep 5;
            }
            $bot = pass_disclaimer($bot) if $bot->uri eq 'http://www.nj.gov/njsp/info/reg_sexoffend.html';
            $bot->form_name( 'gs' );
            $bot->select( 'county' => "$county" );
            sleep 1;
            for ( $bot->current_form->inputs() ) {
                $_->click() if $_->name() eq 'subC';
            }
            sleep 4;

            # Search Page 2
            # $bot->form_name( 'gsm' );
            my @forms = $bot->forms();
            print "Search page2 forms: @forms\n";
            print "Number of page2 forms = ".@forms." = 3?\n";
            # for ( @forms ) {
            #     print "Form name: ",$_->name(),"\n";
            #     my @inputs = $_->inputs();
            #     print "Form inputs: @inputs\n";
            #     print "# of inputs = ".@inputs."\n";
            #     for my $in (@inputs) {
            #         print "Input name: ",$in->name(), "\n";
            #         print "Input type: ",$in->type(), "\n";
            #     }
            # }
            # $bot->form_number(1);
            # print "Selected 1st form\n";
            # $bot->form_number(2);
            # print "Selected 2nd form\n";
            # $bot->field( 'municipality' => ' 13 : 30 : ABERDEEN TWP : ABERDEEN TWP : MONMOUTH ' );
            # $bot->submit();
            # $bot->form_name( 'gsm' );
            # print "Selected gsm form.\n";
            # $bot->submit_form( form_number => 2 );
            # print "Submitted form 2?\n";
            # print "Form name = ",$bot->current_form->name(),"\n";
            # my $submit = ($bot->current_form->inputs())[6];
            # print "Submit type: ",$submit->type(),"\n";
            # $submit->click();
            # $bot->click();
            # print "Clicked submit?\n";
        };
        if ( $@ ) {
            print LOG "$county: error in search: ",$@,".\n";
            print "$county: error in search: ",$@,".\n";
            $bot->close;
            return search( $county, &new_bot() );
        }
        return $bot;
    }

    # Main
    {
        my $mech = new_bot;
        $mech = search( '13MONMOUTH', $mech );
    }
Re: Win32::IE::Mechanize can't get access to form properties or inputs
by Alien (Monk) on Jul 16, 2008 at 08:27 UTC