Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:
Hi Monks. Can I request your help? I'm trying to use WWW::Mechanize to click every link on an HTML page. There is a collapsed menu on the page. Clicking on a link opens a node, which reveals more collapsed nodes. There are thousands of them. My code is below. It's the first time I've used WWW::Mechanize and HTML::TokeParser.
I believe they work as follows. It's like a factory: the while loop powers the conveyor belt.
The HTML is examined one token at a time; get_token() does this. get_tag() can be used when you want to get only a certain type of tag.
It's similar to a person examining products on a factory production line.
After being examined, the tokens are either thrown away or acted on. The if clauses within the while loop find the tokens we want and say how to act.
The original HTML is stored in a string; this is kept and not modified.
It's not possible to jump to a certain place in the string, just as it's not possible to jump to a certain place on a VHS cassette.
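To make the conveyor-belt picture concrete, here is a minimal sketch that runs HTML::TokeParser over an in-memory string. The sample markup and the helper names describe_tokens and hrefs are made up for illustration; only new(), get_token() and get_tag() come from the module itself.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical sample markup, not taken from the real site.
my $html = <<'HTML';
<!-- begin title -->
<p>Menu</p>
<a href="SiteManager?fnno=100">Open</a>
<a href="help.html">Help</a>
HTML

# Walk the stream one token at a time and note what passes by.
sub describe_tokens {
    my ($source) = @_;
    my $stream = HTML::TokeParser->new(\$source)
        or die "Can't parse HTML";
    my @seen;
    while (my $token = $stream->get_token) {
        my $type = $token->[0];
        if    ($type eq 'S') { push @seen, "start:$token->[1]" }
        elsif ($type eq 'E') { push @seen, "end:$token->[1]" }
        elsif ($type eq 'C') { push @seen, "comment:$token->[1]" }
        # text ('T') and other token types are thrown away here
    }
    return @seen;
}

# get_tag('a') skips everything until the next <a> start tag.
sub hrefs {
    my ($source) = @_;
    my $stream = HTML::TokeParser->new(\$source)
        or die "Can't parse HTML";
    my @found;
    while (my $tag = $stream->get_tag('a')) {
        my $href = $tag->[1]{href};
        push @found, $href if defined $href;
    }
    return @found;
}

print "$_\n" for describe_tokens($html);
print "href: $_\n" for hrefs($html);
```

Each call to new() starts a fresh stream from the beginning of the string, which is the nearest thing to rewinding the tape.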
How can I output the final page with all the menu nodes expanded?
Anyway, this is what I want to do:
#1 Search through the HTML until the comment <!-- begin title --->
#2 Find the next A href link
#3 Click the link it is associated with (this will expand a menu option)
#4 Reload the page (wait while this happens)
#5 If there are more links left, repeat steps 1-4.
#6 If there are no more closed nodes, print the HTML to a file and exit the function.
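The six steps above could be sketched roughly as follows. This is only a sketch under assumptions, not a tested solution for the real site. It assumes a closed node is an A tag whose href contains fnno and which wraps an image named explore-node-closed (as in the sample markup quoted in the script), that the server remembers nodes that were already expanded, and that $agent is an already logged-in WWW::Mechanize object. The names next_closed_link, expand_all and $max_clicks are hypothetical.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Steps 1-2: return the href of the first closed-node link found
# after the "begin title" comment, or undef when none is left.
sub next_closed_link {
    my ($html) = @_;
    my $stream = HTML::TokeParser->new(\$html)
        or die "Can't parse HTML";
    my $in_menu = 0;    # becomes true once the comment has gone by
    my $pending;        # href of the <a> we are currently inside
    while (my $token = $stream->get_token) {
        my $type = $token->[0];
        if ($type eq 'C' && $token->[1] =~ /begin title/) {
            $in_menu = 1;
        }
        next unless $in_menu;
        if ($type eq 'S' && $token->[1] eq 'a') {
            my $href = $token->[2]{href};
            $pending = (defined $href && $href =~ /fnno/) ? $href : undef;
        }
        elsif ($type eq 'S' && $token->[1] eq 'img' && defined $pending) {
            my $src = $token->[2]{src};
            return $pending
                if defined $src && $src =~ /explore-node-closed/;
        }
        elsif ($type eq 'E' && $token->[1] eq 'a') {
            undef $pending;
        }
    }
    return;
}

# Steps 3-6: follow closed-node links until none remain, then save.
sub expand_all {
    my ($agent, $outfile, $max_clicks) = @_;
    $max_clicks = 5000 unless defined $max_clicks;  # assumed safety cap
    for (1 .. $max_clicks) {
        # Re-parse the current page from the start on every pass.
        my $href = next_closed_link($agent->content);
        last unless defined $href;
        $agent->get($href);   # returns once the response has arrived
    }
    open my $out, '>', $outfile or die "Can't write $outfile: $!";
    print {$out} $agent->content;
    close $out or die "Can't close $outfile: $!";
}

# Small offline check with made-up markup.
my $sample = '<!-- begin title --><a href="SiteManager?fnno=100">'
           . '<img src="/images/explore-node-closed.gif"></a>';
print next_closed_link($sample), "\n";
```

With a logged-in agent this would be called as expand_all($agent, 'expanded.html'), where the file name is just an example. Parsing the page afresh after every get() matters, because a token stream built from the old page knows nothing about the new one.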
My code is below. The problem occurs when trying to follow the link using $agent->get($a_href);. I'm getting this error:
main::searchHTML() called too early to check prototype at sitemech1.pl line 75.
I've no idea what this means. Should I be using a sleep function to wait for the page to reload?
```perl
#!/usr/bin/perl -w
use strict;
use WWW::Mechanize;
use HTML::TokeParser;

my $login_un  = "xxxxxxxx";
my $login_pwd = "yyyyyyyy";

my $agent = WWW::Mechanize->new();
$agent->get("http://somedomain.com");
$agent->form(1);
$agent->field("login_un", $login_un);
$agent->field("login_pwd", $login_pwd);
$agent->click();

searchHTML();

#my $stream = HTML::TokeParser->new("source.html")|| die "Can't open: $!";
#
# <IMG SRC="/T4SiteManager/images/explore-another-item.gif" HEIGHT="21" WIDTH="15" VSPACE="0" HSPACE="0" ALT=""><A HREF="SiteManager?ctfn=hierarchy&fnno=100&nOP=1257&oH=hierarchy&oF=0">
#<IMG SRC="/T4SiteManager/images/explore-node-closed.gif" width="15" height="21" border="0" vspace="0" hspace="0" alt="Open">
#</A>
#

sub searchHTML(){
    my $stream = HTML::TokeParser->new(\$agent->{content});
    while (my $token = $stream ->get_token) {
        # start searching from <!-- begin title -->
        if($token->[0] eq "C") # start tag?
        {
            my $comment = $token->[1];
            #print ("\n\nFound a comment $comment\n\n" );
            if ($comment eq "<!-- begin title -->") {
                print("FOUND $comment");
            };
        }

        ### search the A tags
        my $ttype = shift @{ $token };
        if($ttype eq "S") # start tag?
        {
            my($tag, $attr, $attrseq, $rawtxt) = @{ $token };
            if($tag eq "a") {
                my $a_href = $attr->{'href'};
                if ($a_href =~ m/fnno/) { #this filters the correct links
                    print("link found: $a_href \n\n");
                    $agent->get($a_href);
                    #searchHTML();
                };
            }
        }
        ### end searching the A tags
    }
    print("All finished\n");
} # close searchHTML sub

############# comments ####################
#1 search through html until the comment <!-- begin title --->
#2 find the next A href link
#3 click the link it is associated with (this will expand a menu option )
#4 reload the page showing the expanded menu option (wait while this happens
#5 if there are more links left then repeat the steps 1 -4 .
#6 if there are no more nodes closed then print html to a file and exit function.
############## end comments ##################
```
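For what it's worth, the "called too early to check prototype" message is a compile-time warning rather than anything to do with page loading. As far as I can tell, it comes from calling a sub that is declared with a prototype (the empty parentheses in sub searchHTML()) before Perl has seen its definition. A minimal sketch that reproduces it in isolation, using only core Perl; the names compile_warnings, late_a and late_b are hypothetical:

```perl
use strict;
use warnings;

# Compile and run a snippet, returning any warnings it produced.
sub compile_warnings {
    my ($code) = @_;
    my @caught;
    local $SIG{__WARN__} = sub { push @caught, @_ };
    eval "$code; 1" or die $@;
    return @caught;
}

# Same shape as the script above: call first, prototyped sub defined later.
my @with_proto = compile_warnings('late_a(); sub late_a() { return 1 }');

# Without the empty "()" prototype there is nothing for Perl to check.
my @without = compile_warnings('late_b(); sub late_b { return 1 }');

print "with prototype:    $_" for @with_proto;
print "without prototype: ", scalar(@without), " warning(s)\n";
```

If that is what is happening here, writing sub searchHTML { ... } without the parentheses, or moving the definition above the first call, should silence the warning. It would not by itself make the menu expand, since the while loop keeps reading tokens from the page that was parsed before $agent->get() was called.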
Edited by planetscape - removed sensitive information from script
Replies are listed 'Best First'.
Re: using mechanize to expand a collasped menu structure.
by marto (Cardinal) on Jul 04, 2006 at 15:42 UTC
Re: using mechanize to expand a collasped menu structure.
by Ieronim (Friar) on Jul 04, 2006 at 16:29 UTC
by Cody Pendant (Prior) on Jul 05, 2006 at 03:49 UTC
Re: using mechanize to expand a collasped menu structure.
by shonorio (Hermit) on Jul 04, 2006 at 16:07 UTC