lampros21_7 has asked for the wisdom of the Perl Monks concerning the following question:
    use WWW::Robot;

    print "Please input the URL of the site to be searched \n";
    my $url_name = <STDIN>; # The user inputs the URL to be searched

    #Create an instance of the webcrawler
    my $web_crawler = new WWW::Robot(
        NAME      => 'My WebCrawler',
        VERSION   => '1.000',
        USERAGENT => LWP::UserAgent->new,
        EMAIL     => 'aca03lh@sheffield.ac.uk',
    );

    #Below the attributes of the web crawler are set
    $web_crawler->addHook('invoke-on-all-url', \&invoke_test);
    $web_crawler->addHook('follow-url-test', \&follow_test);
    $web_crawler->addHook('invoke-on-contents', \&invoke_contents); # to be able to get contents from webpages
    $web_crawler->addHook('add-url-test', \&add_url_test); # if url doesn't exist in array then add for visit
    $web_crawler->addHook('continue-test', \&continue_test); # to exit loop when we run out of URL's to visit

    sub invoke_contents {
        my ($webcrawler, $hook, $url, $response, $structure) = @_;
        our $contents = $structure; #To make the string that has the contents in global
    }

    # Start the web crawling
    $web_crawler->run($url_name);
    print $contents;
*********************************
My idea is that the user first inputs the website to be processed (I use http://www.sportinglife.com/), and then the $structure variable in "sub invoke_contents" is made a global variable. I have put in a print command to see whether it prints the contents, so that I know if it works, but it doesn't seem to work. I have a dial-up connection (believe it or not); I leave it for about 15 minutes and it doesn't print anything, although I don't think it should take that long anyway. Any idea what I am doing wrong? Thanks
Edit g0n: added code tags
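The idea in the question, a callback that stores what it receives in a variable the main program can read after the run, can be sketched in isolation. This is only a rough illustration under stated assumptions: `TinyRobot`, `add_hook`, the `on-contents` hook name, and `keep_contents` are hypothetical stand-ins invented for this sketch (they are not the WWW::Robot API), the page text is canned, and no network access takes place. It also shows `chomp`, since a line read from `<STDIN>` keeps its trailing newline.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a crawler. This is NOT the WWW::Robot API.
package TinyRobot;

sub new {
    my ($class) = @_;
    return bless { hooks => {} }, $class;
}

# Register a callback under a hook name.
sub add_hook {
    my ($self, $name, $callback) = @_;
    push @{ $self->{hooks}{$name} }, $callback;
    return;
}

# Pretend to fetch each URL and pass the "page" to every registered callback.
sub run {
    my ($self, @urls) = @_;
    for my $url (@urls) {
        my $page = "<html>contents of $url</html>";    # canned data, no network
        for my $callback (@{ $self->{hooks}{'on-contents'} || [] }) {
            $callback->($self, 'on-contents', $url, $page);
        }
    }
    return;
}

package main;

# Declared at file scope, so the callback and the code after run() share it.
my %contents_for;

sub keep_contents {
    my ($robot, $hook, $url, $page) = @_;
    $contents_for{$url} = $page;    # one entry per URL, so pages are not overwritten
    return;
}

# Stands in for a line read from <STDIN>, which ends with a newline.
my $url_name = "http://www.example.com/\n";
chomp $url_name;                    # strip the trailing newline before using the URL

my $robot = TinyRobot->new;
$robot->add_hook('on-contents', \&keep_contents);
$robot->run($url_name);

# Everything the callback stored is available here, after the run.
print "$_ => $contents_for{$_}\n" for sort keys %contents_for;
```

Keying the hash by URL is a design choice for the sketch: a single scalar would hold only the last page seen, whereas a crawl that visits many pages calls the callback once per page.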
Replies are listed 'Best First'.

Re: Getting a website's content
by davidrw (Prior) on Jul 23, 2005 at 00:59 UTC

Re: Getting a website's content
by marnanel (Beadle) on Jul 23, 2005 at 06:53 UTC
by lampros21_7 (Scribe) on Jul 23, 2005 at 13:35 UTC
by marnanel (Beadle) on Jul 24, 2005 at 20:11 UTC

Re: Getting a website's content
by Anonymous Monk on Jul 23, 2005 at 09:05 UTC