Good morning, dear monks!
I'm new to programming and I am trying to learn the basics of Perl. At the moment I am digging into LWP::UserAgent.
Note: the first script below runs and gives me back the content of the requested page. What I want is to wrap the request in a loop that changes the URL on each pass. In other words, I want to
iterate over some hundreds of targets...
#!/usr/bin/perl
use strict;
use warnings;
use diagnostics;
use LWP::UserAgent;

# variables must be declared with "my" under "use strict"
my $ua = LWP::UserAgent->new;
$ua->agent("$0/0.1 " . $ua->agent);
# $ua->agent("Mozilla/8.0"); # pretend we are a very capable browser
my $req = HTTP::Request->new(GET => 'http://dms-schule.bildung.hessen.de/suchen/suche_schul_db.html?show_school=5503');
$req->header('Accept' => 'text/html');
# send request
my $res = $ua->request($req);
# check the outcome
if ($res->is_success) {
    print $res->content;
} else {
    print "Error: " . $res->status_line . "\n";
}
As mentioned above, this code runs fine. Now I want to build in a loop to fetch more pages, namely the pages
from
http://dms-schule.bildung.hessen.de/suchen/suche_schul_db.html?show_school=01
to
http://dms-schule.bildung.hessen.de/suchen/suche_schul_db.html?show_school=10000
The ones that have no results I want to drop (but that has to be done later with some additional code).
As a proof of concept I want to get all the URLs that LWP::UserAgent fetches, let us say printed out...
The questions are:
1. How do I write the loop correctly?
2. How do I make the program print out all the URLs that are fetched? (Later on I want to parse the pages that have content, but that is a part I have to design and code later.)
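For the first question, a minimal sketch of just the URL-building part, with no network access yet. The helper name build_url is hypothetical, and I am assuming the show_school parameter takes a plain integer without zero padding (the range above starts at 01, so this may need adjusting):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Base URL taken from the post above.
my $base = 'http://dms-schule.bildung.hessen.de/suchen/suche_schul_db.html';

# build_url is a hypothetical helper: it turns a school ID into a full URL.
# Assumption: the ID is a plain integer without leading zeros.
sub build_url {
    my ($id) = @_;
    return sprintf('%s?show_school=%d', $base, $id);
}

# Print the first few URLs; widen the range to 1 .. 10000 for the full run.
print build_url($_), "\n" for 1 .. 3;
```

Keeping the URL construction in its own sub means the list of targets can be checked by eye before any request is sent.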
Here is my attempt with a built-in loop, to make the user agent iterate over a bunch of targets:
use strict;
use warnings;
use LWP::UserAgent;

# first get a list of all schools
my $ua = LWP::UserAgent->new;
# pretending to be Firefox on Linux
$ua->agent("Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.7) Gecko/20070914 Firefox/2.0.0.7");
for my $i (1..10000) {
    # build the URL for this school ID
    my $url = sprintf("http://dms-schule.bildung.hessen.de/suchen/suche_schul_db.html?show_school=%d", $i);
    my $request = HTTP::Request->new(GET => $url);
    $request->header('Accept' => 'text/html');
    my $response = $ua->request($request);
    # check the outcome
    if ($response->is_success) {
        my $pagecontent = $response->content;
        # now we can do whatever with the $pagecontent
        print "$url\n"; # print out each URL that was fetched
    } else {
        print "Error: $url: " . $response->status_line . "\n";
    }
}
Do you have any idea how to write the loop correctly, and how to get the program to print out all the URLs (not the content)?
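To make clearer what I am after, here is a rough sketch of the behaviour I have in mind, not a tested solution. The helper name fetch_ids and the command-line handling are my own assumptions; the get, is_success and status_line calls are the documented LWP::UserAgent and HTTP::Response methods. The sub reports each URL as it is fetched and returns only the ones that succeeded:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

my $base = 'http://dms-schule.bildung.hessen.de/suchen/suche_schul_db.html';

# fetch_ids is a hypothetical helper: it requests one page per ID and
# returns the URLs that came back with a successful status.
sub fetch_ids {
    my ($ua, @ids) = @_;
    my @fetched;
    for my $id (@ids) {
        my $url      = "$base?show_school=$id";
        my $response = $ua->get($url, 'Accept' => 'text/html');
        if ($response->is_success) {
            print "fetched: $url\n";    # the URL only, not the content
            push @fetched, $url;
        } else {
            print "skipped: $url (", $response->status_line, ")\n";
        }
    }
    return @fetched;
}

# Assumed usage: perl fetch.pl FIRST LAST, e.g. perl fetch.pl 1 10000.
# Without both arguments nothing is requested.
my ($first, $last) = @ARGV;
if (defined $first && defined $last) {
    my $ua = LWP::UserAgent->new(timeout => 10);
    fetch_ids($ua, $first .. $last);
}
```

With some thousands of requests against one server, a short sleep between iterations would probably be the polite thing to do.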
Please let me know if I have to be more descriptive!
Many thanks! Perlbeginner1