You will find almost everything you need in
Bundle::LWP.
If you are writing a command-line program, you
could start by reading the source of the GET script
that comes with LWP.
Thanks fglock,
I will definitely look into that!
VFT
You can use HTML::HeadParser (part of the HTML::Parser distribution)
to extract the tags inside the head section. If you need a more
specific parser, there are many solutions on CPAN.
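As a rough illustration of the HTML::HeadParser suggestion, here is a minimal sketch. The HTML string is made up for the example; in practice it would be the page content you fetched.

```perl
use strict;
use warnings;
use HTML::HeadParser;

# Made-up document, used only for illustration.
my $html = <<'HTML';
<html><head>
<title>Sample page</title>
<meta name="description" content="A short test document">
</head><body><p>Hello</p></body></html>
HTML

my $parser = HTML::HeadParser->new;
$parser->parse($html);

# The <title> text is exposed as the "Title" header;
# <meta name="..."> tags show up as "X-Meta-<Name>" headers.
my $title       = $parser->header('Title');
my $description = $parser->header('X-Meta-Description');

print "Title......: $title\n";
print "Description: $description\n";
```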
Ciao, Valerio
It sounds like you're "linting" incoming HTML. I've done this before. You'll need LWP::Simple, or LWP::UserAgent with HTTP::Request, to HEAD or GET the raw content, like this:
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;
my $url = "http://www.foo.bar/blort/quux.html";
my $req = HTTP::Request->new(HEAD => $url);   # HEAD fetches the headers only
my $ua = LWP::UserAgent->new;
my $resp = $ua->request($req);
my $type = $resp->header('Content-Type');
my $status_line = $resp->status_line;         # e.g. "200 OK"
Note that I'm using LWP::UserAgent in there, and that I keep the status line from the HEAD request so I can check that it is a 200. If it's anything but a 200, you have to react accordingly (e.g. 404 means the URL was not found, 500 is a server error, and so on). Replace HEAD with GET to pull the raw HTML page itself. Ideally you want to test HEAD on the page first, before pulling the content, but that depends on your design, and on whether you are pulling lots of pages (e.g. a web spider) or one page at a time (upon user request).
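To make the "HEAD first, then GET" idea concrete, here is a minimal sketch. The helper name fetch_if_ok is hypothetical (not part of LWP), and what you do with a failure is up to your design.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;

# Hypothetical helper: issue a HEAD request, and only GET the page
# if the HEAD succeeded. Returns (content, status_line); content is
# undef when either request fails.
sub fetch_if_ok {
    my ($ua, $url) = @_;

    my $head = $ua->request(HTTP::Request->new(HEAD => $url));
    return (undef, $head->status_line) unless $head->is_success;

    my $get = $ua->request(HTTP::Request->new(GET => $url));
    return (undef, $get->status_line) unless $get->is_success;

    return ($get->decoded_content, $get->status_line);
}

# Usage:
#   my ($html, $status) = fetch_if_ok(LWP::UserAgent->new, $url);
#   warn "Skipping $url: $status\n" unless defined $html;
```

Note that is_success accepts any 2xx status, not just 200; compare $resp->code against 200 if you need the stricter check.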
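You'll also likely want to use URI::Escape to make sure you are handling spaces, @ signs and other "foreign" characters properly, so they don't get misparsed by your tools or shell. Keep in mind that uri_escape also escapes reserved characters such as ":" and "/", so it is normally applied to individual URL components rather than to a whole URL. Used like this:
use strict;
use warnings;
use URI::Escape;   # exports uri_escape and uri_unescape
my $url = "http://www.foo.bar/blort/quux.html";
my $safeurl = uri_escape($url);          # ":" and "/" become %3A and %2F
my $newurl = uri_unescape($safeurl);     # round-trips back to $url
print "URL.....: $url\n";
print "Safe URL: $safeurl\n";
print "New URL.: $newurl\n";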
The other modules you may want to use are HTML::LinkExtor (used to extract the links), URI::URL (to play with URI objects), and HTTP::Request (to manipulate the request object). I'll leave it up to you to find code examples that represent how to use those modules.
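As a starting point for HTML::LinkExtor, here is a minimal sketch. The HTML fragment is made up; the base URL is the example URL from above, and passing it lets the parser turn relative links into absolute ones.

```perl
use strict;
use warnings;
use HTML::LinkExtor;

# Made-up fragment, used only for illustration.
my $html = '<a href="/a.html">A</a> <img src="pic.png">';
my $base = "http://www.foo.bar/blort/quux.html";

# No callback given, so the links are collected and returned by links().
my $parser = HTML::LinkExtor->new(undef, $base);
$parser->parse($html);
$parser->eof;

my @urls;
for my $link ($parser->links) {
    # Each entry is [ tag, attribute => url, ... ].
    my ($tag, %attr) = @$link;
    push @urls, map { "$_" } values %attr;   # stringify the URI objects
}

print "$_\n" for @urls;
# prints:
#   http://www.foo.bar/a.html
#   http://www.foo.bar/blort/pic.png
```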
You may want to look at Ovid's CGI Course for some more ideas.