Re: Grabbing a web page without LWP or the like
by Fastolfe (Vicar) on Nov 21, 2000 at 23:49 UTC
|
use IO::Socket; # distributed with Perl
my $web = IO::Socket::INET->new("www.example.com:80")
    or die "Couldn't connect: $@";
print $web "GET /some/file HTTP/1.0\r\n";
print $web "Host: www.example.com\r\n\r\n";
undef $/;   # slurp mode: read the whole response in one go
my $results = <$web>;
Now just parse out the headers, look for errors, etc. This will not follow redirects (e.g. "/some/directory" -> "/some/directory/"), and is generally only usable for the most basic web requests. If you need anything beyond that, you'd be far better off using LWP, or at least reading through its source and pulling out the code that you need.
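For what "parse out the headers, look for errors" might look like, here is a minimal sketch. The helper name parse_response and the canned sample response are made up for illustration (they are not from the post above), and it only handles the simple case of a complete response held in one string:

```perl
use strict;
use warnings;

# Split a raw HTTP response into status code, headers, and body.
# parse_response is a hypothetical helper, not part of any module.
sub parse_response {
    my ($raw) = @_;

    # Headers end at the first blank line (CRLF normally, bare LF tolerated).
    my ($head, $body) = split /\r?\n\r?\n/, $raw, 2;
    $body = '' unless defined $body;

    my ($status_line, @lines) = split /\r?\n/, $head;
    my ($code) = $status_line =~ m{^HTTP/\d+\.\d+\s+(\d{3})}
        or die "Malformed status line: $status_line\n";

    my %headers;
    for my $line (@lines) {
        my ($name, $value) = $line =~ /^([^:\s]+):\s*(.*)$/ or next;
        $headers{lc $name} = $value;   # header names are case-insensitive
    }
    return ($code, \%headers, $body);
}

# Canned response standing in for what <$web> would return.
my $sample = "HTTP/1.0 301 Moved Permanently\r\n"
           . "Location: /some/directory/\r\n"
           . "Content-Type: text/html\r\n"
           . "\r\n"
           . "<html>moved</html>";

my ($code, $headers, $body) = parse_response($sample);
if ($code =~ /^3/ and $headers->{location}) {
    print "Redirect to $headers->{location}\n";
}
elsif ($code !~ /^2/) {
    print "Request failed with status $code\n";
}
```

Following the redirect would mean opening a new socket and repeating the request with the Location value, which is exactly the kind of bookkeeping LWP already does for you.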
|
|
Oh, that's beautiful... thank you. This is a good indication that I really need to get to know IO::Socket better.
Alan "Hot Pastrami" Bellows
-Sitting calmly with scissors-
Re (tilly) 1: Grabbing a web page without LWP or the like
by tilly (Archbishop) on Nov 22, 2000 at 00:06 UTC
|
Win32?
Try Win32::Internet out then. That will allow you to fetch over http, ftp, https, etc. while using the correct proxy servers. If you do it right you can even have it cascade through possible modules, so it is portable to Unix as well.
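One way to read "cascade through possible modules" is to try loading each candidate in turn and use the first one that is installed. A rough sketch, assuming a hypothetical helper named first_available and an illustrative candidate list (how each backend is then called to do the fetch is left out):

```perl
use strict;
use warnings;

# Return the name of the first module in the list that loads, or undef.
# first_available is a made-up helper for illustration.
sub first_available {
    for my $module (@_) {
        # Turn Foo::Bar into Foo/Bar.pm so require accepts it at runtime.
        (my $file = "$module.pm") =~ s{::}{/}g;
        return $module if eval { require $file; 1 };
    }
    return;
}

# The order of candidates here is an assumption, not a recommendation.
my $backend = first_available(qw(Win32::Internet LWP::Simple IO::Socket));
print "Would fetch with: ",
    defined $backend ? $backend : "nothing available", "\n";
```

Since IO::Socket ships with Perl, it makes a reasonable last resort on any platform.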
Re: Grabbing a web page without LWP or the like
by japhy (Canon) on Nov 22, 2000 at 00:08 UTC
|
I wrote a module that handles this kinda well. I'm thinking
of adding redirection support and all, but I'm dangerously
close to reinventing the wheel.
use LWP::FileHandle;
lwpopen HOMEPAGE, GET => "http://www.pobox.com/~japhy/"
    or die "can't access the url: $!";
while (<HOMEPAGE>) {
    print if m!<ul>! .. m!</ul>!;
}
lwpclose HOMEPAGE;
Get the module at
http://www.pobox.com/~japhy/modules/LWP-FileHandle-0.01.tar.gz.
Sorry, no documentation in it (yet), but it's self-explanatory,
and comes with a test program.
japhy --
Perl and Regex Hacker
|
|
pssst....bud! The question said something
about "without LWP or the like"
Update:
FWIW, I did read the code before posting this. I
interpreted use URI::Escape as "the like."
I apologize if this sounded snippy or mean -- it was
meant in humor.
|
|
Psst -- the module doesn't use the LWP suite. If you checked
the source to the module, you'd see it just uses the standard
IO::Socket module. I put it in the LWP namespace
because it's similar in function.
Tsk, tsk. So quick to prejudge...
japhy --
Perl and Regex Hacker
|
|
Re: Grabbing a web page without LWP or the like
by dws (Chancellor) on Nov 22, 2000 at 01:53 UTC
|
Here's one I hacked up a while back when faced with a similar (distribution-only) constraint. I had the particular problem of often needing to see only the response header. Hence the -h option.
It isn't foolproof, but it gets the job done.
#!c:/perl/bin/perl.exe
# get.pl -- Make an HTTP GET request and report the results
#
# Dave Smith, 6/15/00
use strict;
use IO::Socket;

my $get_or_head = "GET";
my $headeronly = 0;
if ( @ARGV and $ARGV[0] eq "-h" ) {
    $headeronly = 1;
    $get_or_head = "HEAD";
    shift;
}
my $url = shift or usage();
my ($host,$uri) = $url =~ m#^(?:http://|//|)([^/]*)/?(.*)$#;
# print "host=$host uri=$uri\n";
usage() if not $host;

my $sock = IO::Socket::INET->new(PeerAddr => $host,
                                 PeerPort => 'http(80)',
                                 Proto    => 'tcp');
die "Couldn't open socket to $host" if not $sock;

print $sock "$get_or_head /$uri HTTP/1.0\r\n",
    "Accept: text/plain, text/html, text/xml, image/gif\r\n",
    # "If-modified-since: Sat, 14 Jul 2000 01:51:07 GMT\r\n",
    "Host: $host\r\n",   # send the requested host, not a hard-coded name
    "\r\n";
while ( <$sock> ) {
    s/\r//;
    last if $headeronly and /^$/;
    print;
}

sub usage {
    print <<"END";
usage:
  $0 [-h] fully-qualified-URL
    -h   response header only
END
    exit(0);
}
|
|
I appreciate all the help, this is useful stuff. However, I have a non-Perl question as relates to this thread... how has the Reputation on the originating thread wandered into the negative (-3 as of now)? I see no unpleasantness in it... have I adopted some irritating ways and then become blind to them? In what way is it deserving of disrepute? Let me know, so that I may not make the mistake again.
Alan "Hot Pastrami" Bellows
|
|
Some people are probably tired of hearing "I need to drill a hole but I can't be bothered to install a (free) high-quality commercial drill but rather must install something I build myself which won't be as good at drilling."
I do understand several of the problems with installing modules that lead to the very often repeated requests for how to do things that great modules exist for but without using these great modules. But it doesn't mean that the requests don't get tiring.
The source code for the modules is freely available so if there is some magic about installing the code that you write, then you can use the module source code in order to rewrite the module yourself. But most of us suggest that you figure out how to install some good quality modules along with whatever code you end up writing and installing.
-
tye
(but my friends call me "Tye")
|
|
Ah... a helpful monk pointed out to me that the "--" votes are probably due to the fact that I am trying to reinvent the wheel with this solution, and such is frowned upon. Well, I have 2 things to say in my defence regarding that:
- It is clear that the existing wheel does not fit the vehicle, so I am forced to improvise, and
- If no one EVER reinvented the wheel, we'd still be clunking around on some rounded rock with a stick through the middle. Re-exercising an old skill a bit never hurt anybody, and often hones said skill.
Now I'm wondering how many "--" votes this one will fetch. Ah, well.
Alan "Hot Pastrami" Bellows
-Sitting calmly with scissors-
|
|
Re: Grabbing a web page without LWP or the like
by arturo (Vicar) on Nov 21, 2000 at 23:54 UTC
|
My non-perl answer of the day: lynx runs on Win32
Philosophy can be made out of anything. Or less -- Jerry A. Fodor
|
|
Really? Interesting. However, Lynx still cannot be utilized because it would defeat the .pl file's self-contained requirement. But, thank you all the same.
Alan "Hot Pastrami" Bellows
-Sitting calmly with scissors-