Re: Retrieving URLs
by dkubb (Deacon) on Jan 19, 2001 at 08:45 UTC
Here is a simple script that fetches a web page with LWP::Simple:
#!/usr/bin/perl -w
use strict;
use LWP::Simple;
my $content = get('http://www.perl.com');
die "Couldn't fetch the page\n" unless defined $content;
# do something with the $content
print $content;
From your post, I see that you are retrieving information from web pages. You didn't say what sort of information, but modules from the HTML::Parser family, such as HTML::LinkExtor (mentioned below), are among the more popular ones people use to parse web page elements.
Re: Retrieving URLs
by cat2014 (Monk) on Jan 19, 2001 at 08:01 UTC
I haven't used LWP, but I do know that HTML::LinkExtor
works fairly well for extracting links. You can read about it on CPAN.
It's a good way to easily get just the URLs from a page.
Of course, depending on what you want to do with the URLs you get, you might be better off with LWP.
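For what it's worth, here's a rough sketch of how HTML::LinkExtor might be used on a page you already have on disk (the filename and the href-only filter are just placeholders for illustration):
#!/usr/bin/perl -w
use strict;
use HTML::LinkExtor;
# collect the href attributes of <a> tags from a saved page
# ("page.html" is just a placeholder filename)
my @links;
my $parser = HTML::LinkExtor->new(sub {
    my ($tag, %attr) = @_;
    push @links, $attr{href} if $tag eq 'a' && defined $attr{href};
});
$parser->parse_file('page.html');
print "$_\n" for @links;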
good luck!
-- cat
Re: Retrieving URLs
by zeno (Friar) on Jan 19, 2001 at 14:30 UTC
Oddly enough, I just put an entry into Craft called "Use LWP::Simple to download images from a website", which shows a (very) simple way of downloading images from a webpage. It could be adapted to get HTML pages as well.
I use LWP::Simple's getstore to do this, but if you used get instead, you could store the contents of the webpage in a scalar for parsing, etc.
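Roughly, the getstore approach looks like this (the image URL and filename here are only placeholders):
#!/usr/bin/perl -w
use strict;
use LWP::Simple;
# fetch a remote file and save it locally with getstore
my $url  = 'http://www.example.com/images/logo.gif';
my $file = 'logo.gif';
my $status = getstore($url, $file);
die "Download failed with status $status\n" unless is_success($status);
print "Saved $url as $file\n";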
For example, you can get the contents of a webpage really easily using LWP::Simple like this, from the command line:
perl -e "use LWP::Simple; $s = get('http://www.yahoo.com'); print $s"
With similar code you could then parse through the HTML with regular expressions, etc. Good luck! -timallen
Re: Retrieving URLs
by Beatnik (Parson) on Jan 19, 2001 at 14:25 UTC
You can also use a lynx --dump trick, not to mention the wget trick. Raw socket connections should also work fine, but LWP::Simple (or LWP in general) is by far the cleanest way to do it.
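If you go the lynx route from inside Perl, a quick sketch (assuming lynx is installed and on the PATH) could be as simple as:
#!/usr/bin/perl -w
use strict;
# shell out to lynx for a plain-text rendering of the page
# (use -source instead of -dump if you want the raw HTML)
my $url  = 'http://www.perl.com';
my $text = `lynx -dump $url`;
die "lynx failed\n" if $?;
print $text;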
Greetz
Beatnik
... Quidquid perl dictum sit, altum viditur.
Re: Retrieving URLs
by ColonelPanic (Friar) on Jan 19, 2001 at 23:03 UTC
I posted a similar question recently. I ended up using an IO::Socket method by Fastolfe that I found somewhere. That is a standard module that everyone has installed, so it should be a good solution. The disadvantages are that you have to strip the headers yourself, and you have to worry more about error handling. However, it worked readily for me.
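Something along these lines is the general idea (this is just a sketch, not Fastolfe's exact code; the host and path are placeholders):
#!/usr/bin/perl -w
use strict;
use IO::Socket::INET;   # ships with Perl, no CPAN download needed
# speak HTTP/1.0 by hand over a raw socket
my $host = 'www.perl.com';
my $path = '/';
my $sock = IO::Socket::INET->new(
    PeerAddr => $host,
    PeerPort => 80,
    Proto    => 'tcp',
) or die "Can't connect to $host: $!\n";
print $sock "GET $path HTTP/1.0\r\nHost: $host\r\n\r\n";
my $response = join '', <$sock>;
close $sock;
# the headers come back too, so split them off yourself
my ($headers, $body) = split /\r?\n\r?\n/, $response, 2;
print $body;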
When's the last time you used duct tape on a duct? --Larry Wall
Re: Retrieving URLs
by Anonymous Monk on Jan 19, 2001 at 17:04 UTC
My problem is that I'm developing a Perl script and I don't want the user to have to download any modules.
I'd like it to be "self-sufficient" for the most part.
Is there anything I can do to achieve this? Can someone post an example?
LWP::Simple is SO useful, everyone should have it installed anyway =) Why are so many of us so big on using modules? Well, if you want to solve a problem, why not use a tool that's been tested again and again and again and found to work?
Why bother rewriting something that's already been done WELL?
A big issue here is how robust you want your script to be -- you can roll your own version, but
it's not going to be as versatile and fault-tolerant as one that uses LWP::Simple.
If all you want is a means of retrieving a web page, then you should be able to rely on your
users having something like lynx installed (if they're on *nix-ish systems), or, heck, just tell them to download
lynx =). Getting a page via lynx is as simple (as was pointed out above) as lynx --dump <url>.
If you INSIST on doing it in perl, then you're going to have to understand the HTTP protocol; I won't bother to do the search myself, but I seem to recall "getting a web page without LWP" being a thread on here recently.
Good luck!
Philosophy can be made out of anything. Or less -- Jerry A. Fodor