Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I was looking to write a script that involves retrieving information from web pages. I found two ways of going about it: it seems that I could either use LWP::Simple or use a program called "curl". Does anybody have experience with either of these? Does either of them have an advantage over the other that could be useful?

Thanks.

Re: Retrieving URLs
by dkubb (Deacon) on Jan 19, 2001 at 08:45 UTC
    Here is a simple script that fetches a web page with LWP::Simple:
    #!/usr/bin/perl -w

    use strict;
    use LWP::Simple;

    my $content = get('http://www.perl.com');

    # do something with the $content
    print $content;

    From your post, I see that you are retrieving information from web pages. You didn't say what sort of information, but here are some of the more popular modules that people use to parse web page elements:

    Short Description                     HTML::* CPAN Module
    -----------------                     -------------------
    Extract table information             HTML::TableExtract
    Fetch all URL links on the page       HTML::LinkExtor
    Parse and create form attributes      HTML::Form
    Generate summary of content           HTML::Summary
    Everything else                       HTML::Parser
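
    For example, HTML::LinkExtor from the table above will pull every link out of a page. Here's a rough sketch of how it might be combined with LWP::Simple (the URL is just a placeholder, and error handling is kept to a minimum):

    #!/usr/bin/perl -w

    use strict;
    use LWP::Simple;
    use HTML::LinkExtor;

    my $url     = 'http://www.perl.com/';
    my $content = get($url);
    die "Couldn't fetch $url\n" unless defined $content;

    my @links;
    my $parser = HTML::LinkExtor->new(
        sub {
            my ($tag, %attr) = @_;
            push @links, values %attr;   # href, src, etc.
        },
        $url,   # base URL, so relative links come back absolute
    );
    $parser->parse($content);
    $parser->eof;

    print "$_\n" for @links;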
Re: Retrieving URLs
by extremely (Priest) on Jan 19, 2001 at 08:19 UTC
    You know, something about this came up earlier today in Did I have to roll my own?; check that out. If you have the LWP package installed, there should be a program called "GET" that I think is better than curl as well.

    --
    $you = new YOU;
    honk() if $you->love(perl)

Re: Retrieving URLs
by cat2014 (Monk) on Jan 19, 2001 at 08:01 UTC
    I haven't used LWP, but I do know that HTML::LinkExtor works fairly well for extracting links. You can read about it on CPAN.

    It's a good way to easily get just the URLs from a page. Of course, depending on what you want to do with the URLs you get, you might be better off with LWP. Good luck! -- cat

Re: Retrieving URLs
by zeno (Friar) on Jan 19, 2001 at 14:30 UTC
    Oddly enough, I just put an entry into Craft called Use LWP::Simple to download images from a website, which shows a (very) simple way of downloading images from a web page. It could be adapted to get HTML pages as well.
    I use LWP::Simple's getstore to do this, but if you used get instead, you could store the contents of the webpage in a scalar for parsing, etc.
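    For instance, a quick sketch with getstore (the URL and filename here are just placeholders):

    use LWP::Simple;

    # getstore() writes the response body straight to a file and
    # returns the HTTP status code.
    my $status = getstore('http://www.perl.com/images/logo.gif', 'logo.gif');
    print is_success($status) ? "Saved.\n" : "Failed with status $status\n";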
    For example, you can get the contents of a web page really easily using LWP::Simple like this, from the command line:
    perl -e 'use LWP::Simple; $s = get("http://www.yahoo.com"); print $s'
    (On Windows, swap the quoting: double quotes around the -e argument and single quotes around the URL.)
    With similar code you could then parse through the HTML with regular expressions, etc. Good luck! -timallen
Re: Retrieving URLs
by Beatnik (Parson) on Jan 19, 2001 at 14:25 UTC
    You can also use a lynx --dump trick, not to mention the wget trick. Raw socket connections should also work fine, but LWP::Simple (or LWP in general) is by far the cleanest way to do it.
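
    If lynx is on the system, calling it from Perl is just a backtick away. A rough sketch (the URL is a placeholder, and this assumes lynx is in the PATH):

    my $url  = 'http://www.perl.com/';
    my $text = `lynx -dump $url`;    # rendered page as plain text
    die "lynx failed\n" if $?;
    print $text;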

    Greetz
    Beatnik
    ... Quidquid perl dictum sit, altum viditur.
Re: Retrieving URLs
by ColonelPanic (Friar) on Jan 19, 2001 at 23:03 UTC
    I posted a similar question recently. I ended up using an IO::Socket method by Fastolfe that I found somewhere. That's a standard module that everyone has installed, so it should be a good solution. The disadvantage is that you have to strip the headers yourself, and you also have to worry more about error handling. However, it worked readily for me.

    When's the last time you used duct tape on a duct? --Larry Wall
Re: Retrieving URLs
by Anonymous Monk on Jan 19, 2001 at 17:04 UTC
    My problem is that I'm developing a Perl script and I don't want the user to have to download any modules. I'd like it to be "self-sufficient" for the most part. Is there anything I can do to achieve this? Can someone post an example?

      LWP::Simple is SO useful, everyone should have it installed anyway =) Why are so many of us so big on using modules? Well, if you want to solve a problem, why not use a tool that's been tested again and again and again and found to work? Why bother rewriting something that's already been done WELL?

      A big issue here is how robust you want your script to be -- you can roll your own version, but it's not going to be as versatile and fault-tolerant as one that uses LWP::Simple.

      If all you want is a means of retrieving a web page, then you should be able to rely on your users having something like lynx installed (if they're on *nix-ish systems), or, heck, just tell them to download lynx =). Getting a page via lynx is as simple (as was pointed out above) as lynx --dump <url>.

      If you INSIST on doing it in Perl, then you're going to have to understand the HTTP protocol; I won't bother to do the search myself, but I seem to recall "getting a web page without LWP" being a thread on here recently. Good luck!
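
      For reference, the bare-bones approach looks roughly like this: a sketch using only IO::Socket::INET, which ships with Perl, so there's nothing extra to install. The host and path are placeholders, and there's no redirect handling or error recovery.

      #!/usr/bin/perl -w
      use strict;
      use IO::Socket::INET;   # part of the standard Perl distribution

      my ($host, $path) = ('www.perl.com', '/');

      my $sock = IO::Socket::INET->new(
          PeerAddr => $host,
          PeerPort => 80,
          Proto    => 'tcp',
      ) or die "Can't connect to $host: $!\n";

      # A minimal HTTP/1.0 request.
      print $sock "GET $path HTTP/1.0\r\n",
                  "Host: $host\r\n",
                  "\r\n";

      my $response = do { local $/; <$sock> };   # slurp the whole reply
      close $sock;

      # You have to split the headers from the body yourself.
      my ($headers, $body) = split /\r?\n\r?\n/, $response, 2;
      print $body;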

      Philosophy can be made out of anything. Or less -- Jerry A. Fodor