Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

Can I use 'get' to grab the HTML source of an external website? For instance, I was using:

$html = get $url;
where $url is defined in the URL of the script (i.e. script.pl?url=http://www.blah.com). That was working to grab the HTML on one server, but since I moved the program over to a different server it doesn't seem to work anymore.

Is there another function I can use? Or am I just doing something wrong?

Intaglio

Re: Getting External HTML
by PsychoSpunk (Hermit) on Dec 20, 2000 at 23:38 UTC
    use LWP::Simple;

    That's a starting point. Since you only need to fetch the page from the site, LWP::Simple is probably the way to go.

    use LWP;

    for anything more complex. Look at Tutorial on LWP for some links to tutorials.
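    A minimal sketch of the LWP::Simple route, assuming LWP is installed on the server. The helper name fetch_html is hypothetical. The useful detail is that get() returns undef on any failure (DNS, connection refused, non-2xx status), so checking for undef turns a silent failure into an actual error message.

```perl
use strict;
use warnings;
use LWP::Simple qw(get);

# Hypothetical helper: fetch a page and fail loudly instead of
# carrying on with an undefined $html.
sub fetch_html {
    my ($url) = @_;
    my $html = get($url);    # undef on any kind of failure
    die "Could not fetch $url\n" unless defined $html;
    return $html;
}

# Usage (the URL is a placeholder):
# my $html = fetch_html('http://www.blah.com/');
```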

    ALL HAIL BRAK!!!

Re: Getting External HTML
by Hrunting (Pilgrim) on Dec 21, 2000 at 01:02 UTC
    While we're on the topic of HTML and web modules, has anyone had a chance to play around with HTTP::GHTTP? It's written by Matt Sergeant (mod_perl nut) to do basically exactly what this anonymous monk is asking:
    use HTTP::GHTTP 'get'; print get $uri;
    It also has a more feature-rich object-oriented interface. It looks incredibly promising (especially since LWP is such a behemoth), but I haven't gotten around to installing libghttp yet, so I haven't seen it in action.

    When I saw the syntax this guy was using, I immediately thought of HTTP::GHTTP, though.

Re: Getting External HTML
by mrmick (Curate) on Dec 21, 2000 at 00:01 UTC
    A useful module for doing this is LWP and its subclasses (modules).

    Mick
Re: Getting External HTML
by dws (Chancellor) on Dec 21, 2000 at 05:55 UTC
Re: Getting External HTML
by davorg (Chancellor) on Dec 21, 2000 at 02:23 UTC

    Maybe your first server had the LWP modules installed but the second one didn't.

    What error message are you seeing?
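    One quick way to test the "not installed" theory on the new server is to try loading the modules and report the result. From a shell, perl -MLWP::Simple -e 1 does the same check; the small script below is a sketch of the in-Perl equivalent, and module_available is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the named module can be loaded
# on this machine, 0 otherwise.
sub module_available {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # LWP::Simple -> LWP/Simple.pm
    return eval { require $file; 1 } ? 1 : 0;
}

for my $mod (qw(LWP::Simple LWP::UserAgent)) {
    printf "%-16s %s\n", $mod,
        module_available($mod) ? 'installed' : 'NOT installed';
}
```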

    --
    <http://www.dave.org.uk>

    "Perl makes the fun jobs fun
    and the boring jobs bearable" - me

Re: Getting External HTML
by Fastolfe (Vicar) on Dec 21, 2000 at 06:04 UTC
    It sounds like you're using LWP::Simple inside a CGI script. We need to do some simple information gathering before we will be able to help you at all: Are you getting an error message? If so, what is the error message? You may need to examine the server's error logs (or use CGI::Carp 'fatalsToBrowser'). Is your script even compiling? Perhaps you do not have this module on the new machine?

    With only a vague "it isn't working" description, and no error messages or description of the behavior, it could be any number of things.

Re: Getting External HTML
by tune (Curate) on Dec 21, 2000 at 04:46 UTC
    If nothing else works, you can fall back on wget, which is usually installed on Linux systems.
    $html = `wget -O - $url`;
    The -O - option tells wget to output the document to STDOUT. If you are on a Linux box this is the easiest way, though not a perlish solution :-)
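    One caution: in a CGI script $url comes from the query string, and backticks pass it through the shell, so characters like ; or | in it would be interpreted as shell syntax. A sketch that avoids the shell on Unix-like systems is to use the list form of a piped open; capture_command is a hypothetical helper name, and the usage line assumes wget is on the PATH.

```perl
use strict;
use warnings;

# Hypothetical helper: run a program WITHOUT a shell and return its STDOUT.
# Pass the program and each argument as separate list elements; a single
# combined string would be handed to /bin/sh again.
sub capture_command {
    my @cmd = @_;
    die "No command given\n" unless @cmd;
    open(my $fh, '-|', @cmd) or die "Cannot run $cmd[0]: $!\n";
    local $/;                      # slurp mode: read everything at once
    my $out = <$fh>;
    close($fh) or warn "$cmd[0] exited with status " . ($? >> 8) . "\n";
    return defined $out ? $out : '';
}

# Usage (assumes wget is installed; -q suppresses its progress output):
# my $html = capture_command('wget', '-q', '-O', '-', $url);
```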

    -- tune

Re: Getting External HTML
by electronicMacks (Beadle) on Dec 21, 2000 at 05:57 UTC
    When you say: $url is defined in the URL of the script (i.e. script.pl?url=http://www.blah.com) do you mean that the value of the variable $url is:
    $url="script.pl?url=http://www.blah.com";

    Because in that case the URL you are giving may be relative to the server you are working from, which would explain why it works on one server and not on the other.
    If so, it is simply a matter of giving the full URL, starting with http://.
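    A small guard along these lines can reject a relative value before it is ever fetched. This is a sketch; is_absolute_http_url is a hypothetical name, and it deliberately accepts only http:// and https:// URLs.

```perl
use strict;
use warnings;

# Hypothetical helper: true only for absolute http(s) URLs with a host,
# so a relative value like "script.pl?url=..." is rejected up front.
sub is_absolute_http_url {
    my ($url) = @_;
    return defined $url && $url =~ m{\Ahttps?://[^/\s]+}i ? 1 : 0;
}

# Usage:
# die "Please pass a full URL starting with http://\n"
#     unless is_absolute_http_url($url);
```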

    If that doesn't fix it, try the more robust LWP::UserAgent. This awesome module is described very well in the book Web Client Programming with Perl, which is out of print but available online at http://www.oreilly.com/openbook/webclient/
    Major props to O'Reilly for open sourcing their out-of-print books!
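    A minimal LWP::UserAgent sketch, assuming LWP is installed. Unlike LWP::Simple's get, the user agent returns a response object, so on failure you can report the status line (for example a 404 or a connection error) instead of just getting undef. The helper name fetch_with_status and the 15-second timeout are arbitrary choices for illustration.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper: returns ($content, undef) on success,
# or (undef, $status_line) on failure.
sub fetch_with_status {
    my ($url) = @_;
    my $ua  = LWP::UserAgent->new(timeout => 15);   # timeout is an arbitrary choice
    my $res = $ua->get($url);
    return ($res->decoded_content, undef) if $res->is_success;
    return (undef, $res->status_line);
}

# Usage:
# my ($html, $err) = fetch_with_status($url);
# die "Fetch failed: $err\n" unless defined $html;
```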