tphyahoo has asked for the wisdom of the Perl Monks concerning the following question:

I was wondering if anyone had LWP or Wget anecdotes they could share, where one was better than the other for a particular crawling/downloading task?

Proxies? HTTPS? Cookie management? POST form filling and submitting?

I know from experience that LWP::UserAgent can do all of the above. Can wget as well? If so, why use LWP?
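For example, here's roughly what I already do with LWP::UserAgent (the proxy host, URL, and form fields are made up for illustration):

    use strict;
    use warnings;
    use LWP::UserAgent;
    use HTTP::Cookies;

    my $ua = LWP::UserAgent->new;

    # Proxy for both schemes (host invented for the example)
    $ua->proxy(['http', 'https'], 'http://proxy.example.com:8080/');

    # Cookie management, persisted to disk across runs
    $ua->cookie_jar(HTTP::Cookies->new(file => 'cookies.txt', autosave => 1));

    # POST form filling and submitting (HTTPS needs Crypt::SSLeay or similar)
    my $res = $ua->post('https://example.com/login',
                        { user => 'me', pass => 'secret' });
    print $res->is_success ? $res->content : $res->status_line;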

Replies are listed 'Best First'.
Re: LWP versus Wget
by hardburn (Abbot) on Mar 03, 2005 at 15:52 UTC

    Form filling will be tricky with wget. But the rest is possible.

    With wget, you're limited to whatever processing you can do by piping its output to another program. With LWP, you have all of Perl available to work on the data. In many situations wget is enough, but LWP is there when it's not.
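    A minimal sketch of the LWP side (URL invented): fetch and process in the same program, no pipeline required.

        use strict;
        use warnings;
        use LWP::UserAgent;

        my $ua  = LWP::UserAgent->new;
        my $res = $ua->get('http://example.com/index.html');
        die "GET failed: ", $res->status_line, "\n" unless $res->is_success;

        # All of Perl is available to work on the data from here on.
        my @links = $res->content =~ /href="([^"]+)"/g;
        print "$_\n" for @links;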

    "There is no shame in being self-taught, only in not trying to learn in the first place." -- Atrus, Myst: The Book of D'ni.

Re: LWP versus Wget
by Anonymous Monk on Mar 03, 2005 at 16:33 UTC
    I use whatever is fit for the job:

    - If I want simplicity, I use LWP::Simple.
    - If I need to fill out forms, I use WWW::Mechanize.
    - If I need to connect to a busy server or through a flaky network and need retries, I use wget.
    - If I need something and LWP isn't available, lynx might do the job as well; or ftp, or ncftp.
    - If it's being done over FTP and I need something fancier than just retrieving a document, I use Net::FTP.
    - If I want to download something recursively, or continue a partially downloaded file, I use wget.
    - If I want to retrieve something and display it immediately, system "mozilla URL" might do the trick, or I use the remote-control functionality of a running browser.
    - If I need to be really fancy, I use LWP::UserAgent.
    - And for debugging, I might use "telnet host 80" from the command line.

    I've used all of the methods I mentioned above. As with most programming techniques, it's a matter of finding the right trade-off between the simplicity of the interface, your needs, your knowledge of and experience with the tool, the functionality offered, and availability. It's a fallacy to think one tool is "better" than the other. A carpenter isn't going to say "I've a hammer and a screwdriver - why have both?" either.
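    The WWW::Mechanize case, for instance, looks something like this (URL and field name invented):

        use strict;
        use warnings;
        use WWW::Mechanize;

        my $mech = WWW::Mechanize->new;
        $mech->get('http://example.com/search');

        # Fill out and submit the first form on the page
        $mech->submit_form(
            form_number => 1,
            fields      => { query => 'perlmonks' },
        );
        print $mech->content;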

Re: LWP versus Wget
by zentara (Cardinal) on Mar 03, 2005 at 18:24 UTC
    Also check out Curl and libCurl. There is a Perl module too, called WWW-Curl-2.0.

    It is very powerful.
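    A rough sketch of a basic fetch with it (the option constants come from libcurl; the exact interface may vary between module versions, so check what you have installed):

        use strict;
        use warnings;
        use WWW::Curl::Easy;

        my $curl = WWW::Curl::Easy->new;
        $curl->setopt(CURLOPT_URL, 'http://example.com/');   # URL invented

        # Collect the body into a scalar instead of printing it to STDOUT
        my $body;
        open my $fh, '>', \$body or die $!;
        $curl->setopt(CURLOPT_WRITEDATA, $fh);

        my $rc = $curl->perform;
        die "curl error code $rc\n" if $rc != 0;
        print $body;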


    I'm not really a human, but I play one on earth. flash japh
Re: LWP versus Wget
by jpeg (Chaplain) on Mar 03, 2005 at 19:36 UTC
    Most of the time when I have to snarf a web page, I have to extract some data from it afterwards. I think it's *way* easier to do with the tools in Perl than doing a
    cat | sed | awk | sort | sed | diff | sed | sed | awk | sed
    chain. In Perl, I can assign the $response to a variable, walk through it, strip the HTML, pull out the tabular data, verify it against what I expected, and stuff it into a database - all in one program. AND I can check for errors occurring at any of those steps.
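    Something like this, say (URL, table headers, and schema all invented for the example; assumes the report table already exists):

        use strict;
        use warnings;
        use LWP::UserAgent;
        use HTML::TableExtract;
        use DBI;

        my $ua  = LWP::UserAgent->new;
        my $res = $ua->get('http://example.com/report.html');
        die "GET failed: ", $res->status_line, "\n" unless $res->is_success;

        # Pull the tabular data out of the HTML
        my $te = HTML::TableExtract->new(headers => ['Name', 'Count']);
        $te->parse($res->content);

        my $dbh = DBI->connect('dbi:SQLite:dbname=report.db', '', '',
                               { RaiseError => 1 });
        my $sth = $dbh->prepare('INSERT INTO report (name, count) VALUES (?, ?)');

        for my $table ($te->tables) {
            for my $row ($table->rows) {
                # Verify the row looks like what we expect before inserting
                next unless defined $row->[1] && $row->[1] =~ /^\d+$/;
                $sth->execute(@$row);
            }
        }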

    I've known people who spend all their time in sed/awk and can whip up scripts to do everything there - and I'm sure people can do it in emacs and make and C. I choose Perl. Whatever works for you.

      True, but $response = `wget -O- URL`; is shorter than use LWP::Simple; $response = get "URL"; and still lets you use the full power of Perl to parse it.
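      Though with the backticks you have to remember to check $? yourself, something LWP's response object hands you for free:

          my $response = `wget -q -O- 'http://example.com/'`;   # URL invented
          die "wget failed (exit status ", $? >> 8, ")\n" if $? != 0;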