jumbal has asked for the wisdom of the Perl Monks concerning the following question:

What module do you recommend for sucking up a website? I'm creating a server-side webpage compiler, which should pull the pages and their associated media (images, wavs, midis, Flash, etc.) and burp them into a random local directory. LWP's lwp-rget looks like it might work, but it's a command-line tool and I need this to run within my code. WebMirror and w3mir look like possibilities as well. I just wanted to check whether anyone else has a quick idea on what they would use that I'm missing. Thanks!

Replies are listed 'Best First'.
Re: Need to Compile some Webpages...
by IlyaM (Parson) on Nov 27, 2001 at 05:01 UTC
    You can try WWW::Robot. Also, if you like lwp-rget, you can just call it with system(). Or, since it is just a Perl script, you can take its source and adapt it to your needs.
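    For instance, here is a minimal sketch of the system() approach, assuming lwp-rget (which ships with libwww-perl) is installed and on your PATH. The temp-directory handling is just illustrative, and you should check `lwp-rget --help` for the options your version supports:

        #!/usr/bin/perl
        use strict;
        use warnings;

        use Cwd qw(cwd);
        use File::Temp qw(tempdir);

        # Placeholder starting URL -- substitute the site you want to mirror.
        my $url = 'http://www.example.com/';

        # lwp-rget saves into the current directory, so hop into a random
        # scratch directory first (as the original post asked for).
        my $dest = tempdir('mirror_XXXX', TMPDIR => 1);
        my $old  = cwd();
        chdir $dest or die "Can't chdir to $dest: $!";

        # Shell out to lwp-rget; see `lwp-rget --help` for options such as
        # --depth that control how much of the site gets pulled down.
        system('lwp-rget', $url) == 0
            or die "lwp-rget failed with exit status ", $? >> 8, "\n";

        chdir $old or die "Can't chdir back to $old: $!";
        print "Mirrored $url into $dest\n";

    Because lwp-rget writes into whatever directory it is run from, the chdir before and after the system() call is what lands the mirror in the scratch directory.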
Re: Need to Compile some Webpages...
by fuzzysteve (Beadle) on Nov 27, 2001 at 05:54 UTC
    Take a look at HTML::LinkExtor. It's part of the HTML::Parser distribution.
    Its documentation has a sample script that grabs the list of img tags from a given URL, and it should be fairly simple to expand that to grab everything linked (see the sketch below). Then you just have to go through the grabbed document and rewrite any absolute links into relative ones, which should just be a case of using a regex.
    Of course, there are premade scripts/programs to do this; you just have to decide whether they do what you want, or whether you want to reinvent the wheel. (Contrary to popular belief, there are reasons to do that: it's a good way of learning new modules, if you have the time.)
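    A rough sketch along those lines, adapted from the HTML::LinkExtor synopsis. The starting URL, output directory, and list of tags are just placeholders; it only fetches resources linked from one page and does not rewrite the links:

        #!/usr/bin/perl
        use strict;
        use warnings;

        use LWP::UserAgent;
        use HTML::LinkExtor;
        use URI;
        use File::Basename qw(basename);

        # Placeholder page and output directory -- adjust for your setup.
        my $start_url = 'http://www.example.com/index.html';
        my $out_dir   = '/tmp/mirror';

        my $ua  = LWP::UserAgent->new;
        my $res = $ua->get($start_url);
        die "GET $start_url failed: ", $res->status_line, "\n"
            unless $res->is_success;

        # Collect links from tags that usually point at media worth grabbing;
        # passing $res->base makes HTML::LinkExtor return absolute URLs.
        my @urls;
        my $extor = HTML::LinkExtor->new(
            sub {
                my ($tag, %attr) = @_;
                push @urls, values %attr
                    if $tag =~ /^(?:img|embed|bgsound|script|link)$/;
            },
            $res->base,
        );
        $extor->parse($res->content);

        # Save each linked resource under $out_dir, named after the last
        # part of its URL path.
        mkdir $out_dir unless -d $out_dir;
        for my $u (@urls) {
            my $file = basename(URI->new($u)->path) or next;
            my $r = $ua->get($u, ':content_file' => "$out_dir/$file");
            warn "Could not fetch $u: ", $r->status_line, "\n"
                unless $r->is_success;
        }

    To mirror a whole site rather than one page, you would feed the href links back into the same loop (keeping a %seen hash so you don't fetch anything twice), which is essentially what the premade tools do for you.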