xorl has asked for the wisdom of the Perl Monks concerning the following question:

I'm sure this is one of those things that has already been coded and is out there somewhere. I'm looking for something that, given a URL, will spider the site and store a copy of it locally.

I have a pretty good idea of how to go about writing one; I'm just lazy. I figure if no one can find something out there, I can always take linklint and make it save the files after it checks them (is that a good idea?). Roughly, what I have in mind is the sketch below.
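For what it's worth, this is the rough shape of it: an untested, single-host sketch that assumes LWP::UserAgent, HTML::LinkExtor and URI are installed, and saves each fetched page under ./archive/<host>/ (all names here are just placeholders).

#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTML::LinkExtor;
use URI;
use File::Path qw(make_path);
use File::Basename qw(dirname);

my $start = shift or die "usage: $0 URL\n";
my $host  = URI->new($start)->host;
my $ua    = LWP::UserAgent->new(timeout => 30);

my %seen;
my @queue = ($start);

while (my $url = shift @queue) {
    next if $seen{$url}++;
    my $res = $ua->get($url);
    next unless $res->is_success;

    # Mirror the URL path under ./archive/<host>/, saving the raw bytes as served.
    my $path = URI->new($url)->path;
    $path .= 'index.html' if $path eq '' or $path =~ m{/$};
    my $file = "archive/$host$path";
    make_path(dirname($file));
    if (open my $fh, '>', $file) {
        binmode $fh;
        print {$fh} $res->content;
        close $fh;
    }

    # Only HTML pages get parsed for further links.
    next unless $res->content_type eq 'text/html';
    my $extor = HTML::LinkExtor->new(undef, $url);   # base URL => absolute links
    $extor->parse($res->decoded_content || $res->content);
    for my $link ($extor->links) {
        my ($tag, %attr) = @$link;
        for my $abs (values %attr) {
            next unless $abs->scheme =~ /^https?$/;
            push @queue, $abs->as_string if $abs->host eq $host;
        }
    }
    sleep 1;   # be polite between requests
}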

Thanks in advance.

Re: website archiver
by doom (Deacon) on Jan 13, 2009 at 04:31 UTC
    I think you're looking for the "wget" command. Myself, I tend to do this, but if you're doing it for archival purposes you might prefer to do it differently (e.g. without "-k" or "-H", and maybe without "-l"):
    wget -r -l 8 -w 100 -k -p -np -H <URL>
    Briefly what these options do (read the man page):
    -r   recursive retrieval
    -l   maximum recursion depth
    -w   wait, in seconds, between retrievals
    -k   convert links for local viewing
    -p   get all "page requisites", e.g. images and stylesheets
    -np  "no parent": avoid following links to levels above the starting point
    -H   enable spanning across hosts when retrieving recursively
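    If you'd rather drive it from Perl (say, to archive a batch of sites in one go), a thin, untested wrapper along these lines would do; it assumes wget is installed and on your PATH, and the archival-leaning flag set is just one possibility:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Untested sketch: run wget over a list of URLs from Perl.
    my @urls = @ARGV or die "usage: $0 URL [URL ...]\n";
    for my $url (@urls) {
        # No -k or -H here, so the saved tree matches what the server sent.
        my @cmd = ('wget', '-r', '-l', '8', '-w', '100', '-p', '-np', $url);
        system(@cmd) == 0
            or warn "wget exited with status $? for $url\n";
    }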