wink has asked for the wisdom of the Perl Monks concerning the following question:

I just moved to a new web server (powweb; if anyone knows anything bad about them, let me know so I can pull out before the 30-day trial is up!) and I have a few questions about setting up my new page.

1) I want to move the content from the old server, but I don't have access to it, and the person who does may not be available for a few days or more. I know there are modules like WWW::Robot that will crawl all links from a page recursively, but I've never used anything like this before, and reading some of the documentation made my head spin! I'm not sure it will do what I want, either. I basically want to prevent any broken links, so I want a list of all the URLs (including images) that the page(s) link to (if the module provided a way for me to download them too, that would be a bonus, but not a requirement). I thought maybe I could add a hook in WWW::Robot to output the URL of any successfully visited page and also read through it and pull out the image locations, but is there a better solution?
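To make question 1 concrete, here's the kind of thing I have in mind: an untested sketch that crawls from a start page, prints every URL it can fetch, and queues up anything linked from the HTML (images included). It uses LWP::UserAgent and HTML::LinkExtor rather than WWW::Robot, and http://www.oldsite.example/ is just a placeholder for the old server:

    use strict;
    use warnings;
    use LWP::UserAgent;
    use HTML::LinkExtor;

    my $base  = 'http://www.oldsite.example/';   # placeholder for the old host
    my $ua    = LWP::UserAgent->new;
    my %seen;
    my @queue = ($base);

    while (my $url = shift @queue) {
        next if $seen{$url}++;
        my $resp = $ua->get($url);
        next unless $resp->is_success;
        print "$url\n";                          # the list of good URLs I'm after
        next unless $resp->content_type eq 'text/html';

        # Pull out every link (a href, img src, etc.), resolved against $url
        my $extor = HTML::LinkExtor->new(undef, $url);
        $extor->parse($resp->content);
        for my $link ($extor->links) {
            my ($tag, %attr) = @$link;
            push @queue, grep { index($_, $base) == 0 }   # stay on-site
                         map  { "$_" } values %attr;
        }
    }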

2) The web server unfortunately doesn't have in-place editing (one of the two negatives I've found; the other is no mailing lists!). There's nothing like a file manager that lets you view your directory structure and edit files in a browser window. I've already thought about some ideas for coding this myself, but, as I've seen others say before, I don't want to re-invent the wheel. Any modules or scripts people know of off the tops of your heads? I say off the tops of your heads because I don't want you to do my research for me, but I did a Super Search and a search on CPAN and didn't find anything I liked. I may just not have been searching for the right thing.
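For what it's worth, the bare-bones version I'd been sketching for question 2 looks something like this: a toy CGI that just lists a directory (no editing yet). The $root path is made up, so point it at your own docroot:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use CGI qw(header escapeHTML);

    my $root = '/home/wink/public_html';   # made-up path; adjust to taste

    print header('text/html'), "<ul>\n";
    opendir my $dh, $root or die "Can't open $root: $!";
    for my $entry (sort grep { !/^\./ } readdir $dh) {
        my $mark = -d "$root/$entry" ? '/' : '';   # flag subdirectories
        print '<li>', escapeHTML($entry), $mark, "</li>\n";
    }
    closedir $dh;
    print "</ul>\n";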

Thanks!

--Wink


Re: Moving hosts: questions about web content
by rruiz (Monk) on Aug 08, 2005 at 07:39 UTC

Hi, you need to give us more details on what you want and what you have tried so far, because this question seems a little OT.

    In any case, here are some pointers on what I would do in your situation.

    • On the hosting question, try Web Hosting Talk; it's a forum with a good reputation for hosting recommendations and reports of trouble with hosting providers.
    • To get the content from your previous host, if you only have web access, you can use wget to retrieve your content (not a Perl solution, but it is a good utility). If you have ftp access, you may be better off looking for an ftp mirror module/utility (I use lftp, which has a nice mirror command, but YMMV; see the one-liner after this list).
    • In any case, you should ask your hosting provider to give you a tarball of the full contents of your site. If you use wget, WWW::Robot, or anything else to retrieve your site with only web access, you will not get the full content; in particular, if any part of your site is dynamic (i.e. CGI Perl or PHP based), you will lose your scripts and the actual data they use to present the content.
    • You need to clarify what you mean by in-place editing. Is it a wiki or CMS? WebDAV, FrontPage, or Dreamweaver extensions? Something else? It is impossible to guess. ;)
    • Your new host should provide a way to upload files, whether via ftp or cPanel; if not, you need to begin looking for another provider right now. ;)
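    Here is the lftp one-liner I had in mind for the mirror route; the user, pass, host name, and remote directory are all placeholders:

        lftp -u user,pass ftp.oldhost.example -e 'mirror --verbose /public_html ./site-backup; quit'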

    And just as a final note (because I am going way OT in this answer), you need to double-check with your provider about this:

    The web server unfortunately doesn't have in-place editing...

    Because if it means that you don't have CGI or PHP scripting or something, you will be very limited and may be unable to provide dynamic content on your site.

    HTH, God bless you
    rruiz

      The site has amazing Perl, CGI, SSI, etc. support. As far as "in-place editing" goes, I just wanted a way to do a file listing other than FTP, like a cPanel file manager or the like. When I asked on the member forums, I got a no, but it now looks like they have WebFTP and something called sitemanager that will do what I'm looking for. I'm still skeptical.

      I'll look into wget as you and others have suggested. Thanks!

Re: Moving hosts: questions about web content
by davidrw (Prior) on Aug 08, 2005 at 12:49 UTC
    wget --mirror http://yoursite.com/foo.html will recursively suck down that page and everything under it, including CSS and image files. See the wget manpage for lots of options.
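    If you also want the saved pages to link to each other locally, something along these lines should do (all standard wget options, though the exact set varies by version):

        wget --mirror --convert-links --page-requisites --no-parent http://yoursite.com/foo.html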

    lynx --dump http://yoursite.com/foo.html may also be useful to you -- by default it provides a list of all the files that are linked to in the rendered page. You could then manually review this or loop over it with LWP::Simple or wget or similar.
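    A rough sketch of that loop, assuming you saved the lynx output to a file called dump.txt (lynx appends a numbered "References" list at the bottom of its dump):

        use strict;
        use warnings;
        use LWP::Simple qw(getstore);
        use URI;

        open my $fh, '<', 'dump.txt' or die "Can't read dump.txt: $!";
        while (<$fh>) {
            # References lines look like "  1. http://..."
            next unless my ($url) = /^\s*\d+\.\s+(https?:\S+)/;
            my ($name) = URI->new($url)->path =~ m{([^/]+)\z};
            getstore($url, $name || 'index.html');   # NB: same-named files clobber
        }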
Re: Moving hosts: questions about web content
by dtr (Scribe) on Aug 08, 2005 at 08:54 UTC

    Not a Perl answer, but the "--mirror" argument to the "wget" command might be what you want - IF your site consists only of static pages.