in reply to Re^2: Reinventing the wheel
in thread Reinventing the wheel

I apologize, but your original post stated that you were looking for a link spider, and it did not mention that, for the problem you were trying to solve, such behaviour would be overkill. You instead presented an alternative solution, which did not seem to fit the role of a spider as I understood it, because it did not parse each page and follow the links it found. This may support the Reading the same text and getting a different impression thread.

Personally, I have done webserver support for many years (since 1995 ... my first server migration was so we would have support for software virtual servers and SSL), and I keep a number of tools on hand for testing. For quick tests, I typically just bring the pages up in a web browser:

$ netscape "url_goes_here"

or for times when the hosts aren't in DNS yet:

$ telnet server_ip_address 80
GET /url_path HTTP/1.0
Host: url_hostname

(Note the port number on the telnet line, and the blank line after the Host: header; the blank line is what ends the request.)
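Or, if you'd rather use a real browser against a host that isn't in DNS yet, a temporary hosts-file entry works too (a sketch, using the same placeholders as above; it needs root, and remember to take the entry back out when you're done):

# echo "server_ip_address url_hostname" >> /etc/hosts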

As another alternative, I have the respective content owners check their websites while I keep an eye on the webserver's error log:

$ tail -f /path/to/webserver/error_log
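(If the log is noisy, it helps to narrow it down. Assuming an Apache-style error_log, broken links show up as 'File does not exist' entries, so something like:

$ tail -f /path/to/webserver/error_log | grep 'File does not exist'

will show just the 404s as your content owners click around.)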

You could also run wget against the site to do the spidering while you watch the logs, if you don't have linklint.
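For instance (the log filename here is just an example), a recursive fetch that throws away everything it downloads will walk the links for you, and anything broken shows up both in its log and in the server's error_log:

$ wget -r --delete-after -o spider.log http://url_hostname/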

I have done a number of large-scale server migrations, and the only time we ever had a problem with the data migration that wasn't caught in our migration testing, it was because we did not sample a large enough number of the pages. (One of the shell scripts that gathered all of the files modified since the last tape backup produced a list too long for one of the user's directories; the overlong argument list was passed to tar, which failed silently.) And of course, the files that were missed were from a message board for a distance learning program ... so I spent the next two days consolidating the posts between the two servers.
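(For what it's worth, that failure mode can be avoided by streaming the file list to tar on stdin instead of putting it on the command line. A sketch, assuming GNU find and tar, with made-up paths:

$ find /export/home -newer /var/run/last_backup -print0 | tar --null -T - -cf /backup/incremental.tar

The -print0/--null pairing also keeps filenames with spaces or newlines from being mangled.)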

Your original message also stated that the time savings were over finding a suitable program; it did not mention that you had a slow link, or that you were concerned with the time to get up to speed with the input parameters (although a basic test with linklint is very simple). I admit that using Google for 'link spider' pulls up crap, but this is one of those times where Yahoo does well. (Okay, not on the original search, but it recommends 'linkspider', which has useful info.) Also, the search terms 'link checker' and 'link validator' both return useful results from Google.

I apologize if you took offense at my first reply, but I intended it to be constructive: I wanted to point you towards other tools that might be useful should you perform a similar migration again, and to note that your code sample didn't act as a link spider, the role I understood from your message that you had intended it to serve.

Update: I suck at spelling.

Re^4: Reinventing the wheel
by bageler (Hermit) on Mar 21, 2005 at 15:30 UTC
    my apologies. I could have been clearer about the details of the situation and further explained the narrow scope of what was needed, to prevent any misunderstanding. During day-to-day business we do indeed have a more complete toolset available for regular monitoring. The guy just wanted a way to check things before turning the monitoring back on, so that if something was wrong, no one would be paged, since they were all already at the colo facility.

    Also, you make some big assumptions about the way this move was done, i.e. that there was testing and transparent planning before the actual move. I'm just a developer responsible for the software running on these servers, not privy to such details.

      Well, I've found that no matter how much planning you do, there's always something left to go wrong. I also don't know what you're using for your monitoring, but some systems have a way to keep from sending out alerts, so that when you know something's down, it won't keep alerting. (A couple of the folks I used to work with were contemplating making something that would only let you set a timer to suppress the warnings, and would then escalate if the problem wasn't resolved in time ... after we found out someone had disabled the warnings on their systems because they didn't like getting paged 'all the time'.)

      Depending on how the paging goes out, if it's through a mail gateway, you might also be able to shut down the local mail queues, provided it doesn't send straight to another SMTP server. (Fortunately, I have the luxury of being able to sit and think about these things, as opposed to when you've been up all night, you're hitting the end of your planned outage window, and all you want to do is get out of there and crash for the night.)
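      (If the pages are handed to a local sendmail, for instance, stopping the daemon keeps the queue from being run. Whether freshly submitted messages still go out directly depends on the MTA's delivery mode, and the init script path varies by system, so treat this as a sketch:

      # /etc/init.d/sendmail stop
        ... do the maintenance; pages pile up in the queue ...
      # /etc/init.d/sendmail start
      # sendmail -q

      The final sendmail -q just forces a queue run once you're ready for the backlog.)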

      Anyway, I find you learn best by doing things (and I learn the most from making mistakes, as I don't want to repeat them). No matter how much you read, take classes, or plan for things, it's nothing like the real thing: pulling 16-hour days for two weeks straight, trying to recover a 30k-user mailstore while you watch the three fibrechannel controllers you connect to mysteriously fail one after another. Or powering down your entire 100+ server data center (so a UPS bypass switch could be installed ... yet we had to shut down again the next year for the batteries to be serviced), only to have your terminal server not come back up, so you're rolling two Wyse terminals around on carts as you bring the machines up two at a time.