Samn has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Testing URL existence
by Aristotle (Chancellor) on Jun 22, 2002 at 20:01 UTC
    To follow up on that explanation, LWP::Simple's head() method is a very easy and economical way of testing a URL since it uses a HEAD request which does not actually send the document body around. To further facilitate the approach, you should issue a redirect to either the cam URL or the "sorry it's down" picture from the CGI depending on the test result, if it's being referred to via IMG SRC=. If the CGI's output is the cam portal's HTML page it should produce IMG SRC= tags with different URLs according to the tests' results. That way the CGI script does as little work as possible, to keep the server's load low.
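
A minimal sketch of the redirect idea, assuming the CGI is what the IMG SRC= points at. Both URLs and the helper name pick_target are made up for illustration; only head() comes from LWP::Simple.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple qw(head);

# Hypothetical URLs, for illustration only.
my $cam_url  = 'http://www.example.com/webcam.jpg';
my $down_url = 'http://www.example.org/images/cam-down.gif';

# Return the URL the browser should be sent to, given the test result.
sub pick_target {
    my ($is_up, $cam, $fallback) = @_;
    return $is_up ? $cam : $fallback;
}

# In boolean context head() is true only if the HEAD request succeeded.
my $is_up  = head($cam_url) ? 1 : 0;
my $target = pick_target($is_up, $cam_url, $down_url);

# A CGI redirect is just a Location header followed by a blank line.
print "Location: $target\n\n";
```

The script never transfers the image itself; the browser fetches whichever URL the Location header names.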

    Makeshifts last the longest.

Re: Testing URL existence
by dws (Chancellor) on Jun 22, 2002 at 20:14 UTC
    You have another problem waiting in the wings once you solve your immediate problem. That is, if you're doing several existence checks (using LWP::Simple or whatever) while the end user's browser is waiting, you risk timing out the browser.

    There are two approaches to this. One is to arrange to emit the HTML stream in a dribble, emitting each WebCam URL as you've done the verification. There's still the risk that the verification will time out, causing a cascading failure to the end user's browser.

    The second approach is to periodically scan your WebCam URLs, caching the results. Then feed the browser from the cache. You eliminate the risk of timeout by adding a (smaller?) risk that a webcam will have become unavailable since the last time you scanned it.

      Or that it is still being assumed unavailable even though it has gone back online in the meantime.

      Makeshifts last the longest.
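
The scan-and-cache approach could look roughly like the sketch below. The cache path, the cam list, and the sub names scan and cached_status are assumptions for illustration; the scanner would run from cron and the CGI would only ever call cached_status. Storing the time of each check lets the page judge how stale an "up" or "down" answer is.

```perl
use strict;
use warnings;
use LWP::Simple qw(head);
use Storable qw(store retrieve);

# Hypothetical cache location and cam list, for illustration only.
my $cache_file = '/tmp/webcam-status.cache';
my @cams = (
    'http://www.example.com/cam1.jpg',
    'http://www.example.org/cam2.jpg',
);

# Scanner side: test every URL with $check and save the results to $file.
sub scan {
    my ($file, $check, @urls) = @_;
    my %status;
    for my $url (@urls) {
        $status{$url} = { up => ($check->($url) ? 1 : 0), checked => time };
    }
    store(\%status, $file);
    return \%status;
}

# CGI side: read the cached answers without touching the network.
sub cached_status {
    my ($file) = @_;
    return -e $file ? retrieve($file) : {};
}

# Run periodically (e.g. from cron); head() is true if the HEAD request succeeded.
scan($cache_file, sub { head($_[0]) }, @cams);
```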

Re: Testing URL existence
by Zaxo (Archbishop) on Jun 22, 2002 at 20:01 UTC

    LWP has all the tools you need to check the validity of a URL. Commonly, one tries a HEAD request. If that fails, try a GET, since some sites deny all HEAD requests.
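
A rough sketch of the HEAD-then-GET fallback with LWP::UserAgent. The sub name url_exists, the 10-second timeout, and the URL are illustrative assumptions, not anything from the original question.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Return 1 if $url answers a HEAD request or, failing that, a GET; else 0.
# $ua is any object with LWP::UserAgent-style head() and get() methods.
sub url_exists {
    my ($ua, $url) = @_;
    return 1 if $ua->head($url)->is_success;
    # Some servers refuse HEAD, so retry with a full GET before giving up.
    return $ua->get($url)->is_success ? 1 : 0;
}

# The timeout keeps one dead site from stalling the whole check.
my $ua = LWP::UserAgent->new(timeout => 10);
print url_exists($ua, 'http://www.example.com/webcam.jpg') ? "up\n" : "down\n";
```

The GET is only issued when the HEAD fails, so well-behaved servers still get the cheap request.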

    It's a good idea to contact the webmaster of each site. It is a courtesy, and you can get detailed information about their camera schedule and http practices.

    Before you dig in to write code, give some thought to your design. Do you really need to check with each page request? HTTP requests can take a while; do you need to fork off these requests to keep your main script awake, or to keep the client from timing out?

    Some effort spent on realistic study of time and resource needs will really pay off, I think.

    After Compline,
    Zaxo

Re: Testing URL existence
by grep (Monsignor) on Jun 22, 2002 at 20:56 UTC
    If you want more extensive testing of those webpages, look into HTTP::WebTest. It will not only verify the existence of a webpage, it will also check for the existence of a returned string or regex.

    It also allows for external files with config information, so you can set up as many pages as you want, all with different criteria, quickly. It will handle authentication, cookies, testing return time, and SSL. There is also a cookbook for the module.

    An example of a config file:

    test_name = Web cam page page
        url = www.the-site-I-want-check.com
        text_require = ( <img src="image.jpg"> )
        regex_require = /^<a href="http:\/\/www.yahoo.com\/d+"/
        text_forbid = ( Premature end of script headers
                        an error occurred while processing this directive )
        min_bytes = 13000
        max_bytes = 99000
        min_rtime = 0.010
        max_rtime = 30.0
    end_test


    grep
    Just me, the boy and these two monks, no questions asked.
Re: Testing URL existence
by Ryszard (Priest) on Jun 22, 2002 at 19:45 UTC
    You could quite easily perform this test with LWP::Simple.
Re: Testing URL existence
by shotgunefx (Parson) on Jun 22, 2002 at 22:06 UTC
    Here's my opinion.
    • I'd use LWP::UserAgent instead of LWP::Simple (More control and info)
    • I would scan periodically, not every time. (Maybe every 5-10 minutes)
    • I would use get instead of head because some servers (as noted above) deny HEAD requests or respond to them incorrectly.
    • Check the content-type to make sure it's an image, not html or some other type of response.
    • Optionally I'd add an MD5 digest of the images so I could tell if they were live or if they had a static "cam down" image or similar.
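
The list above could be sketched roughly as follows. The sub name classify, the three labels, the timeout, and the URL are assumptions for illustration; the digests of any "cam down" images would have to be collected by hand beforehand.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use Digest::MD5 qw(md5_hex);

# Classify a fetched response as 'live', 'placeholder', or 'down'.
# $known_down is a hash ref keyed by MD5 digests of known "cam down" images.
sub classify {
    my ($res, $known_down) = @_;
    return 'down' unless $res->is_success;
    # content_type() gives the bare media type, e.g. "image/jpeg".
    return 'down' unless $res->content_type =~ m{^image/}i;
    return $known_down->{ md5_hex($res->content) } ? 'placeholder' : 'live';
}

# Use GET rather than HEAD, since the image body is needed for the digest.
my $ua  = LWP::UserAgent->new(timeout => 10);
my $res = $ua->get('http://www.example.com/webcam.jpg');
print classify($res, {}), "\n";
```

Run from a periodic scanner rather than per page request, this keeps the full GETs off the visitor's clock.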


    -Lee

    "To be civilized is to deny one's nature."
Re: Testing URL existence
by silent11 (Vicar) on Jun 22, 2002 at 22:08 UTC

    Samn,
    Most peeps these days have JavaScript.
    Just throw in an onError thingie into the img tag as shown below.
    <img src="http://www.domain.com/webcam.jpg" width="320" height="240" name="jer" border="0" onError="document.images.jer.src='/webcam/nowebcam.gif';">
    This will show your friend's web cam pic, or show your default pic if it doesn't return anything to your browser.
    The code works fine on my site.

    -Silent11
      Most "peeps" actually do not have Javascript, and in fact, a majority of browsers do not support it. With all of the recent issues with Javascript-level insecurities, spyware, cookie exploits, etc., many people (wisely) keep Javascript disabled. At least we now have browsers such as Mozilla that can apply more granular control to Javascript.

      Remember, Javascript is executed, not parsed. In any case, it won't work with text browsers (lynx, links), won't work with fast GUI browsers such as Dillo, and won't work on WAP/PDA/cell users. It's also a client-side solution, which means you rely on the client having it enabled, and using a browser which can properly execute the code you put in it.

      Rule 1: Never trust the client browser.

        While I avoid javascript like the plague, I do think "89% enabled" would be considered "most" by the majority.

        In this application, it's images the poster is worried about so lynx && links are moot points, so are most WAP devices.

        I don't know what kind of cams the monk is linking to (traffic, personal or voyeur :P), but if it's just a cam links page on his personal website, the 30-second javascript fix might be appropriate.

        Don't get me wrong. I build large sites for a living and the only thing I will use javascript for is sized popup windows (Click here to see large image type of things) and even then it degrades nicely for people using AOHELL or who don't have javascript or have it disabled.
        My two cents.

        -Lee

        "To be civilized is to deny one's nature."