Mr_Micawber has asked for the wisdom of the Perl Monks concerning the following question:

I've been using CGI.pm for a long while, and Lincoln Stein's book on it is one of my most tattered manuals. Today I came across this link: W3C's Excessive DTD Traffic. I nearly always put

use CGI qw/:standard/;

at the top of my scripts, hence (I assume?) a call to retrieve the DTD.

Does this ping their servers every time I execute my script?

Replies are listed 'Best First'.
Re: W3C, DTDs and CGI.pm
by Fletch (Bishop) on Feb 09, 2008 at 00:20 UTC

    No, your code does not; but a poorly written client retrieving the results might subsequently try to fetch URLs referenced in a generated DOCTYPE or xmlns attribute. It's badly behaved clients making the requests, based on a misunderstanding of the (correctly written, according to the specs) document's contents.

    Update: Just to clarify after rereading your post: using CGI does not, in and of itself, make any client calls to retrieve anything from anywhere (W3C or otherwise). The HTML skeleton emitted by methods such as start_html includes HTTP URIs that point to the W3C's servers, which incorrectly written clients may attempt to retrieve after receiving the contents of your response.
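    For instance, here's a minimal sketch (the exact DOCTYPE depends on your CGI.pm version; recent versions default to XHTML 1.0 Transitional):

        #!/usr/bin/perl
        use strict;
        use warnings;
        use CGI qw/:standard/;

        # start_html merely *prints* a DOCTYPE containing a w3.org URI;
        # the script itself makes no network request.
        print header,
              start_html('Demo'),
              h1('Hello'),
              end_html;

        # Typical output begins:
        # <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
        #  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

    Whether anything ever hits w3.org is entirely up to whatever client parses that output.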

    The cake is a lie.

HTTP Caching (was: W3C, DTDs and CGI.pm)
by Tanktalus (Canon) on Feb 09, 2008 at 00:41 UTC

    The blog post you link to talks about HTTP libraries implementing caching, which, it appears, LWP doesn't do. It got me thinking, though, and a quick search shows that someone has already put a module on CPAN that hooks into LWP::UserAgent for exactly this purpose: HTTP::Cache::Transparent.

    To comply with this reasonable suggestion, perhaps LWP should incorporate HTTP::Cache::Transparent directly into its own distribution and include instructions on initialising the cache as part of the LWP synopses. Some sane defaults might help too, e.g., a cache_in_home_dir => '.perl/my-app' option that would do something sensible on both Unix and Windows (on Windows, stripping the leading dot and prepending whatever makes sense there); I'm not sure about other platforms. Well, that's just my two cents anyway.
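    For what it's worth, here's a minimal sketch of how the module is used today, based on HTTP::Cache::Transparent's documented init() options (BasePath is required; the cache directory shown is just a hypothetical choice):

        #!/usr/bin/perl
        use strict;
        use warnings;

        use HTTP::Cache::Transparent;
        use LWP::UserAgent;

        # Initialise the cache *before* issuing any requests; the module
        # hooks itself into LWP::UserAgent so later fetches are cached.
        HTTP::Cache::Transparent::init( {
            BasePath => "$ENV{HOME}/.perl/http-cache",  # hypothetical path
            MaxAge   => 8 * 24,  # hours to keep entries in the cache
            NoUpdate => 3600,    # serve from cache for an hour without revalidating
        } );

        my $ua = LWP::UserAgent->new;

        # Repeated fetches of the same DTD are now answered locally (or
        # revalidated with If-Modified-Since) instead of hammering w3.org.
        my $res = $ua->get('http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd');
        print $res->is_success ? "fetched ok\n" : $res->status_line . "\n";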

      Unless I misunderstand, LWP doesn't fetch arbitrary URLs of its own accord. Imposing a caching mechanism on it implies permanent storage of something, somewhere. I'm sure I wouldn't be using it for much of anything if I had to provision a permanent file/folder/database just to use it.

      My knee-jerk theory is that 99.99% of the traffic problem is caused by web bots and spiders that simply strip every URL they find out of a page and keep on trucking. I'll even go so far as to say the W3C did it to themselves by specifying an "http://" URL in the first place. I really don't think they have any choice now but to live with the consequences.

      On the other hand, if they can embed some Google ads in those DTDs, they're probably sitting on a gold mine!
      Make your suggestion and see what the maintainers think.