szabgab has asked for the wisdom of the Perl Monks concerning the following question:

While trying to install WWW::Mechanize 1.08 today some of its tests failed:
Failed Test           Stat Wstat Total Fail  Failed  List of Failed
-------------------------------------------------------------------------------
t/live/follow.t          2   512    11    2  18.18%  6-7
t/live/follow_link.t   255 65280     7    6  85.71%  5-7
t/live/get.t            13  3328    26   13  50.00%  7 9 11 13 15-17 19-22 24-25
Looking at t/live/follow.t shows that it wants to follow a link using

text_regex => qr/Business.Solutions/i

but currently there is no such link on Google.
At least when *I* try to access Google using the same link as in the code (http://www.google.com/intl/en/) with a browser I don't see such a link.

I have not checked the rest of the errors but I guess they are also due to changes in web sites.

This brings me to a question, or two:

Should any of the module installation processes access web sites or anything else outside the machine?
If yes, is it OK to do that without asking the user first?
Should it be
  1. a public site such as Google, which can change and cause the tests to fail, but which makes it clear the author has no intention of collecting information, or
  2. a private site of the author, which can have a few pages that are not going to change, or
  3. some more static public site such as www.cpan.org?

Re: Accessing the net during module installation
by gaal (Parson) on Dec 29, 2004 at 09:34 UTC
    Screen scraping is fragile. Mechanized web operation per se is not.

    If the module you were installing was meant to do something specifically with Google, it would make sense to test against Google. Then a failure means the module does not work. If it tests a lower-level protocol (e.g. HTTP), it should use a peer to that protocol. But in the general case, Google is *not* a peer to screen scraping, since it never agreed to present web search results in a particular format.

    It would be best if the tests of WWW::Mechanize targeted something with a stable protocol, for example if the module authors set up a test server. Or even a meta-server: people could volunteer test servers of their own, and the main server would just hand out a list of such servers for the client to choose from (through, for example, submitting a form :-)
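    One variant of the test-server idea, sketched here as an assumption rather than as what the WWW::Mechanize tests do: the test starts its own tiny server on the loopback interface and serves a fixed page, so the link text cannot change and nothing leaves the machine. The page content and the use of HTTP::Tiny as the client are choices made for this sketch.

```perl
use strict;
use warnings;
use IO::Socket::INET;
use HTTP::Tiny;

# Fixed page under the test's control (content is an assumption for this sketch).
my $PAGE = '<html><body><a href="/next">Business Solutions</a></body></html>';

# Listen on an ephemeral port, loopback only, so nothing leaves the machine.
my $server = IO::Socket::INET->new(
    LocalAddr => '127.0.0.1',
    LocalPort => 0,
    Listen    => 5,
    Proto     => 'tcp',
    ReuseAddr => 1,
) or die "cannot listen: $!";
my $port = $server->sockport;

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    # Child: answer exactly one request with the fixed page, then exit.
    my $client = $server->accept or exit 1;
    while (my $line = <$client>) {
        last if $line =~ /^\r?\n$/;    # blank line ends the request headers
    }
    print {$client} "HTTP/1.0 200 OK\r\n",
        "Content-Type: text/html\r\n",
        "Content-Length: ", length($PAGE), "\r\n",
        "Connection: close\r\n\r\n",
        $PAGE;
    close $client;
    exit 0;
}

# Parent: fetch the page from the local server, bypassing any configured proxy.
close $server;
my $http = HTTP::Tiny->new(timeout => 5, no_proxy => ['127.0.0.1']);
my $res  = $http->get("http://127.0.0.1:$port/");
waitpid $pid, 0;

# The same pattern the failing test used, now against content we control.
my $found = ($res->{success} && $res->{content} =~ /Business.Solutions/i) ? 1 : 0;
print "link text found: $found\n";
```

    A real test would point the module under test at the local URL instead of using HTTP::Tiny directly; the point is only that the fixture is stable.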

Re: Accessing the net during module installation
by PodMaster (Abbot) on Dec 29, 2004 at 10:46 UTC
    Should any of the module installation processes access web sites or anything else outside the machine?
    Sometimes (depends on the purpose of the module).
    If yes, is it OK to do that without asking the user first?
    No! Asking the user doesn't cost you anything, but not asking can cost you (your reputation, ...).

    What is apparently somewhat acceptable is trying to ascertain whether the user is connected to the Internet, for example by trying to connect to google.com (as in http://search.cpan.org/src/GAAS/libwww-perl-5.803/Makefile.PL), before prompting the user with ExtUtils::MakeMaker's prompt function, the official way to prompt in CPAN distributions.

    I say somewhat because a few distributions currently do it and it's mostly harmless. However, I strongly feel that this testing for an Internet connection is best done inside each test file, and only if the user first agreed to these tests after being prompted (during Makefile.PL).
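    A minimal sketch of such a check inside a test file might look like the following. The environment variable LIVE_TESTS, the helper names, and the probe host are assumptions made for illustration; the probe is only called once the user's agreement is known, so no connection is attempted without consent.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Hypothetical probe: true if a TCP connection to $host:$port succeeds.
sub can_connect {
    my ($host, $port) = @_;
    my $sock = IO::Socket::INET->new(
        PeerAddr => $host,
        PeerPort => $port,
        Proto    => 'tcp',
        Timeout  => 5,
    );
    return 0 unless $sock;
    close $sock;
    return 1;
}

# Returns a skip reason, or undef if the live tests may run.
# LIVE_TESTS is an assumed name for "the user agreed earlier".
sub live_skip_reason {
    my ($env, $probe) = @_;
    return 'live tests not enabled by the user' unless $env->{LIVE_TESTS};
    return 'no network connection detected'     unless $probe->();
    return;
}

# In a test file this could be used as:
#   use Test::More;
#   my $reason = live_skip_reason(\%ENV, sub { can_connect('www.example.com', 80) });
#   plan skip_all => $reason if $reason;
```

    Passing the probe in as a code reference keeps the decision logic testable without any network access.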

    I also strongly feel that the default when prompting for this should always be no, because of the way the prompt function works (if it's not an interactive session, which can often be the case, the default is taken to be the answer).
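    As a rough sketch of a default-no prompt, assuming hypothetical helper names and question wording: ExtUtils::MakeMaker's prompt function takes a message and a default, and returns the default when the session is not interactive or when PERL_MM_USE_DEFAULT is set.

```perl
use strict;
use warnings;
use ExtUtils::MakeMaker qw(prompt);

# Hypothetical helper: true only for an explicit "y" or "yes" style answer.
sub answer_is_yes {
    my ($answer) = @_;
    return (defined $answer && $answer =~ /^\s*y/i) ? 1 : 0;
}

# Ask before touching the network. The default is 'n', so a
# non-interactive install (where prompt() returns the default)
# never enables the live tests by accident.
sub want_live_tests {
    my $answer = prompt('Run tests that need Internet access?', 'n');
    return answer_is_yes($answer);
}
```

    A Makefile.PL could call want_live_tests() once and use the result to decide which test files to pass to WriteMakefile, for example through its test => { TESTS => ... } parameter.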

    What is absolutely unacceptable is what DOMIZIO used to do (as described here), which basically was eval get "http://his.machine/secret.url" without any prompting whatsoever.

    MJD says "you can't just make shit up and expect the computer to know what you mean, retardo!"
    I run a Win32 PPM repository for perl 5.6.x and 5.8.x -- I take requests (README).
    ** The third rule of perl club is a statement of fact: pod is sexy.

      I can't agree more that prompting costs nothing. Setting the intrusive/secret nature of the eval statement aside for a moment, it's annoying as all get out for an installation to assume that a successful eval get "http://www.google.com" means that my non-port 80 connections work as well. I've always appreciated the prompt in the CPAN first-time config that asks you for a wait server while mentioning that you may not be able to contact it. When I started using CPAN, I didn't know a wait server from Adam's housecat. At least, I knew that contacting one might not be successful.

      --

      Amateurs discuss tactics. Professionals discuss logistics. And... my cat's breath smells like cat food.

        Pray enlighten us, WTF is a wait server anyway? I've always wondered but never quite got around to looking into it.

        ---
        demerphq

Re: Accessing the net during module installation
by elwarren (Priest) on Dec 29, 2004 at 23:22 UTC
    I think 1, 2, and 3 could all fail. CPAN could be considered more static only in that you would expect it to be up if you just installed a module from CPAN, but really the FTP and HTTP servers aren't related.

    A better approach might be to go to a site and follow the first link rather than a named link, but I believe following a particular named link is itself one of the things being tested.

    Whatever the case, the install should prompt the user before *any* communication outside of the machine. The network policy at my work is very strict. For example, there is absolutely no external email access allowed, including webmail. If an author decided to bounce a test off of mail.yahoo.com, it would be a direct violation of our network use policy, without me even being aware of it.

    JMHO
Re: Accessing the net during module installation
by petdance (Parson) on Dec 29, 2004 at 16:15 UTC
    What are you trying to achieve here? You posted this to perl-qa, and now here. Do you want there to be some sort of public uprising? Wailing and gnashing of teeth?

    xoxo,
    Andy

      I posted this on perl-qa as I believe you are reading that and will respond to it quickly. Actually I was surprised there were no other reports of this - only now did I notice this report.

      I posted it here as I'd like to get the opinion of a wider audience, not just of the perl-qa list, and I'd like to draw attention to the general case, not the specifics of your module. In the original post I even mentioned that I don't think petdance has any bad intentions or anything like that. But then it seemed so obvious that I removed this part. For one thing, it was not personal, and if it seemed like that then I apologize.

      Anyway, if it takes a popular uprising to change this in your module and in libwww-perl (which is also failing now for similar reasons) then maybe it should happen.

      In any case I am against gnashing of teeth.

        Why didn't you just say "Hey, Andy, I'm not comfortable with having Mech go and hit the net." It's not like I don't read email.

        xoxo,
        Andy

      Damn! I was busy building a WWW::Mechanize army to unleash an uprising to steal all your fridge magnets and small appliances. Now it's back to the drawing board. /me gnashes teeth

      ...and I would have got away with it too, if it weren't for those pesky kids!