in reply to Gathering module usage statistics

Having statistics for module installations would be very nice. However, I must agree with most people here that giving users 10 seconds to opt out is not enough. Lots of people go and have a cup of coffee while the installation runs. Like swngnmonk, I immediately thought about Pine. That program lets you opt in and send a message the first time you use it. But you have to say "yes" explicitly.

I think it wouldn't be so bad if your installer asked "Do you want to send statistics? (n)" and waited indefinitely. Some people wouldn't like it because the installation wouldn't be completely automated anymore, but many modules already do that kind of thing (usually to ask whether you want to enable certain features) and we survive. LWP asks you if you want to run some tests which require using the network and whether you want to install certain scripts; the Template Toolkit asks you if you want to install certain libraries and examples; etc.

I propose a compromise solution:

Let's all agree on a special environment variable, which I'll call PERL_INSTALL_PHONE_HOME in this discussion. Participating modules will agree on the following:

  1. If the variable has the value '1', feel free to phone home the agreed information without asking.
  2. If the value is '0', don't phone home and don't ask.
  3. If the variable is undefined (or has any other value?), ask the user during the installation procedure whether he wants to phone home or not, and explain how setting this variable will avoid such questions in the future.
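
The three rules above could be sketched as a small helper that an installer calls before doing anything else. This is only an illustration: `phone_home_policy` is a made-up name, the variable name is the one proposed in this post, and treating any unrecognized value as "ask" is an assumption about the open question in rule 3.

```perl
use strict;
use warnings;

# Hypothetical helper: maps the value of the proposed
# PERL_INSTALL_PHONE_HOME variable to 'send', 'skip', or 'ask'.
sub phone_home_policy {
    my ($value) = @_;
    return 'ask'  unless defined $value;   # rule 3: variable not set
    return 'send' if $value eq '1';        # rule 1: phone home without asking
    return 'skip' if $value eq '0';        # rule 2: don't phone home, don't ask
    return 'ask';                          # assumption: any other value means "ask"
}

my $policy = phone_home_policy($ENV{PERL_INSTALL_PHONE_HOME});
print "phone-home policy: $policy\n";
```

In a Makefile.PL, the 'ask' branch could probably be handled with the `prompt` function from ExtUtils::MakeMaker, though `prompt` falls back to its default answer when the install is not interactive, which is not quite the "wait indefinitely" behaviour suggested above.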

CPAN.pm has configuration settings for passing options (such as PREFIX) during compilation and installation, so the user could either set the preference there or define an actual environment variable.

Of course, this requires cooperation from several parties, but I'm afraid any mutually acceptable solution will. The Perl community is used to developing conventions, idioms, and agreements (such as the standard install and import procedures) so it might be possible.

This is in spirit like the Robots Exclusion Standard. It works if people follow it. Of course "nasty" modules might not follow it, but any module can do nasty things if it wants. I'm sure the community can police itself to ensure that CPAN modules don't abuse it.


Re: Re: Gathering module usage statistics - a compromise solution?
by Juerd (Abbot) on May 05, 2004 at 08:48 UTC

    Some people wouldn't like it because the installation wouldn't be completely automated anymore

    Including myself. I don't mind doing something controversial, and would not even mind making this opt-out with no explicit warning. But I will not make it do something that I disagree with myself ;)

    Let's all agree on a special environment variable, which I'll call PERL_INSTALL_PHONE_HOME in this discussion.

    I've called it NOPHONEHOME before. Why limit this to Perl? In case gathering statistics becomes popular, I want to make sure up front that my environment won't need a PERL_NOPHONEHOME, a PYTHON_NOPHONEHOME, et cetera.

    I propose that we use NOPHONEHOME and evaluate it as:

    • If it does not exist, the installation program will have to ask.
    • If it does exist and has a value of "" or "0", the installation program will still have to ask.
    • If it does exist and has any other value, the installation program must skip the phoning home part.

    I'm not comfortable with allowing everything to phone home, as I want to see what information they will send before I agree.
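
    A minimal sketch of the NOPHONEHOME evaluation described in this reply, assuming a made-up helper name (`nophonehome_policy`) and taking the environment as a hash reference so it can be tried without touching the real %ENV:

```perl
use strict;
use warnings;

# Hypothetical helper for the NOPHONEHOME proposal.
# Returns 'ask' or 'skip'; this scheme has no "send without asking" state.
sub nophonehome_policy {
    my ($env) = @_;                                  # hash ref, normally \%ENV
    return 'ask' unless exists $env->{NOPHONEHOME};  # variable does not exist
    my $value = $env->{NOPHONEHOME};
    return 'ask' if !defined $value || $value eq '' || $value eq '0';
    return 'skip';                                   # any other value
}

print nophonehome_policy(\%ENV), "\n";
```

    Since "" and "0" are the only defined strings Perl treats as false, the same rule could also be written as a plain truth test on $ENV{NOPHONEHOME}.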

    Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }

      When I said "feel free to phone home the agreed information" I was thinking (but neglected to say explicitly) that as part of this protocol there would be a standard, pre-agreed set of information that may be sent (Perl and module version and platform are good candidates). I think it would be even better if a relatively neutral and central place like search.cpan.org (or something like that) collected all the statistics, instead of each module contacting its author directly. This way some people might be more likely to trust sending the information without being asked, and people would also be able to compare statistics for different modules and authors. If you are not comfortable with allowing everything to phone home (or rather, to the CPAN site) automatically, you are still free to leave the variable in the "ask" state.

      I would also suggest developing a standard CPAN module for doing the phoning instead of each author rolling their own. That would make it easier to use, have more flexibility for configuring it to use proxies or whatever is needed, and again it might make some people more likely to trust it.
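
      As a rough illustration of what such a shared module might collect, here is a sketch limited to the candidates mentioned above (module name and version, Perl version, platform). The function name `build_report` and the field names are invented for this example, and nothing is actually sent anywhere:

```perl
use strict;
use warnings;

# Hypothetical: assemble the pre-agreed report; no network access here.
sub build_report {
    my ($module, $version) = @_;
    return {
        module         => $module,
        module_version => $version,
        perl_version   => "$]",   # numeric Perl version of the running interpreter
        platform       => $^O,    # operating system name as Perl reports it
    };
}

# 'My::Module' and '1.02' are placeholder values.
my $report = build_report('My::Module', '1.02');
print "$_: $report->{$_}\n" for sort keys %$report;
```

      Printing the report before anything is transmitted would also let a user see exactly which fields are involved when deciding how to answer.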

      I've mentioned trust several times. That is the central problem here, but just remember that whenever you install any program without reading and understanding the entire source code first, you are implicitly trusting the author (hey, if you didn't trust the author you would have written the code yourself!). Maybe there are some other modules that phone home without asking and no one has noticed. It would be good to have a standard way for authors that want to play nice to do this in an acceptable way.

      Some people wouldn't like it because the installation wouldn't be completely automated anymore
      Including myself.

      [Raises hand] Indeed. If your module phones home with OS and Perl version, I have no objection. If it requires user intervention during install, I won't use it. I don't need that hassle. The only exception I make to this is for stuff included in Bundle::CPAN, because that's so vital and because it installs first and then I can walk away while everything else gets installed.

      As it happens I don't use your module, but if I did, I wouldn't get counted with the opt-in plan. Just so you know. I almost never sit and babysit CPAN while it's installing modules. Especially not on desktop systems. Sometimes I position the window where I can see the bottom of it peeking out behind some other window I'm working in, but just as often I start CPAN going in a screen session and detach, especially when I'm sshing into a headless or remote system. Is there a security risk? Sure. There'd be a security risk even if I monitored the install closely. I could mitigate that risk by reading the source of every module before I install it, but that would consume a lot of my time; there's a tradeoff involved here, and to date the worst fashion in which I've been "bitten" by a module from the CPAN was when I discovered that a certain module I rely on heavily won't run out of the box under Taint checking. (The module in question is a dependency for DateTime::Format and as such is worth enough to me that I investigated this problem and figured out how to resolve it, rather than dropping the module. Most modules I'd just discontinue using in this situation.)

      As for privacy concerns, there are different kinds of privacy. Information privacy (protecting things like my OS, Perl version, address, ...) is of little concern to me, except for _sensitive_ information (passwords, social security number, and such). I'm not a tinfoil-hat kind of guy, really. But the other kind of privacy, the freedom from being continually and needlessly bothered, is more important to me. I filter spam to the largest extent possible, as long as I don't get false positives. (I did away with naive Bayesian filtering because it produced false positives, which makes it worse than useless to me. As for keeping my address private, that would prevent real people from contacting me for legitimate reasons, which is an unacceptable tradeoff.) I also use a browser that suppresses popup windows, don't have a voice phone in my living quarters, and won't use Perl modules that require a lot of babysitting during installation (except, as noted, for really vital things like Bundle::CPAN).

      I'm not comfortable with allowing everything to phone home, as I want to see what information they will send before I agree.

      If a global setting such as you propose were ever implemented, it should have an option for "Don't ask, just do it." That's how I'd set it, but if that option weren't available I'd set it to "Don't ask" and screw your statistics, because I don't want to be hassled. I specifically want to be able to install Bundle::CPAN, reload CPAN, and then set Bundle::Jonadab going and go do something else while another sixty or eighty modules are installed. (Most of those are dependencies, either directly, or through the test suites.) It wouldn't be right to make me sit and watch the scrolling text so I can monkeybang the keyboard every little bit for two hours every time I install a new system or a new Perl version or whatever. That would be far more invasive than sending my OS and Perl version to the module author. It's bad enough I have to configure CPAN and LWP each time; once I get that done, I want the rest to be interruption-free.

      As far as not being comfortable letting "just anything" phone home, a malicious program wouldn't respect my phone-home preference anyway, so if there's code that I don't trust, I shouldn't be installing it at all; if I'm installing your module, that implies that I decided the utility it provides is worth more to me than the security tradeoff of trusting it. So that implies I trust it to be sensible about what data it sends and where it sends it. At some point I'll probably make the mistake of trusting a module I shouldn't, but when that happens I'll deal with it; I choose not to go through life agonizing over such things when the agonizing is fundamentally not even going to significantly mitigate, much less solve, the problem.

      There are things I wouldn't want a module doing at install time; one example has recently come up. But phoning home with statistical data for the author of the module isn't anywhere near that category, as far as I'm concerned. It's nice that you're going to have your module ask for permission, because we know there are some people who object, but as far as I'm concerned, it's fine. As you point out, the risk these data pose is much smaller than the risk I've already accepted by running your code on my systems without personally inspecting every line of it first.

      I'm not saying everyone should be comfortable setting the variable that way in every situation; I'm just saying that if someone were to go to all the trouble to institute a standardized global way to express a preference on this issue, it should provide for the basic standard three options, "Yes", "No", and "Ask"; if a user such as yourself is "not comfortable" with one of these options, then he won't set it that way, will he? "Ask" should probably be the default. It would be nice if there were a way to say "Yes, and set the pref so I'm not asked again" or "No, and set the pref so I'm not asked again", but that would require the existence of a global preference that all software would be expected to know about.


      ;$;=sub{$/};@;=map{my($a,$b)=($_,$;);$;=sub{$a.$b->()}} split//,".rekcah lreP rehtona tsuJ";$\=$;[-1]->();print