CPAN is mirrored, so there is no way to know how often your module distribution has been downloaded, and no easy way to find out can be built. I have often wished I knew how many users DBIx::Simple has.

Sometimes you can tell your module is used because you get email about it. But that tends to happen only if something is broken and the bug is discovered. I guess that DBIx::Simple is used by more people than PLP, but I get more mail about PLP. Because it is broken (in some very interesting ways; or at least it appears to be broken to those who don't know how things work internally. Strangely enough, the brokenness is sometimes intended behaviour).

The recent Makefile.PL security danger invented by DOMIZIO and all the fuss about it, did inspire me. I could just send a simple HTTP request or something like that to my web server on module installation. That doesn't say anything about actual usage, but it would give me information that would be valuable to me.

It could include useful information like $^O and $]. That would probably teach me that not everyone uses a recent Perl on Linux :)
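
A minimal sketch of what such a report could carry. The helper name report_fields and the version string '1.22' are made up for illustration; only $^O and $] come from perl itself.

```perl
use strict;
use warnings;

# Hypothetical helper: collect the static fields a report could carry.
# Nothing here is computed from the user's files, paths or environment.
sub report_fields {
    my ($dist, $version) = @_;
    return (
        dist    => $dist,
        distver => $version,
        os      => $^O,   # OS name perl was built for, e.g. "linux"
        perlver => $],    # numeric perl version, e.g. 5.008004
    );
}

# '1.22' is a placeholder version, not a claim about any real release.
my %fields = report_fields('DBIx::Simple', '1.22');
print "$_=$fields{$_}\n" for sort keys %fields;
```

Keeping the field list in one place makes it easy to print exactly what would be sent before anything leaves the machine.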

It would explain what it is about to do and then wait ten seconds so it can be terminated if someone doesn't like it. I don't want it to default to not-sending the information, because most installations are done with CPAN or CPANPLUS. Environment variables can probably be used to determine that the module is just being used for automated testing. Update: It would hook into make install to avoid useless statistics from smoke testers.
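
The ten-second wait could be sketched roughly like this. The function name countdown and its second argument (a pause length, so the wait can be shortened) are assumptions for illustration, not part of any existing installer.

```perl
use strict;
use warnings;

# Print "N...N-1...1..." with a pause after each number, so the user has
# time to press Ctrl-C. Returns the text it printed.
# $pause defaults to one second; pass 0 to skip waiting.
sub countdown {
    my ($seconds, $pause) = @_;
    $pause = 1 unless defined $pause;
    local $| = 1;    # unbuffered, so each number appears immediately
    my $shown = '';
    for my $n (reverse 1 .. $seconds) {
        print "$n...";
        $shown .= "$n...";
        sleep $pause if $pause;
    }
    print "\n";
    return $shown;
}

countdown(3, 0);    # prints "3...2...1..." without pausing
```

Interrupting the script during the countdown ends it before any request is made, which is the whole opt-out mechanism in this variant.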

It would not eval something or use dynamic code, but would instead discard any information received from the web server.

This probably cannot tell me about Windows users that use PPM, but it can give me a lot more information to work with than I have now.

Before I implement this, I would like to know if anyone thinks it's wrong to gather this information. Please let me know what you think about this, in public or privately. [Please be warned though that I reserve the right to ignore your opinion =)]

Update: To be absolutely clear, a summary:

Another update: Opt-out doesn't seem to be very popular. I'm glad I asked first :) Please make sure you read 350615, where I propose these changes:

Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }

Re: Gathering module usage statistics
by simonm (Vicar) on May 04, 2004 at 18:57 UTC

    Environment variables can probably be used to determine that the module is just being used for automated testing.

    Please announce if you've found a way to do this; I was under the impression that you can't tell if you're running on a CPAN smoker.

    For what it's worth, I would also like to get feedback on how widely my CPAN modules are used, but would prefer to see a community-wide solution, such as patches to CPAN/CPANPLUS and a shared site to collect the stats... That way we get stats for all modules, not just a few of them, and people can opt in or out of the reporting process in a central place, rather than having to sit there watching the build process and hitting control-C to interrupt the logging requests.
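
One convention that exists for this is the AUTOMATED_TESTING environment variable, which automated test runs can set. Whether a particular smoker actually sets it is an assumption, so the sketch below treats it only as a hint. The function name looks_automated is made up.

```perl
use strict;
use warnings;

# Returns 1 when the environment marks this run as an automated test run.
# AUTOMATED_TESTING is only a convention; a smoker that does not set it
# will not be detected by this check.
sub looks_automated {
    my ($env) = @_;
    return $env->{AUTOMATED_TESTING} ? 1 : 0;
}

print looks_automated(\%ENV) ? "skip reporting\n" : "reporting allowed\n";
```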

Re: Gathering module usage statistics
by dragonchild (Archbishop) on May 04, 2004 at 19:05 UTC
    I also have been curious in this way about my modules on CPAN. *ponders* I'm thinking that many authors are curious, as would be other users. If I see that modules X and Y might satisfy my needs, but X has 10k installs in the past month and Y has 10 ... I know which I'm going to try out first. Also, it would be useful to also know what versions of Perl and what OS's a module has been installed on. (I know I have this annoying problem where I have access to very few OS's ...)

    Maybe, that website should be usage.cpan.org and it should be a part of MakeMaker?

    ------
    We are the carpenters and bricklayers of the Information Age.

    Then there are Damian modules.... *sigh* ... that's not about being less-lazy -- that's about being on some really good drugs -- you know, there is no spoon. - flyingmoose

    I shouldn't have to say this, but any code, unless otherwise stated, is untested

      Maybe, that website should be usage.cpan.org and it should be a part of MakeMaker?

      I like that idea very much! It could be used by MakeMaker, Module::Build, etcetera. Maybe even by PPM.

      However, it will take long before people upgrade their current versions.

      Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }

        Yeah, well, that's the issue with any rollout to remote locations. :-)

        Now, what we could do in our modules is require the appropriate versions of MakeMaker, etc. That would force the reload the moment our modules are used. Plus, it would happen for all new installs of Perl. So, I s'pose, it would roll out relatively quickly. The numbers would be useful within 3-6 months.

        ------
        We are the carpenters and bricklayers of the Information Age.

        Then there are Damian modules.... *sigh* ... that's not about being less-lazy -- that's about being on some really good drugs -- you know, there is no spoon. - flyingmoose

        I shouldn't have to say this, but any code, unless otherwise stated, is untested

      Maybe, that website should be usage.cpan.org and it should be a part of MakeMaker?

      The idea of hooking this into MakeMaker, and/or making it automatic on install by any means, scares me for a variety of reasons that will either seem obvious, or paranoid depending on the reader's personality.

      In general, this whole thread reminds me of the Debian PopCon package/database. Perhaps someone could whip up a perl/CPAN equivalent that submits data from perl -V and perllocal to a central repository, which people could choose to install on their system if they want to participate.

        In general, this whole thread reminds me of the Debian PopCon package/database. (...) choose to install on their system if they want to participate.

        That works only if your user base is greater than very huge. There have been MILLIONS of Debian installations, while fewer than 5000 were counted by PopCon.

        scares me for a variety of reasons that will either seem obvious, or paranoid depending on the reader's personality.

        I'd like to know those reasons. What is wrong with letting others know your OS and version of Perl? No paths, personal information or anything non-static will be sent. $^O and $] are compiled into perl (in fact, $^O is not the platform perl *runs* on, but the one it was *compiled* on) and the module's name plus version are not computed, but hardcoded information. What is your objection to sharing this non-personal, non-identifying information, and why isn't opt-out good enough for you?

        Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }

Re: Gathering module usage statistics
by mirod (Canon) on May 04, 2004 at 19:35 UTC

    I would not do this. I would also like to know how much my modules are used, but I think this would be the wrong way to get that information.

    First it doesn't tell you the whole story: you would not see PPM, but also RPM, ports, eports... users. And you would get lots of people installing modules that they will never use. Plus what happens if someone downloads the module, then installs it while online? You will end up increasing significantly the complexity of the install process, for no benefit to the user.

    But the basic reason against it is that it would run contrary to what I think Open Source stands for. Of course you always get a nice ego boost when you learn that <insert big corporation, personal hero or favourite arch-enemy name here> uses your software, but that's not the point. The point is to release code that you are proud of. With no ties attached. "Phoning home" is for closed-source code. And lots of people hate it. I would not be proud to do it. I would feel that I would break the trust that users put in me.

    BTW, did Larry try to track who was downloading Perl 1?

      I don't want to know who is using it. I want to know the following:
      1. How many installs this month?
      2. What Perl versions were used to install it this month?
      3. What OS'es was it installed on this month?
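
Those three questions amount to a simple tally over whatever records a collector would store. A rough sketch, assuming each record is a hash with made-up keys month, os and perlver, and using invented sample data:

```perl
use strict;
use warnings;

# Count installs for one month, broken down by perl version and by OS.
# The record layout (month, os, perlver) is an assumption for this sketch.
sub tally_installs {
    my ($month, @records) = @_;
    my %stats = (installs => 0, by_perl => {}, by_os => {});
    for my $r (grep { $_->{month} eq $month } @records) {
        $stats{installs}++;
        $stats{by_perl}{ $r->{perlver} }++;
        $stats{by_os}{ $r->{os} }++;
    }
    return \%stats;
}

# Invented sample data, not real statistics.
my @records = (
    { month => '2004-05', os => 'linux',  perlver => '5.008004' },
    { month => '2004-05', os => 'darwin', perlver => '5.008004' },
    { month => '2004-04', os => 'linux',  perlver => '5.006001' },
);
my $may = tally_installs('2004-05', @records);
print "installs: $may->{installs}\n";    # installs: 2
```

Nothing in the tally needs an IP address or any other identifying field.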

      I don't care if it's in New York or Newfoundland. I don't care if it's one guy on 10 computers or 10 guys on one computer. I don't care if they make a million a month or lose a million a month.

      Why do I care? Because I want to allocate the 4-8 hours I have a month to the modules that have the most usage. That's the only reason. It has nothing to do with ego.

      Also, as the user, I want to know how many installs there are for a given module on my OS with my Perl version. Let's say I want to use module X, but no-one has ever installed it on my OS. Now, I know I might have issues. But, if 10k people have installed it on my OS, I feel pretty safe knowing most of the kinks have been worked out. Same goes for Perl version. If the only people who've ever installed this module had Perl 5.8.x installed, but I have 5.005_3 installed, I might expect issues.

      ------
      We are the carpenters and bricklayers of the Information Age.

      Then there are Damian modules.... *sigh* ... that's not about being less-lazy -- that's about being on some really good drugs -- you know, there is no spoon. - flyingmoose

      I shouldn't have to say this, but any code, unless otherwise stated, is untested

        Why do I care? Because I want to allocate the 4-8 hours I have a month to the modules that have the most usage. That's the only reason. It has nothing to do with ego.

        There's a difference between usage and install base. ACME::POE::Knee might be installed on a bunch of boxes because people thought it sounded neat and wanted to try it out once, but that doesn't mean it's getting used very often. Other distributions may be installed on fewer boxes, but get used heavily.

        Don't get me wrong ... "number of installations" is an interesting metric, but it's not the same as "usage"

      In general, I can refer to dragonchild's opinion, because it closely matches mine.

      First it doesn't tell you the whole story: you would not see PPM, but also RPM, ports, eports... users.

      True, but still the figures that it would deliver are more comprehensive than the void we have now. That I wouldn't get a count for packaged modules does not bother me at all.

      And you would get lots of people installing modules that they will never use.

      But by installing it they do show interest. There is no way to get real usage information without contacting "home" each time a method is called. I do not want to do that. I want rough information, more than I have now.

      You will end up increasing significantly the complexity of the install process, for no benefit to the user.

      It would be a 10 second sleep and then some online transaction that has a few-seconds timeout. Other than that, the user that doesn't object to sending information would not notice any difference. I imagine it would be something like:

      For usage statistics, this installation script will send a message to
      'cpanusage.juerd.nl:80'. Only your OS (Linux) and your version of Perl
      (5.008004) will be sent.

      *** If you do not wish to send this information, abort now, set the
      NOPHONEHOME environment variable and run 'make install' again.

      The following code will be executed if you do not abort:

          (undef) = LWP::UserAgent->new->head(
              'http://cpanusage.juerd.nl:80/',
              dist    => 'DBIx::Simple',
              distver => DBIx::Simple->VERSION,
              os      => $^O,
              perlver => $],
          );

      No personal information will be sent.

      ABORT NOW IF YOU DO NOT WANT THIS INFORMATION TO BE SENT.
      10...9...8...7...6...5...4...3...2...1...Thank you for your cooperation!

      Unattended installations happen as usual but take longer and send information, and it gives people who like watching installations something to look at :)
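
A note on the request itself: in LWP::UserAgent, key/value pairs passed after the URL to head() are sent as request header fields. If the collecting server is meant to read them as query parameters instead (an assumption about a server that does not exist yet), the URL could be built along these lines. The helper names are made up, and the escaping is a simplified one for plain ASCII values.

```perl
use strict;
use warnings;

# Percent-encode everything except unreserved characters.
# Simplified: assumes plain ASCII input.
sub uri_escape_simple {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-_.~])/sprintf('%%%02X', ord $1)/ge;
    return $s;
}

# Build "base?key=value&..." with keys sorted for a stable result.
sub build_report_url {
    my ($base, %fields) = @_;
    my $query = join '&',
        map { uri_escape_simple($_) . '=' . uri_escape_simple($fields{$_}) }
        sort keys %fields;
    return "$base?$query";
}

print build_report_url(
    'http://cpanusage.juerd.nl:80/',
    dist => 'DBIx::Simple', os => 'linux', perlver => '5.008004',
), "\n";
# http://cpanusage.juerd.nl:80/?dist=DBIx%3A%3ASimple&os=linux&perlver=5.008004
```

The resulting string can be printed in the warning text, so the user sees the exact request before it is made.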

      Of course you always get a nice ego boost when you learn that <insert big corporation, personal hero or favourite arch-enemy name here> uses your software, but that's not the point.

      That is indeed not the point. I'm not interested in who or where my code is used. I would like to know THAT it is used, HOW MUCH it is used and on WHICH PLATFORMS it is used.

      "Phoning home" is for closed-source code.

      I disagree. Phoning home without being open about it is for closed-source code. Phoning home and sending personal information is too. Letting users know exactly what will happen and giving them the opportunity to evade it is IMHO good OO practice.

      I would feel that I would break the trust that users put in me.

      It would break my trust only if it was done without a way for me to know. But I read READMEs and I do watch installation output. If those don't mention the action, I would hate it.

      BTW, did Larry try to track who was downloading Perl 1?

      Irrelevant. He did in any case get feedback in other ways.

      Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }

Re: Gathering module usage statistics
by kvale (Monsignor) on May 04, 2004 at 19:17 UTC
    To my mind, fascination with number of downloads is like fascination with XP. Large amounts of both might provide an ego boost, but the correlation of both with quality of product is weak. Even if you could get a reliable count, what would it tell you?

    Personally, I would have to look hard at a module that uploads personal information to the author, before I used it. If you are interested in the breadth of systems that your code is being subject to, try looking into the CPAN Testers. They do smoke tests on a wide range of systems and can provide invaluable feedback.

    -Mark

      I support three templating modules that work with HTML::Template. I also have a very limited amount of time to do open-source work. If I know that Excel::Template, for instance, was downloaded 1500 times in the past month, but Graph::Template was only downloaded 3 times, I know where I'll be putting the few hours I have. And, vice versa.

      Also, if I know that a bunch of people on Darwin are downloading one of my modules, I'll work a little harder to get a Darwin testing platform. But, if I know that not a single VMS user has downloaded it, I won't care so much.

      The other point is that I, as the user, would like this information. A module that's heavily installed has a weak correlation to a module that's heavily used. If it's heavily used, then it's more likely to be actively supported.

      Is it a strong correlation? Probably not. Is it more info than we have? Yes.

      ------
      We are the carpenters and bricklayers of the Information Age.

      Then there are Damian modules.... *sigh* ... that's not about being less-lazy -- that's about being on some really good drugs -- you know, there is no spoon. - flyingmoose

      I shouldn't have to say this, but any code, unless otherwise stated, is untested

        If I know that Excel::Template, for instance, was downloaded 1500 times in the past month, but Graph::Template was only downloaded 3 times, I know where I'll be putting the few hours I have.

        The problem here is that you don't know why Graph::Template was only downloaded 3 times. Speaking purely hypothetically, maybe people looked at the documentation of Graph::Template, decided it sucked, and moved on to something else. But the people looking at Excel::Template (which targets an almost completely different format, and thus likely has a different userbase) thought it was pretty good as it is and use it all the time. In that case, you'd probably want to put more effort into fixing Graph::Template.

        ----
        : () { :|:& };:

        Note: All code is untested, unless otherwise stated

      Even if you could get a reliable count, what would it tell you?

      It would tell me that what I do has meaning.

      Personally, I would have to look hard at a module that uploads personal information to the author, before I used it.

      So would I, but I do not consider the OS name and Perl version personal information.

      Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }

        When information like perl and OS characteristics is uploaded, the originating IP address is also uploaded. What could one do with this information? You can sometimes discover who owns this machine, starting with the IP.
        • Ah, I know an exploit for perl 5.x.x. I'll bang on this IP to get root privileges.
        • What? He's still using a 2.2 kernel? Wait till the chat forum hears about this!
        • This fellow is using linux? I bet he'd be a good addition to the targeted list I am selling to Microsoft.
        Now, I am not saying that you would do such dastardly deeds, but giving out such information does potentially decrease security.

        If you implement something like this, it really should be opt-in, as people do install using CPAN, sometimes in an unattended fashion.

        -Mark

Re: Gathering module usage statistics
by swngnmonk (Pilgrim) on May 04, 2004 at 20:39 UTC

    The desire for feedback and usage stats on your code makes sense, but I think you need to respect the privacy of your users as well - why should they be forced to tell you that they're using your code? (Unless you're putting that in your License).

    I would suggest you handle this in a passive way - my two favorite examples on doing this are Pine and OpenBSD. In the case of Pine, all you need to do is hit <CR> on a specific field the first time you run it. On OpenBSD, they ask you the following:

    If you wish to ensure that OpenBSD runs better on your machines, please do us
    a favor (after you have your mail system configured!) and type something like:
     # dmesg | mail -s "Sony VAIO 505R laptop, apm works OK" dmesg@openbsd.org
    

    Why not provide a simple Perl script in your package, and finish off your installation with a message saying something to the effect of "I would like to hear about people using package XXX, please run the following script to send me a quick email containing your info"?

      I think you need to respect the privacy

      So do I. How do distribution/platform name and version numbers violate one's privacy? The information can only be used for statistics.

      why should they be forced to tell you that they're using your code?

      Please read my original post again.

      I will force nobody and will be very open about what happens. There will be an easy, obvious way to opt out and it will be printed on screen and mentioned in the README.

      dmesg | mail -s "Sony VAIO 505R laptop, apm works OK" dmesg@openbsd.org

      Because I would like to include automated, unattended installations and would like to make sending the information the default. Because the information sent is static and not privacy-sensitive, I think opt-in is not needed.

      Why not provide a simple Perl script in your package, and finish off your installation with a message saying something to the effect of "I would like to hear about people using package XXX, please run the following script to send me a quick email containing your info"?

      I have done so in the past. I know for a fact that it doesn't work as well.

      Also, I do not want to hear about PEOPLE using the package, I just want a rough indication of the number of installations. NO PERSONAL INFORMATION WILL BE SENT.

      Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }

        So do I. How do distribution/platform name and version numbers violate one's privacy?

        The distributions and platforms I'm running in my home are known to me. It is private knowledge. It is not public.

        The information can only be used for statistics.

        You may only be using it for statistics. It can certainly be used for other purposes.

        <paranoid>

        1. I kindly take over maintenance of SOAP::Lite
        2. I add a tracker like the one you propose and log all the information I receive. Note that this information will also include the IP address of the server the module was installed on.
        3. I "accidentally" include an exploit that allows me to call arbitrary Perl code in the next SOAP::Lite release.
        4. I wait for the IP addresses of vulnerable machines to roll in

        </paranoid>

        I will force nobody and will be very open about what happens. There will be an easy, obvious way to opt out and it will be printed on screen and mentioned in the README.

        But an automated install with no manual intervention will opt-in. This is the behaviour I think most people are (correctly in my opinion) objecting to.

        Because the information sent is static and not privacy-sensitive

        I don't think other people are entitled to determine how sensitive my private data is. I guess I'm just funny that way.

        I still disagree. I think that demanding opt-out of your users is inherently disrespectful of their privacy, no matter how innocuous you think the information you're collecting is. It's my damn information, and it's my choice to share it with whomever I feel like, or not. Creating automated response code that's enabled by default is no better than spyware.

Re: Gathering module usage statistics
by jmcnamara (Monsignor) on May 04, 2004 at 20:34 UTC

    I think that it is time that someone sidesteps the argument about whether CPAN statistics can be collected accurately or not and just implements something.

    Statistics can be collected, see this from Graham Barr or the Phalanx 100. There also seems to be some intention of implementing a perl.cpan.stats although it may be vapourware.

    --
    John.

      Statistics can be collected

      I use a geographically nearby mirror, and expect most people do so. gbarr counted only the downloads of search.cpan.org. That excludes ALL automated downloads. As you can see, only the famous ones are in that list.

      Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }

        I don't think you need the absolute numbers of all downloads --- an impression of the relative numbers of your module compared with the "big ones" should be enough. And I don't think that the number relation between automated CPAN downloads and search.cpan.org downloads will be that different.
Re: Gathering module usage statistics
by mojotoad (Monsignor) on May 04, 2004 at 21:18 UTC
    I'm not a big fan of any kind of "phone home" software unless it is absolutely necessary for the install itself or the functionality of the program (this last would be things like p2p apps, MORPGs, etc).

    In other words, if it is part of the utility of the software, then it is regarded with far less suspicion than anything else.

    It seems to me that the obvious place to collect downloading statistics would be on the CPAN mirrors themselves -- but this would require a statistics aggregation module on each mirror that would forward the data along to a central database. Whether that will ever happen, who knows, but that seems to be the logical place to look.

    That way it's invisible, in a non-creepy way, to the end user.

    I personally understand the benign intent of your desire, but the old saw 'perception is reality' applies here.

    Matt

      In other words, if it is part of the utility of the software, then it is regarded with far less suspicion than anything else.

      Why would you suspect my module that says it will send 4 static strings to be any more dishonest than a module that doesn't say anything about collecting data?

      It seems to me that the obvious place to collect downloading statistics would be on the CPAN mirrors themselves

      The CPAN is mirrored to a large quantity of servers, that run a variety of server software. Watching logfiles requires a process to run on the servers and I suspect that most do not even log transferred files.

      This will never happen.

      Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }

        Why would you suspect my module that says it will send 4 static strings to be any more dishonest than a module that doesn't say anything about collecting data?

        Juerd,

        I never stated or implied that I thought you or your module were doing anything dishonest.

        If you re-read my last statement, I even say that I understand the benign intent.

        I was pointing out that others, some certain percentage of users (we can debate that ratio till the cows come home), will inevitably perceive things differently.

        Matt

Re: Gathering module usage statistics - a compromise solution?
by itub (Priest) on May 04, 2004 at 23:54 UTC

    Having statistics for module installations would be very nice. However, I must agree with most people here that giving users 10 seconds to opt out is not enough. Lots of people go and have a cup of coffee while the installation runs. Like swngnmonk, I immediately thought about Pine. That program lets you opt in and send a message the first time you use it. But you have to say "yes" explicitly.

    I think it wouldn't be so bad if your installer asked "Do you want to send statistics? (n)" and wait indefinitely. Some people wouldn't like it because the installation wouldn't be completely automated anymore, but many modules already do that kind of thing (usually for asking if you want to enable certain features) and we survive. LWP asks you if you want to run some tests which require using the network and whether you want to install certain scripts; the Template Toolkit asks you if you want to install certain libraries and examples; etc.
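
Such a question could be asked with the prompt() function that ExtUtils::MakeMaker exports. A minimal sketch, with a made-up function name. One caveat: prompt() does not wait indefinitely in every case. It returns the default right away when STDIN is not interactive or when PERL_MM_USE_DEFAULT is set, so an unattended install would end up with "n".

```perl
use strict;
use warnings;
use ExtUtils::MakeMaker qw(prompt);

# Ask the question with "n" as the default answer.
# Returns 1 only for an answer starting with "y" or "Y".
sub wants_statistics {
    my $answer = prompt('Do you want to send statistics?', 'n');
    return $answer =~ /^y/i ? 1 : 0;
}
```

A Makefile.PL would call wants_statistics() once and remember the result for the install step.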

    I propose a compromise solution:

    Let's all agree on a special environment value, which I'll call PERL_INSTALL_PHONE_HOME in this discussion. Participating modules will agree on the following:

    1. If the variable has the value '1', feel free to phone home the agreed information without asking.
    2. If the value is '0', don't phone home and don't ask.
    3. If the variable is undefined (or has any other value?), ask the user during the installation procedure whether he wants to phone home or not, and explain how setting this variable will avoid such questions in the future.
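
The three rules above can be sketched as a small decision function. The name phone_home_policy and the return values 'send', 'skip' and 'ask' are made up here, and treating "any other value" like undefined is one possible reading of rule 3.

```perl
use strict;
use warnings;

# Map the value of PERL_INSTALL_PHONE_HOME to one of three actions.
sub phone_home_policy {
    my ($value) = @_;
    return 'ask'  unless defined $value;    # rule 3: variable not set
    return 'send' if $value eq '1';         # rule 1: send without asking
    return 'skip' if $value eq '0';         # rule 2: neither send nor ask
    return 'ask';                           # other values: assumed to mean "ask"
}

print phone_home_policy($ENV{PERL_INSTALL_PHONE_HOME}), "\n";
```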

    CPAN.pm has configuration settings for passing options (such as PREFIX) during compilation and installation, so the user could either set the preference there or define an actual environment variable.

    Of course, this requires cooperation from several parties, but I'm afraid any mutually acceptable solution will. The Perl community is used to developing conventions, idioms, and agreements (such as the standard install and import procedures) so it might be possible.

    This is in spirit like the Robots Exclusion Standard. It works if people follow it. Of course "nasty" modules might not follow it, but any module can do nasty things if it wants. I'm sure the community can police itself to ensure that CPAN modules don't abuse it.

      Some people wouldn't like it because the installation wouldn't be completely automated anymore

      Including myself. I don't mind doing something controversial, and would not even mind making this opt-out with no explicit warning. But I will not make it do something that I disagree with myself ;)

      Let's all agree on a special environment value, which I'll call PERL_INSTALL_PHONE_HOME in this discussion.

      I've called it NOPHONEHOME before. Why limit this to Perl? Just in case gathering statistics will become popular, I want to make sure upfront that my environment won't have to have a PERL_NOPHONEHOME, PYTHON_NOPHONEHOME, etcetera.

      I propose that we use NOPHONEHOME and evaluate it as:

      • If it does not exist, the installation program will have to ask.
      • If it does exist and has a value of "" or "0", the installation program will still have to ask.
      • If it does exist and has any other value, the installation program must skip the phoning home part.
      I'm not comfortable with allowing everything to phone home, as I want to see what information they will send before I agree.
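
The NOPHONEHOME rules above could be evaluated roughly like this. The function name and the return values 'ask' and 'skip' are assumptions for illustration; note that there is deliberately no "send without asking" outcome.

```perl
use strict;
use warnings;

# Decide what an installer should do, given a hash of environment variables.
sub nophonehome_policy {
    my ($env) = @_;
    return 'ask' unless exists $env->{NOPHONEHOME};    # not set: ask
    my $v = $env->{NOPHONEHOME};
    return 'ask' if !defined $v || $v eq '' || $v eq '0';   # "" or "0": still ask
    return 'skip';                                     # anything else: never phone home
}

print nophonehome_policy(\%ENV), "\n";
```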

      Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }

        Some people wouldn't like it because the installation wouldn't be completely automated anymore
        Including myself.

        [Raises hand] Indeed. If your module phones home with OS and Perl version, I have no objection. If it requires user intervention during install, I won't use it. I don't need that hassle. The only exception I make to this is for stuff included in Bundle::CPAN, because that's so vital and because it installs first and then I can walk away while everything else gets installed.

        As it happens I don't use your module, but if I did, I wouldn't get counted with the opt-in plan. Just so you know. I almost never sit and babysit CPAN while it's installing modules. Especially not on desktop systems. Sometimes I position the window where I can see the bottom of it peeking out behind some other window I'm working in, but just as often I start CPAN going in a screen session and detach, especially when I'm sshing into a headless or remote system. Is there a security risk? Sure. There'd be a security risk even if I monitored the install closely. I could mitigate that risk by reading the source of every module before I install it, but that would consume a lot of my time; there's a tradeoff involved here, and to date the worst fashion in which I've been "bitten" by a module from the CPAN was when I discovered that a certain module I rely on heavily won't run out of the box under Taint checking. (The module in question is a dependency for DateTime::Format and as such is worth enough to me that I investigated this problem and figured out how to resolve it, rather than dropping the module. Most modules I'd just discontinue using in this situation.)

        As for privacy concerns, there are different kinds of privacy. Information privacy (protecting things like my OS, Perl version, address, ...) is of little concern to me, except for _sensitive_ information (passwords, social security number, and such). I'm not a tinfoil-hat kind of guy, really. But the other kind of privacy, the freedom to not be continually needlessly bothered, is more important to me. I filter spam to the largest extent possible (as long as I don't get false positives; I did away with naive Bayesian filtering because it got false positives, which makes it worse than useless to me; as far as keeping my address private, that would prevent real people from being able to contact me for legitimate reasons, an unacceptable tradeoff), use a browser that suppresses popup windows, don't have a voice phone in my living quarters, and won't use Perl modules that require a lot of babysitting during installation (except, as noted, for really vital things like Bundle::CPAN).

        I'm not comfortable with allowing everything to phone home, as I want to see what information they will send before I agree.

        If a global setting such as you propose were ever implemented, it should have an option for "Don't ask, just do it." That's how I'd set it, but if that option weren't available I'd set it to "Don't ask" and screw your statistics, because I don't want to be hassled. I specifically want to be able to install Bundle::CPAN, reload CPAN, and then set Bundle::Jonadab going and go do something else while another sixty or eighty modules are installed. (Most of those are dependencies, either directly, or through the test suites.) It wouldn't be right to make me sit and watch the scrolling text so I can monkeybang the keyboard every little bit for two hours every time I install a new system or a new Perl version or whatever. That would be far more invasive than sending my OS and Perl version to the module author. It's bad enough I have to configure CPAN and LWP each time; once I get that done, I want the rest to be interruption-free.

        As far as not being comfortable letting "just anything" phone home, a malicious program wouldn't respect my phone-home preference anyway, so if there's code that I don't trust, I shouldn't be installing it at all; if I'm installing your module, that implies that I decided the utility it provides is worth more to me than the security tradeoff of trusting it. So that implies I trust it to be sensible about what data it sends and where it sends it. At some point I'll probably make the mistake of trusting a module I shouldn't, but when that happens I'll deal with it; I choose not to go through life agonizing over such things when the agonizing is fundamentally not even going to significantly mitigate, much less solve, the problem.

        There are things I wouldn't want a module doing at install time; one example has recently come up. But phoning home with statistical data for the author of the module isn't anywhere near that category, as far as I'm concerned. It's nice that you're going to have your module ask for permission, because we know there are some people who object, but as far as I'm concerned, it's fine. As you point out, the risk these data pose is much smaller than the risk I've already accepted by running your code on my systems without personally inspecting every line of it first.

        I'm not saying everyone should be comfortable setting the variable that way in every situation; I'm just saying that if someone were to go to all the trouble to institute a standardized global way to express a preference on this issue, it should provide for the basic standard three options, "Yes", "No", and "Ask"; if a user such as yourself is "not comfortable" with one of these options, then he won't set it that way, will he? "Ask" should probably be the default. It would be nice if there were a way to say "Yes, and set the pref so I'm not asked again" or "No, and set the pref so I'm not asked again", but that would require the existence of a global preference that all software would be expected to know about.


        ;$;=sub{$/};@;=map{my($a,$b)=($_,$;);$;=sub{$a.$b->()}} split//,".rekcah lreP rehtona tsuJ";$\=$;[-1]->();print

        When I said "feel free to phone home the agreed information" I was thinking (but I neglected saying it explicitly) that as part of this protocol there would be a standard, pre-agreed set of information that may be sent (Perl and module version and platform are good candidates). I think it would be even better if a relatively neutral and central place like search.cpan.org (or something like that) collected all the statistics, instead of contacting the author directly. This way some people might be more likely to trust sending the information without being asked, and people would also be able to compare statistics for different modules and authors. If you are not comfortable with allowing everything to phone home (or rather the CPAN site) automatically, you are still free to leave the variable in the "ask" state.

        I would also suggest developing a standard CPAN module for doing the phoning instead of each author rolling their own. That would make it easier to use, have more flexibility for configuring it to use proxies or whatever is needed, and again it might make some people more likely to trust it.
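
        To make the "pre-agreed set of information" idea concrete, here is a minimal sketch of what such a shared helper might build. Everything named here is an assumption for illustration only: the functions agreed_report and report_url, the host stats.example.org, and the version string 1.22 are hypothetical, and nothing is actually sent over the network.

```perl
use strict;
use warnings;

# Hypothetical helper: collect ONLY the pre-agreed fields
# (module name and version, Perl version, platform).
sub agreed_report {
    my ($module, $version) = @_;
    return (
        module  => $module,
        version => $version,
        perl    => "$]",    # Perl version, e.g. a string like 5.008004
        os      => $^O,     # platform name
    );
}

# Percent-encode everything outside the URI "unreserved" set.
sub escape {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9_.~-])/sprintf('%%%02X', ord($1))/ge;
    return $s;
}

# Build a query URL from the fields, in sorted key order so the
# result is predictable. The base URL is made up for this sketch.
sub report_url {
    my ($base, %field) = @_;
    my @pairs;
    for my $key (sort keys %field) {
        push @pairs, escape($key) . '=' . escape($field{$key});
    }
    return $base . '?' . join('&', @pairs);
}

# Show the user exactly what would be sent before anything is sent.
my %report = agreed_report('DBIx::Simple', '1.22');
print report_url('http://stats.example.org/report', %report), "\n";
```

        Printing the URL first means a user in the "ask" state can see every field before agreeing, which addresses the "I want to see what information they will send" objection.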

        I've mentioned trust several times. That is the central problem here, but just remember that whenever you install any program without reading and understanding the entire source code first, you are implicitly trusting the author (hey, if you didn't trust the author you would have written the code yourself!). Maybe there are some other modules that phone home without asking and no one has noticed. It would be good to have a standard way for authors that want to play nice to do this in an acceptable way.

Re: Gathering module usage statistics
by adrianh (Chancellor) on May 04, 2004 at 21:33 UTC
    It would explain what it is about to do and then wait ten seconds so it can be terminated if someone doesn't like it. I don't want it to default to not-sending the information, because most installations are done with CPAN or CPANPLUS.

    I would dislike this default behaviour. In my eyes (and others can feel free to differ :-) having an installation default to communicating with another site without getting active confirmation from the user isn't acceptable behaviour. It breaks the contract of what I expect modules to do.

    I've no real objection to the information being gathered, but I want to be asked first. The information is private to me and there should be a manual opt-in, not an automated opt-in IMHO.

    (as an aside you might be interested in cpanstats.)

      I've no real objection to the information being gathered, but I want to be asked first.

      I expect users to read the README file and to watch the on-screen information. More than enough time will be given to abort the process. You are asked first, but the default will be "I agree".

      The information is private to me

      Why is it private?

      Webster's describes private as:

      1. Belonging to, or concerning, an individual person, company, or interest; peculiar to one's self; unconnected with others; personal; one's own; not public; not general; separate; as, a man's private opinion; private property; a private purse; private expenses or interests; a private secretary. 2. Sequestered from company or observation; appropriated to an individual; secret; secluded; lonely; solitary; as, a private room or apartment; private prayer.
      None of these descriptions apply to the perl version and platform name. Unless anyone has any reason to keep those things secret. In which case I would *still* like to know why. How would me or anyone in the world knowing those things harm you? Can they be used against you? Can you be identified by them? Did you hack your SSN into $^O?

      (as an aside you might be interested in cpanstats.)

      Cpanstats is a waste of time. With only 80 systems reporting (again I say: opt-in does NOT work) the information gathered is useless.

      Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }

        I expect users to read the README file and to watch the on-screen information. More than enough time will be given to abort the process. You are asked first, but the default will be "I agree".

        Automated assumption of a "yes" isn't me agreeing.

        If I hit return to start installation in CPAN and am then immediately struck dead by a heart attack, your software will send the information. How did I agree to that?

        To pick a (hopefully) more common scenario: I hit return, see the message, and my dialup connection dies. It takes way longer than 10 seconds for me to reconnect and ssh to some of the boxes I administer.

        Why is it private? ... None of these descriptions apply to the perl version and platform name.

        Yes they do.

        What version of Perl is running on the Linux box in my office upstairs? You don't know because the information is:

        Belonging to, or concerning, an individual person, company, or interest;... personal; one's own; not public ... Sequestered from company or observation; appropriated to an individual; secret
        How would me or anyone in the world knowing those things harm you?
        1. The question of harm is separate from the question of privacy. I do not want every piece of private information I possess made public just because it won't do me any harm. We're not living in the transparent society quite yet :-)
        2. As I've already pointed out there are potential scenarios where somebody knowing these things can cause me harm.
Re: Gathering module usage statistics
by greenFox (Vicar) on May 05, 2004 at 06:06 UTC

    I have a different solution for you which I think is more respectful of the user and will probably result in more feedback for you. In your README file and at the bottom of the documentation include a request for feedback. Something like:

    "If you like this software please send me a message at <your email> letting me know you are using it. Your feedback will help me in developing this and other modules in the following ways (list them). Inclusion of the following details would also assist me (list them). To make this easy for you I have included the script tell_me_you_care.sh which emails the following information (list again) to me. Thanks for using xxxx, Juerd."

    This way you have let the users know you care about them, the security concerns are minimised, and you have given people who are installing the software on boxes that are not connected to the internet a way of reaching you as well. Personally I think this is a much more polite and respectful way of going about it, give people a reason to opt-in and I believe that they will. I can tell you that I never allow software packages to remote connect (if I get a choice) but I have registered postcard-ware software and sent "fan mail" to authors of software that I like.

    --
    Do not seek to follow in the footsteps of the wise. Seek what they sought. -Basho

      Please consider reading the entire thread. I am convinced that opt-in does not work.

      I already get some fan mail, and some bug reports. Although I like that, it is not the kind of information I'm after.

      Besides that, email is too bloated for this. I don't want someone's email address, I don't want to know which MTA they use and which SMTP routes the message followed.

      But most importantly, I don't want the user to have to do anything more than hit Enter to send me this information, because anything harder than that will result in many, many less counts.

      Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }

        Please consider reading the entire thread. I am convinced that opt-in does not work.

        Well, I did, but you have an opt-in system anyway ("Opt-out doesn't seem to be very popular"). And it does not give the user any good reason to opt in. And it does not cater for boxes behind firewalls etc. And it breaks standard automated builds. And it raises security concerns for people.

        Besides that, email is too bloated for this. I don't want someone's email address, I don't want to know which MTA they use and which SMTP routes the message followed.

        Fair enough, make the tell_me_you_care.sh script do a PUT to your website then :)

        But most importantly, I don't want the user to have to do anything more than hit Enter to send me this information, because anything harder than that will result in many, many less counts.

        Well, I respectfully disagree. I think many people will not opt in, for the reasons I have already given. I think you need to stop thinking about what you want and think about what the user wants. Yeah, they can be the same thing, but a random user downloading from CPAN isn't going to see that; all they are going to see is Big Brother watching.

        --
        Do not seek to follow in the footsteps of the wise. Seek what they sought. -Basho

Re: Gathering module usage statistics
by jacques (Priest) on May 05, 2004 at 02:36 UTC
    CPAN is mirrored, and there is no way to know how often your module distribution has been downloaded and no way can easily be made.

    So create a program that does download statistics on one of the mirrors. Then gently suggest that other mirrors use this program too. Eventually we could have one site, say search.cpan.org, show the statistics that are gathered from all of those mirrors that use the program. It may not give a complete picture of usage statistics, but it would be one indicator and certainly an interesting one as well.
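
    As a rough sketch of what such a program could look like, the following counts successful tarball downloads per distribution from web server access-log lines. It assumes the mirror serves over HTTP and logs in Common Log Format, and that distributions live under /authors/id/; the function name count_downloads and the sample log lines (addresses, timestamps, version numbers) are made up for illustration.

```perl
use strict;
use warnings;

# Count successful GET requests for distribution tarballs.
# Takes access-log lines, returns a hash ref: distribution => count.
sub count_downloads {
    my (@lines) = @_;
    my %count;
    for my $line (@lines) {
        # Pull the request path and the status code out of the line.
        next unless $line =~ m{"GET \s+ (\S+) \s+ HTTP/[\d.]+" \s+ (\d{3})}x;
        my ($path, $status) = ($1, $2);
        next unless $status == 200;    # ignore errors and redirects
        # Keep only tarballs in the author directories.
        next unless $path =~ m{/authors/id/.*/([^/]+)\.tar\.gz\z};
        $count{$1}++;
    }
    return \%count;
}

# Made-up sample lines in Common Log Format.
my @log = (
    '192.0.2.1 - - [05/May/2004:02:36:00 +0000] "GET /authors/id/J/JU/JUERD/DBIx-Simple-1.22.tar.gz HTTP/1.0" 200 12345',
    '192.0.2.2 - - [05/May/2004:02:37:00 +0000] "GET /authors/id/J/JU/JUERD/DBIx-Simple-1.22.tar.gz HTTP/1.1" 200 12345',
    '192.0.2.3 - - [05/May/2004:02:38:00 +0000] "GET /authors/id/J/JU/JUERD/PLP-3.18.tar.gz HTTP/1.1" 404 0',
);

my $count = count_downloads(@log);
printf "%-25s %d\n", $_, $count->{$_} for sort keys %$count;
```

    Note that this only sees HTTP fetches on mirrors that run it; transfers over FTP or rsync, and installs from local mirrors, would not show up, so the numbers would be an indicator rather than a total.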

      Then gently suggest that other mirrors use this program too.

      They will not do it. Mirrors exist because mirroring is easy. Make it any harder, add any communication, and the maintainers will more likely choose to no longer mirror.

      Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }

        That's a negative thought. One way of seeing if a thought is true is to do an experiment. Install the program on one of the mirrors, work out the bugs, and then ask a friend who manages another mirror to use the program. See what happens.
Re: Gathering module usage statistics
by flyingmoose (Priest) on May 05, 2004 at 18:32 UTC
    I'd like to see this implemented generically (in CPAN.pm as an option) rather than on a per-module basis. Look at the Debian Popularity Contest for a system that tracks usage of Debian packages. This is also opt-in, but I'd rather the CPAN version not be implemented as a daemon. PopCon appears fairly successful, but yeah, I'm not running it on my box -- so take that as you will. Definitely some good examples to read up on though.
Re: Gathering module usage statistics
by Abigail-II (Bishop) on May 07, 2004 at 23:36 UTC
    If I were to encounter a module doing that I'd most likely make sure I never install any module from its author again (considering that I install from a local CPAN mirror, that's as easy as chmodding a single directory) - except perhaps for figuring out where it sends the information to (and in which format) so I can feed it useless information.

    If you want such information, ask for a postcard in the README.

    Abigail

      If I were to encounter a module doing that

      Doing *what*, exactly? My original proposal (opt-out) or the updated one (opt-in)?

      If even opt-in is unacceptable for you, I advise you to chmod in advance.

      Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }

        Doing *what*, exactly? My original proposal (opt-out) or the updated one (opt-in)?
        Either.

        Abigail

Re: Gathering module usage statistics
by Anonymous Monk on Jun 01, 2004 at 05:36 UTC
    I would LOVE to have access to this stuff, for many of the same reasons. Knowing what to support, knowing how safe it might be to change an API ( pretty safe if nobody uses it ).

    But on the subject of implementation, I don't understand why everyone is having such difficulties coming up with a solution.

    Just put an entry in wherever we keep the other CPAN configuration and do an always|never|ask config entry that starts with 'ask' by default.

    (During CPAN.pm first setup) To better understand how often and where modules are being used, so they can be better supported, CPAN.pm can allow modules the option of reporting back ONLY your platform (Linux) and perl version (5.005_03) whenever that module is installed. These support statistics are aggregated centrally, and no other information about you is recorded. It would be greatly appreciated if you could report your use to help us make CPAN better for you. Report usage and platform statistics (ASK/always/never)

    If you always want to report, then it's done transparently in the background on any install. If you never want to report, we just never do it. If set to ASK (the default), you would get:

    This module would like to report back usage and platform statistics containing only your platform (Linux) and perl version (5.005_03) to the CPAN statistics server at stats.cpan.org. These support statistics are used to help the developers decide which modules and platforms need the most attention paid to them. Report support statistics for this install? (YES/no/always/never)

    This follows a couple of basic principles.

    1. Default to reporting
    2. Never actually report without approval
    3. Make it trivially easy to permanently disable at any time
    4. Make it easy for people to report.

    In some theoretical forced-non-interactive run ( can we do this? ) we would just not report for that run, and ask next time we did an interactive installation.

    This covers both the "just fuck off and die" and the "I'd love to help, as long as I don't have to do anything" cases quite cleanly.
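
    The always|never|ask scheme above, including the non-interactive case, could be sketched roughly like this. The function name decide and its return convention are hypothetical, not part of CPAN.pm; an undefined answer stands for a non-interactive run, and an empty answer (just hitting Enter) is treated as the default YES.

```perl
use strict;
use warnings;

# Decide whether to report for one install.
#   $pref   : stored setting, one of 'always', 'never', 'ask'
#   $answer : reply to the prompt; undef means a non-interactive run
# Returns ($report_now, $new_pref).
sub decide {
    my ($pref, $answer) = @_;

    # A stored preference wins; the user is not asked at all.
    return (1, 'always') if $pref eq 'always';
    return (0, 'never')  if $pref eq 'never';

    # Preference is 'ask'. With nobody there to answer, do not report
    # and leave the setting alone so the question comes up next time.
    return (0, 'ask') unless defined $answer;

    $answer = lc $answer;
    $answer =~ s/^\s+|\s+$//g;

    # Enter alone takes the default, which is YES for this install only.
    return (1, 'ask')    if $answer eq '' || $answer eq 'yes' || $answer eq 'y';
    return (1, 'always') if $answer eq 'always';
    return (0, 'never')  if $answer eq 'never';

    # 'no', or anything unrecognised: skip this time, keep asking.
    return (0, 'ask');
}

my ($report, $pref) = decide('ask', 'always');
print "report=$report pref=$pref\n";
```

    Treating unrecognised input as "no" keeps principle 2 intact: a typo never results in a report being sent.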

    Putting the stats stuff into CPAN.pm is good enough for now. I guess we could theoretically allow the database to record statistics in several different categories... so you could write a mirror log stats interface to the same stats system? Or possibly even hook into other types of things, like Debian mirror logs etc. Anyway, I'm drifting into speculation land here.

    Comments?
      This seems fair enough. By the way, if you personally want to know about/from people using your module, why not have a mailing list? People subscribing to it won't be gimmick installs anyway, and I'm pretty sure they'll tell you what platform/version etc. they're using.