Hello

I created a module that, for the moment, is called Date::Iterator, and I am discussing whether to put it on CPAN. This node, like my previous one, arises from a discussion on the module-authors list. It all started here, and my first batch of conclusions, from which I'll borrow some text, is here.

I have seen "CPAN pollution" used in a context where it meant that there are many modules that do the same thing.

I personally don't agree with that definition.

When you have many modules that do the same thing, you probably have some of poor quality and some good. Should the "bad" ones be stopped before they reach CPAN? Or shouldn't we take advantage of the newly added rating feature for CPAN modules to tell the good from the bad ones?

If we had a "sort by rating" on search.cpan.org, that could be, IMHO, a solution. Besides, there are good reasons to require registration before you can rate a module, but the tradeoff is that not many people are rating modules at this time. Maybe anonymous voting should be allowed, possibly stating how many anonymous votes a module received (e.g. My::Module: rating ****, 123 votes out of 234 were anonymous).
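To make the idea concrete, here is a minimal sketch in Python of what a "sort by rating" listing with an anonymous-vote count could look like. Everything in it is an assumption made for illustration: the ModuleRating record, its field names, and the tie-break on vote count are hypothetical and do not correspond to anything search.cpan.org actually provides.

```python
from dataclasses import dataclass


@dataclass
class ModuleRating:
    """Hypothetical rating record; not an actual search.cpan.org structure."""
    name: str
    stars: float          # average rating on a 0-5 scale
    total_votes: int      # registered plus anonymous votes
    anonymous_votes: int  # the subset of total_votes cast anonymously


def summary(m: ModuleRating) -> str:
    """Format one module in the style suggested in the post."""
    return (f"{m.name}: rating {'*' * round(m.stars)}, "
            f"{m.anonymous_votes} votes out of {m.total_votes} were anonymous")


def sort_by_rating(modules):
    """Best-rated first; ties are broken by the number of votes (an assumption)."""
    return sorted(modules, key=lambda m: (m.stars, m.total_votes), reverse=True)
```

Showing the anonymous share next to the rating lets the reader decide how much weight to give it, rather than having the site discard those votes.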

The possibility of rating a module should also be advertised by the CPAN module itself after a successful installation (or do recent versions already do it?).

The really bad thing about having many modules that do the same thing is that those module authors could probably work together to bring a single, powerful module to the community (the dilution of effort that mirod saw on the mailing list). But when that doesn't happen, shouldn't it be the community itself that chooses?

Ciao!
--bronto


The very nature of Perl to be like natural language--inconsistent and full of dwim and special cases--makes it impossible to know it all without simply memorizing the documentation (which is not complete or totally correct anyway).
--John M. Dlugosz

Re: What do we mean with "CPAN pollution"?
by Ovid (Cardinal) on Dec 23, 2003 at 17:23 UTC

    bronto asked if "bad [modules should] be stopped before they reach CPAN?"

    I would think the answer to that is a resounding "no"! While I won't name any, I do agree that there are some pretty awful modules out there. However, the CPAN has two good things going for it. First, there is competition amongst modules for different ways of doing things. Second, it's pretty "hands off", which allows new ideas (even bad ones) to be explored.

    What constitutes a bad module? When tachyon posted an RFC about a competitor to CGI.pm, many people seemed to suggest to him that he shouldn't upload his module, but they obviously had not looked at it. It's a great module, serves a need, and is a worthy competitor to the CGI.pm module. Had people just gone with their gut reactions, we may never have seen this module and the CPAN would have been a poorer place because of it.

    On the other hand, what about older modules that are "bad"? Some feel that File::Find::Rule and File::Finder are better alternatives to File::Find, but no one is suggesting that File::Find be pulled down (or that Data::Dumper be removed in favor of YAML). Maybe some would argue that Pixie eliminates the need for Class::DBI, Tangram and others. It's certainly easier to use, but it does not mean that those other modules don't have their place.

    I don't want to take away the competition of the modules on the CPAN. The current rating system is interesting, but even though many people write to me and tell me that HTML::TokeParser::Simple is much easier to use than HTML::TokeParser, I've yet to be rated on it and I don't know if that really matters. Consider that if a replacement module comes along but gets little advertising, people might still use the original module because it has plenty of "good" reviews. The latter example highlights the problem: there is no substitute for a programmer's judgment. Popularity ain't everything :)

    Cheers,
    Ovid

    New address of my CGI Course.

Re: What do we mean with "CPAN pollution"?
by hardburn (Abbot) on Dec 23, 2003 at 14:35 UTC

    To me, CPAN Pollution is when people upload their module without putting sufficient thought into its distribution: it lacks documentation, has few or no test scripts, has poorly defined dependencies, or (as you mentioned) replicates functionality that already exists. At times, this overlapping functionality is fine, since the authors may not approach the problem in exactly the same way, thus making one better than the other in certain situations. Other times, the two are nearly identical, which is just annoying (how many tied hashes implementing locked keys do we really need?).

    ----
    I wanted to explore how Perl's closures can be manipulated, and ended up creating an object system by accident.
    -- Schemer

    : () { :|:& };:

    Note: All code is untested, unless otherwise stated

      Forty-two. See, you knew it was the answer to something.

        I always knew that CPAN was the answer to life, the universe, and everything.

Re: What do we mean with "CPAN pollution"?
by pg (Canon) on Dec 24, 2003 at 01:08 UTC
    "you probably have some of poor quality and some good."

    It is not always clear whether a module is of bad quality. Not everyone uses the same standard, and even the same person doesn't always apply a consistent set of standards.

    "Should the "bad" ones be stopped before they reach CPAN?"

    They should be stopped, if we know which ones are "bad", but do we know all the time? Do we always agree with each other?

    "Or shouldn't we take advantage of the newly added rating feature for CPAN modules to tell the good from the bad ones?"

    Well, look at the XP point system here.

    However, it is not hopeless: stay with this site long enough and you will notice that some modules are repeatedly mentioned positively, and you should probably give those a try.

Re: What do we mean with "CPAN pollution"?
by perrin (Chancellor) on Dec 24, 2003 at 05:56 UTC
    Having lots of modules that duplicate each other's functionality can be a problem in that new users have trouble finding the best ones and waste time, possibly also getting frustrated with CPAN and going sour on the whole idea of sharing code. Think about the torture awaiting a person who types "template" into the CPAN search engine.

    Unfortunately, the metrics you talked about are not very comprehensive at this point. Maybe some day we will have enough votes for them to be more useful. Meanwhile, we have ad hoc attempts, like my article on templating systems. Several people have asked me to add other modules to this document, but in most cases I have to refuse, since the whole point of having this sort of document is to steer newbies toward the cream of the crop. Experienced people who have something specific in mind can handle examining the stuff on CPAN and deciding for themselves.

    One thing that may help is the work that Leon Brocard did on CPANTS. He made some first steps toward automatic evaluation of modules based on some simple statistics. This could eventually be very helpful in finding the best modules.

Re: What do we mean with "CPAN pollution"?
by xenchu (Friar) on Dec 24, 2003 at 05:16 UTC

    It's all Darwin, survival of the fittest. Over time, the best modules are most likely to survive. Programmers will communicate and mention those modules they find most useful. When others need a module to do a specific task they are most likely to choose those they remember having heard the most positive things about. Evolution in action.

    I have used the word most more than once (by design, really!). Programmers are people and people can be fickle and illogical at times. Sometimes a module will be popular though there may be better modules doing the same work. A situation somewhat like Betamax versus VHS. A rating system does little good in such cases because ratings in the final analysis are a measure of popularity. Does such popularity make a module fit? No, but it does make it a survivor.

    The point I am laboriously trying to make is that pollution can be hard to get rid of. I believe the best solution is to let the module users select the modules that are best. It is not a perfect system, but none is, and it is better than selection by committee (or, god help us, management).

    xenchu


    The Needs of the World and my Talents run parallel to infinity.
Re: What do we mean with "CPAN pollution"?
by Abigail-II (Bishop) on Dec 25, 2003 at 01:12 UTC
    Your article seems to rest on several fallacies:
    • One can linearly classify modules on a "good" to "bad" scale.
    • There's an agreement on what "good" modules and "bad" modules are.
    • CPAN has something to do with quality or quality control.
    • Quality has something to do with popularity.

    I've said it before: CPAN is just an archive. A storage device. An equal opportunity storage device, which doesn't discriminate between "good" and "bad". About the only restriction is that you upload stuff that's freely distributable, and that you play nice.

    If you want to rate CPAN modules in some way: create a website, and start writing reports. Spice it up with votes, karma, experience, bronze/silver/gold stars, or whatever. Perhaps someone else is already making such an effort - join them.

    Don't come up with things you'd like the CPAN people to implement - do some work yourself.

    Abigail

Re: What do we mean with "CPAN pollution"?
by Anonymous Monk on Dec 24, 2003 at 05:52 UTC

    Okay here's the question:

    If you're writing some Perl code, do you assume the input will always be in the proper form? No, of course you don't, so why should the system to distribute that code be any different?

    The problem is in the CPAN moderation system (or lack thereof), not that people are submitting sub-standard code to it. Fix the system, don't worry about the incoming data.