j3 has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to understand the differences between modules, distributions, and bundles on CPAN -- and also a few other tidbits about how the whole CPAN enchilada works. I've looked at the CPAN FAQ and also How to make a CPAN Module Distribution, but I'm still not getting it.

• I was just looking at MailTools, and I think that's referred to as a "distribution". Correct?

• Is a "bundle" the same thing as a distribution? My guess is "no", and that bundles just exist for use with tools like CPAN and CPANPLUS.

• Via ftp, I just had a look in /modules/by-module/Bundle. Why do a large number of the modules in there start with the name "Bundle-" but many do not?

• Incidentally, when I go to search.cpan.org and click that link on the left that says "Bundles (and SDKs)", why don't I get that massive list I see in /modules/by-module/Bundle? Isn't CPAN search trying to show me that list here?

• If "distributions" are indeed separate and distinct from bundles, where can I go to browse them? I see that search.cpan.org has a drop-down box to select that I want to *search* for distributions, but I'd also like to *browse* them. I see on the main cpan page, I can browse modules, scripts, something called "binary distributions" (or "ports"), etc., but no distributions.


Re: CPAN: modules, distributions, and bundles
by gaal (Parson) on Nov 03, 2006 at 22:15 UTC
    A distribution is generally a unit of installation. It may contain several modules, where perhaps only some are used at runtime by your code. Modules are generally units of functionality, and the author of a module distro may have decided that some set of functionality is so closely related (and maybe have so much common code) that it's worth distributing all in one piece.

    A bundle (now also called a Task) is a convenience "metadistribution" that triggers, via the dependency mechanism of CPAN/CPANPLUS, the installation of more than one distribution. This is typically used for optional but common functionality. For example, you can install the CPAN shell itself without any readline support, but commonly you'd like that and all the other goodies, so the one command "cpan Bundle::CPAN" will save you some pick-n-choosing.

    Another example is the Pugs smoke tests. Those are completely optional—all you need to install Pugs is basically a reasonably recentish Perl 5—but if you do want to run the nice graphical smokes, Task::Smoke makes it easy to fetch and install anything you need, despite not containing any code of its own.

      A distribution is generally a unit of installation. It may contain several modules, where perhaps only some are used at runtime by your code.

      Is a single module distributed via CPAN generically referred to as a "distribution" per se? Or is that term specifically only applied to cases where a minimum of 2 or more modules are distributed bundled together?

      When you upload an archive to CPAN, are you offered a choice between uploading either a "module" or a "distribution"?

        Perl 5 the language doesn't strictly have the concept of a module: it has a "package", and even that bears only somewhat weak conventional overlap with a single file. A module on the other hand seems to be, by conventional usage, the unit of consumption by code. Since a distribution almost always also comes with additional files such as Makefile.PL, documentation, a changelog, metadata regarding prerequisites and licensing etc., and since CPAN attempts to make it easy to manage these units of software, you upload distributions, not modules, even if your distro has a single .pm file in it.

        (Incidentally, Perl 6 does have a "module" keyword, which encapsulates namespacing just like packages do, but adds versioning to the story. No doubt software will continue to have private namespaces not meant to be consumed by the user code directly, and perhaps the author of those would choose to use modules and not packages to contain them, but I'm betting programmers will continue to loosely talk about "installing that module" when in fact more than one may have been involved. (Then again, Perl 6 has lexically scoped namespaces, so if in the middle of my module I decide I need to implement a Parser, I can do "my package Parser {...}" and my stuff will not clash with any Parser namespace that was visible to you, where you had called me.))

Re: CPAN: modules, distributions, and bundles
by randyk (Parson) on Nov 04, 2006 at 01:44 UTC

    Via ftp, I just had a look in /modules/by-module/Bundle. Why do a large number of the modules in there start with the name "Bundle-" but many do not?
    A distribution may contain a number of modules that are unrelated to the distribution name. For example, one of the distributions listed in /modules/by-module/Bundle is Apache-ASP; the reason that appears there is that this distribution contains the Bundle::Apache::ASP module.
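    To make the naming point concrete, here is a rough sketch (in Python, purely for illustration) that derives a distribution name from an archive path and checks whether a module name "belongs" under it. The author directory, the version number, and the helper names are made up for the example, and the version-stripping regex is only a heuristic, not how PAUSE or search.cpan.org actually do it.

```python
import re

def dist_name(path):
    """Strip directory, archive extension, and trailing version (heuristic)."""
    filename = path.rsplit("/", 1)[-1]
    stem = re.sub(r"\.(tar\.gz|tgz|zip)$", "", filename)
    return re.sub(r"-v?\d[\d._]*$", "", stem)

def matches_dist_name(module, path):
    """True if the module name sits under the namespace implied by the distribution name."""
    prefix = dist_name(path).replace("-", "::")
    return module == prefix or module.startswith(prefix + "::")

# Hypothetical path: the author ID and version are placeholders.
PATH = "E/EX/EXAMPLE/Apache-ASP-1.0.tar.gz"

for module in ("Apache::ASP", "Bundle::Apache::ASP"):
    print(module, matches_dist_name(module, PATH))
```

    The second module does not match the distribution's name, which is the situation described above: the Apache-ASP distribution turns up under by-module/Bundle only because it ships a Bundle::Apache::ASP module.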

    Incidentally, when I go to search.cpan.org and click that link on the left that says "Bundles (and SDKs)", why don't I get that massive list I see in /modules/by-module/Bundle? Isn't CPAN search trying to show me that list here?
    In general, the entries in the categories on the search.cpan.org main page come, in part, from a category selection made by the author at PAUSE when the author registers and subsequently uploads a module. If no such category is chosen, the module won't have a value in the chapterid field of the /modules/03modlist.data index file on CPAN, and it won't show up when one browses CPAN by category.

    Having said that, search.cpan.org adds, in some fashion, distributions to the categories that CPAN already has. However, the algorithm it uses gives different results from what one gets by browsing by module.

    If "distributions" are indeed separate and distinct from bundles, where can I go to browse them?
    If you go to a distribution on search.cpan.org (for example, see Apache-ASP), follow the Browse link to examine files in the distribution.

      Via ftp, I just had a look in /modules/by-module/Bundle. Why do a large number of the modules in there start with the name "Bundle-" but many do not?
      A distribution may contain a number of modules that are unrelated to the distribution name. {snip}
      Right. But I was just curious about the file naming convention here. Maybe there used to be a convention to name all bundles starting with that text "Bundle-" but then folks just stopped doing it...
      If "distributions" are indeed separate and distinct from bundles, where can I go to browse them?
      If you go to a distribution on search.cpan.org (for example, see Apache-ASP), follow the Browse link to examine files in the distribution.

      Sorry. I may have been unclear. I meant: how can I browse CPAN itself to see a listing of all the distributions there, rather than just looking at the listing of modules.

      My guess is that most packages on CPAN are single-module packages, and that there are far fewer multi-module "distributions". I'll have to look around there, download some more, and look at their insides to get a better idea about this.

      Thanks!

        I meant: how can I browse CPAN itself to see a listing of all the distributions there, rather than just looking at the listing of modules.
        The index file modules/02packages.details.txt on CPAN contains a mapping of all modules to distribution names.
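        As a rough illustration (Python here, just as a sketch), the index is a plain-text file: a short header block, a blank line, then one line per module giving the module name, its version, and the path of the distribution archive that contains it. Grouping those lines by the third column gives a listing of distributions. The sample data below is made up (author ID and versions are placeholders), and the function name is only for this example.

```python
def parse_packages_index(text):
    """Return {distribution_path: [module, ...]} from 02packages-style text."""
    # The header block ends at the first blank line; the rest is the index body.
    _header, _, body = text.partition("\n\n")
    dists = {}
    for line in body.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        module, _version, dist_path = fields[0], fields[1], fields[2]
        dists.setdefault(dist_path, []).append(module)
    return dists

# Hypothetical excerpt: real entries will differ.
SAMPLE = """\
File:         02packages.details.txt
Description:  Package names found in directory $CPAN/authors/id/
Columns:      package name, version, path

Apache::ASP          1.0  E/EX/EXAMPLE/Apache-ASP-1.0.tar.gz
Bundle::Apache::ASP  1.0  E/EX/EXAMPLE/Apache-ASP-1.0.tar.gz
Mail::Send           2.0  E/EX/EXAMPLE/MailTools-2.0.tar.gz
"""

distributions = parse_packages_index(SAMPLE)
for path in sorted(distributions):
    print(path, distributions[path])
```

        To run this against the real index you would read the downloaded (and decompressed, if needed) file into a string and pass it in instead of SAMPLE.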

        My guess is that most packages on CPAN are single-module packages, and that there are far fewer multi-module "distributions".
        That's an interesting question. Of the 12,166 distributions listed today in 02packages.details.txt, 8,592 have only one module associated with them. However, some of these single-module distributions are older versions of distributions whose current version contains multiple modules (e.g., look at Slauth-0.01.tar.gz and Slauth-0.02.tar.gz). This usually means that a module in the older distribution has been replaced, or made obsolete, by another module in the newer distribution, but the old one still shows up in 02packages.details.txt because it's still registered with PAUSE.
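        For what it's worth, the count above works out to about 70.6% of the listed distributions (8,592 out of 12,166). A tally like that could be done along these lines; this is only a sketch in Python with made-up (module, distribution) pairs, not the script actually used for those numbers, and it does not try to fold older versions of a distribution into newer ones.

```python
from collections import Counter

def single_module_share(pairs):
    """Given (module, distribution) pairs, return (single_module_dists, total_dists)."""
    modules_per_dist = Counter(dist for _module, dist in pairs)
    single = sum(1 for count in modules_per_dist.values() if count == 1)
    return single, len(modules_per_dist)

# Hypothetical pairs; the distribution names and versions are placeholders.
PAIRS = [
    ("Apache::ASP", "Apache-ASP-1.0.tar.gz"),
    ("Bundle::Apache::ASP", "Apache-ASP-1.0.tar.gz"),
    ("Foo::Bar", "Foo-Bar-0.01.tar.gz"),
    ("Foo::Baz", "Foo-Baz-0.02.tar.gz"),
]

single, total = single_module_share(PAIRS)
print(f"{single} of {total} distributions have exactly one module")
```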