I've been known to decry the use of databases for certain applications. In this meditation I am going to go against that reputation by advocating their use.

CP6AN should be a database.

So too, should the Perl6 equivalent of perl/lib and perl/site/lib.

Currently, each package on CPAN is an independent entity. Whilst this has some advantages, it also results in a huge amount of duplication with respect to all the support scripts and ancillary files.

Each package is charged with providing a makefile generation script that in turn has to 'discover' the environment into which it is being installed and generate an appropriate makefile for that environment.

As most developers do most of their development on one (or maybe two) OS, it becomes the responsibility of modules like ExtUtils::MakeMaker, Module::Build etc. to produce highly ambidextrous makefiles. The result is that every makefile is hugely complicated, and supporting those modules requires the skills of a very rare breed of developer who has access to, and knowledge of, multiple OSs. My thought is that when Perl6 is installed, a generic module makefile could be generated for the installation. From that point on, the process of installing new modules would be:

  1. Query CP6AN (possibly using SQL), to locate the module/class required.
  2. Replicate the selected records from CP6AN into the local LIB database.
  3. Invoke the locally tailored, generic makefile to perform any compilation steps required.
  4. Done.
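
A minimal sketch of steps 1 to 3, assuming both the archive and the local library are SQLite databases. The `modules` table, its columns, and the `build` callback (standing in for the locally tailored generic makefile) are all hypothetical; nothing here reflects an actual CP6AN design.

```python
import sqlite3

# Hypothetical schema: one row per module release, source stored inline.
SCHEMA = """
CREATE TABLE IF NOT EXISTS modules (
    name    TEXT NOT NULL,
    version TEXT NOT NULL,
    source  TEXT NOT NULL,
    PRIMARY KEY (name, version)
)
"""


def open_db(path=":memory:"):
    """Open a database and make sure the hypothetical schema exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    conn.commit()
    return conn


def install(archive, local, name, build):
    """Query the archive, replicate the rows locally, then run the build.

    `build` stands in for the generic makefile. If it raises, the replicated
    rows are rolled back and the local library is left unchanged.
    """
    # Step 1: locate the module in the archive.
    rows = archive.execute(
        "SELECT name, version, source FROM modules WHERE name = ?", (name,)
    ).fetchall()
    if not rows:
        raise LookupError(f"{name} not found in archive")
    try:
        # Step 2: replicate the selected records into the local database.
        local.executemany("INSERT OR REPLACE INTO modules VALUES (?, ?, ?)", rows)
        # Step 3: run whatever compilation the module needs.
        build(rows)
        local.commit()
    except Exception:
        local.rollback()
        raise
    return len(rows)
```

The rollback only covers the rows held in the local database; anything the build step writes to the filesystem would need its own cleanup.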

This would allow for versioning, transactional control (rollback), backups, etc. Basically, all of the good things that databases provide. It would also vastly reduce the size of the mirrors by removing the need for all the duplications that currently exist. I could even imagine the day when individual installations could register with their local mirror for update replications etc.

As programmers, we tend to see the benefits of using databases for the storage of all manner of data. Why not our own, in the form of source code?


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco.
Rule 1 has a caveat! -- Who broke the cabal?

Janitored by Arunbear - retitled from 'A notion for the weekend.', as per Monastery guidelines

Replies are listed 'Best First'.
Re: CP6AN should be a database
by adrianh (Chancellor) on Apr 08, 2005 at 16:26 UTC
    it becomes the responsibility of modules like ExtUtils::MakeMaker, Module::Build etc. to produce highly ambidextrous makefiles.

    One of the major plus points of Module::Build is that it gets rid of all that tedious mucking about with makefiles. It's all lovely cross-platform pure perl :-)

    As programmers, we tend to see the benefits of using databases for the storage of all manner of data. Why not our own, in the form of source code?

    I store my source code in subversion. Would an SCM be a more useful thing to think about rather than the more general "database"?

      I store my source code in subversion. Would an SCM be a more useful thing to think about rather than the more general "database"?

      Interesting you mention that, because FreePAN is doing just that. Each contributor gets their own Subversion repository. I'm not sure if it's been "officially blessed", but FreePAN seems to be a way forward for a CPAN in Perl6.

      Update: looking at the list of repositories opened so far, one might conclude that it has been de facto blessed.

        FreePAN is not a replacement for CPAN. The first is a mirrored set of source control repositories, and the second is a mirrored set of distributions.

        From the FreePAN FAQ:

        FreePAN does not intend to replace or even compete with CPAN. In fact it intends to cooperate with it fully. FreePAN is about mirrored Version Control repositories. CPAN is about mirrored packaged content. The two efforts nicely complement each other.

        Source control is a different beast than release management. :)

        --
        brian d foy <brian@stonehenge.com>

      I conflated two different situations in my post.

      There is the development time usage of the source code for which SCM software is the database of choice.

      But then there is the usage of those source files for which we currently fall back upon lots of little codependent, parallel files.

      This seems suboptimal to me.


        But then there is the usage of those source files for which we currently fall back upon lots of little codependent, parallel files.

        Probably me having one of my dim days - but I don't understand what you're getting at here.

Re: CP6AN should be a database
by rir (Vicar) on Apr 08, 2005 at 18:03 UTC
    A simple file is beautiful. By using files to hold our Perl stuff we remove one barrier to the use, examination, or alteration of it.

    By solving the problems associated with using such files we solve a very general problem.

    Your listed installation process could be as easily applied to a file-based system.

    The difficulties go past differing OS environments to differing installer, that is login, environments.

    Be well,
    rir

      A simple file is beautiful.

      I agree, there is nothing more beautiful than simplicity, when simplicity is all that is required.

      However, take a good look at a moderately complex distribution and note that modules are anything but simple. Look at the not untypical set of ancillary files involved:

      ANNOUNCE ChangeLog Event.h Event.xs INSTALL MANIFEST MANIFEST.SKIP META.yml Makefile.PL README TODO Tutorial.pdf c/ ev.c generic.c group.c hook.c idle.c io.c queue.c signal.c tied.c timeable.c timer.c typemap.c unix.c var.c watcher.c demo/ echo.t group.t msg.pm perlqt.t process.pm queue_pending.t rand_interval.t readline.t repeat.t semaphore.pm lib/ Event.pm Event.pod Event/ EventAPI.h MakeMaker.pm Watcher.pm generic.pm generic.pod group.pm idle.pm io.pm signal.pm timer.pm type.pm typemap var.pm ppport.h t/ attach_to.t bored.t callback.t data.t delete.t eval.t fifo.t generic.t group.t hook.t hup.t idle.t idle2.t io.t leak.t leak2.t loop.t now.t reenter.t signal.t timeout_cb.t timer.t unconfigured.t var.t util/ bench.pl filehandle.txt

      I can't think of any other situation where developers would not reach for the benefits of a database to manage access to this amount of data and meta information.



        What's not simple about that directory tree? Files are segregated into directories by their purpose or type, and we can look at all the files. MANIFEST is already a tiny database. Why would I store that in something just to pull it out again? MANIFEST.SKIP is similar. What structure is going to make that more simple? At least with files I can list them all, grep them, edit them, or anything else I might have to do.

        Your proposal doesn't make that situation any better. From what I have read, you want to replace Makefile.PL with some other file. The rest of the distribution looks the same, and at the user level most people will not notice a difference.

        --
        brian d foy <brian@stonehenge.com>
        I can't think of any other situation where developers would not reach for the benefits of a database to manage access to this amount of data and meta information.

        Given that the information is pretty much static there is little need to add complexity.

        If a database system is used it will be another barrier to contribution to C6PAN (and, I think, CPAN, for they must become one). As a module user anyone can jump into the code in $PERL5LIB and try to figure out what was going wrong/right. Put that into a database and you exclude those that have not learned to deal with your database.

        In this I have used the term database as I think you mean it but I also view /usr/lib/perl/ as a database.

        Be well,
        rir

Re: CP6AN should be a database
by jZed (Prior) on Apr 08, 2005 at 16:34 UTC
    Hmm, coincidence, just a few nights ago I got one of those early-morning-not-awake-yet ideas - DBD::CPAN. The DBI/SQL end would be a no-brainer, just subclass DBD::File. I don't have time to do it myself, but if anyone wants to work on this, I can help, just holler.
Re: CP6AN should be a database
by brian_d_foy (Abbot) on Apr 08, 2005 at 22:40 UTC

    You aren't going to be able to store an installation file (makefile, build.pl, whatever) for every situation. That's a really hard problem that nobody has figured out, and once you do, some OS or library will change slightly, need special treatment, and muck it all up.

    I'd much rather have something like ExtUtils::MakeMaker or Module::Build that knows how to do the right thing on my system. Anything you do will still have to look at my own Config.pm because I install my own Perl, and I don't have to tell anyone but my system where and how I did that, which configuration options I choose, and where I want modules to end up. Something still has to take all of that information and put it into the installation program.

    Not only that, but I want to turn things on or off when I install things. I often use PREFIX, and less often I choose to give Makefile.PL values for other options. Your solution will have to handle all of those as well.

    There is a problem with having only one version of a module installed because the module name maps directly onto the filesystem. A database won't fix that, though. We have to store the files differently, but still make them available as plain files so we can look at them, grep them, and whatever other things we typically do with plain files.

    Once you think about everything that has to happen, you're pretty much back to something like MakeMaker. You might not like it, and you may think it's ugly, but you need to give credit where credit is due. People (not just Perlers) have been using make(1) for decades, so you want to be a bit cautious before you throw all of that experience away.

    Besides that, I don't see much use for a database to query module names. People might use that with CPAN Search or Kobes Search, but the average user just wants to install the module he wants. I don't see finding modules as a big problem. Anything that you do will have to be completely replicated in a CPAN mirror so people can still use it offline (my most frequent place to install modules is in an airplane, for instance). I don't want to keep around hundreds of installation hints files for that, but I don't want to lose the ability to install something at a client site either.

    A database won't provide for versioning, transactions, or backups either. It can do that for the data it contains, but it only stores information about the model we give it. To solve those problems, we need to affect the way Perl installs and stores modules and how it decides which one to use. That problem lives outside a database, and it's a problem of process, not implementation.

    Perl 6 is already thinking about this sort of stuff, and it was a concern early on. I'm not sure which solution they are going to come up with (they have time :), but I hope they don't start with a technological idea first then shoehorn everything into that idea. I really hope they don't use some gatekeeper/bottleneck that controls access to the information.

    --
    brian d foy <brian@stonehenge.com>
Re: CP6AN should be a database
by CountZero (Bishop) on Apr 08, 2005 at 21:11 UTC
    As you say in a later post, there are indeed two issues here, the first being the storage system of the modules, the second the use one makes of the files once they are out of the CPAN system and stored on your local system.

    I don't think it is such a bad solution to use the file-system as its storage system. After all the modules are basically just files and unless you really want to change the whole module system as it is implemented on your local Perl installation, I don't see what is wrong with it. Versioning is one issue (perhaps), but that can be handled within the file-system by adding a version number somewhere in the file-name (or perhaps rather in the parent-folder name) or in a 'version'-file somewhere in your module tree.
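
    To make the parent-folder idea concrete, here is a rough sketch that picks the newest installed version of a module from directory names alone. The layout (something like lib/Event/1.2/Event.pm) and the function names are assumptions for illustration only, not how any Perl installation actually stores modules.

```python
from pathlib import Path


def parse_version(text):
    """Turn '1.10.2' into (1, 10, 2) so comparison is numeric, not textual."""
    return tuple(int(part) for part in text.split("."))


def newest_version_dir(module_root):
    """Return the newest version subdirectory of module_root, or None.

    Assumes each version lives in a folder named after a dotted numeric
    version, e.g. Event/1.2/ and Event/1.10/ (a hypothetical layout).
    """
    candidates = []
    for child in Path(module_root).iterdir():
        if not child.is_dir():
            continue
        try:
            candidates.append((parse_version(child.name), child))
        except ValueError:
            continue  # skip folders whose names are not version numbers
    if not candidates:
        return None
    return max(candidates, key=lambda pair: pair[0])[1]
```

    Comparing tuples rather than strings matters here: as text, "1.10" sorts before "1.2", which is the wrong way round for versions.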

    And of course one should not forget that once the 'raw' files are transformed into perl-modules by the magic of make they are (and must be) simple folders or files, unless you want to change the whole of the module storage concept in Perl.

    By sub-contracting as it were, the storage of the 'raw' files and the 'finished' modules to the local filesystem, Perl solves the storage problem in a very elegant, transparent and simple way, which works on all OS'es (as long as they have a file-system with a directory structure of course) without any need to install some OS-dependent program, driver, ... to do this.

    Your solution would call for a database system which must work on all types of OS'es for which there is a Perl-installation and finally you store the results after make in the local filesystem anyhow. I'd say Better the devil you know ...

    The second issue you raise (one locally installed generic makefile, optimised for your local Perl during installation of Perl6) would shift the burden of writing 'this works everywhere'-makefiles away from the module-writers to the Perl6-makers. I confess that indeed the hurdle of having to write such a makefile has until now stopped me from contributing to CPAN as I cannot test my modules or the makefile on anything other than WIN32-systems and in any case the makefile syntax looks like deep and black magic to me.

    However, I think that such a generic makefile will become the next stumbling stone as it will be difficult to guess what (future) modulewriters might require it to be able to do. The fact that writing a makefile is so difficult is perhaps because it is so powerful (I never really tried my hand at it, so I'm just guessing here) and the syntax is so rich and arcane since it must follow this. And whatever this generic makefile's strength might be, it will still have to get some instructions from the 'raw' module files, so from writing a makefile, we go to writing some sort of a config-file. I'm not so sure that that will be easier.

    CountZero

    "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

      You really shouldn't let the makefile issue keep you from contributing to CPAN. No one is going to shoot you if you mess things up, and the CPAN Testers let you see what happens to your distro on other platforms (and sometimes even help you figure out why).

      You really don't write the Makefile. You make a small, simple file called Makefile.PL. Check out this Makefile.PL from Andy Lester's List::Cycle. Surely you could create that :) Once you have that, ExtUtils::MakeMaker does the rest. You don't have to know anything about Makefiles, really. Module::Build is the same: you create a short description file called Build.PL. When you run that file, the module makes everything it needs.

      If you need help creating a distro, there are plenty of things to get you started, such as Module::Starter so you don't have to do all the work on your own. I've also written a couple of articles about such things for TPJ, and those should be listed on my website.

      If no one wants to help you make a distro, or you don't want to look foolish in public, I'll answer your questions on distros in private email. (Offer valid for CountZero on Perlmonks: void in Alaska, Hawaii, and Gary, Indiana.) :)

      --
      brian d foy <brian@stonehenge.com>
      Your solution would call for a database system which must work on all types of OS'es for which there is a Perl-installation
      DBI::PurePerl + SQL::Statement + a pure-perl DBD should work anywhere that perl works. Whether or not that fits the definition of a database system is another question.
Re: CP6AN should be a database
by Anonymous Monk on Apr 08, 2005 at 18:02 UTC

    Databases support versioning? What planet do you live on?

    Maybe you're talking about the Subversion database?

      What planet do you live on?

      One where people have a modicum of imagination?

      select code from modules where function = 'sort' and type = 'in-place' and version >= 3.7;
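
      For what it's worth, a query of that shape could be run against SQLite roughly as below. The table layout, the module names and the split of the version into major/minor integers are all made up for the sketch; the split is there because a single numeric column would rank 3.10 below 3.7.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: versions kept as two integers rather than one float.
conn.execute(
    """
    CREATE TABLE modules (
        name TEXT, function TEXT, type TEXT,
        major INTEGER, minor INTEGER, code TEXT
    )
    """
)
conn.executemany(
    "INSERT INTO modules VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("Sort::Old", "sort", "in-place", 3, 2, "# old"),
        ("Sort::InPlace", "sort", "in-place", 3, 10, "# new"),
        ("Sort::Copy", "sort", "copying", 4, 0, "# copy"),
    ],
)
conn.commit()


def find_code(function, kind, min_major, min_minor):
    """Return (name, code) rows at or above the requested version."""
    return conn.execute(
        """
        SELECT name, code FROM modules
        WHERE function = ? AND type = ?
          AND (major > ? OR (major = ? AND minor >= ?))
        ORDER BY name
        """,
        (function, kind, min_major, min_major, min_minor),
    ).fetchall()
```

      This only selects among versions that were already stored as rows; it says nothing about how the stored schema itself would be versioned over time.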


        That's not versioning, it's a query.

        I thought database versioning was one of the "classic hard problems" with databases. Have you found a magical solution to this problem that the rest of us have missed? I look forward to your "modicum of imagination" on this topic.

Re: Notion: CP6AN <strike>should</strike> could be a database
by g0n (Priest) on Apr 14, 2005 at 14:00 UTC
    I can see strong arguments for making C6PAN a database for the reasons you give.

    /perl/lib on the other hand - please no! Reasons:

    • At the moment, if an error occurs in a module as a result of me passing a duff parameter, making a mistake in syntax etc, I type vi, cut and paste the filename from the error message, and see the code straight away. Putting it into a database would significantly increase the overhead of that (start database client, spool to file, query, exit, read file....)
    • Having some sort of database dependency (apart from potential cross platform problems) would potentially increase the footprint of a perl installation - which would present yet another obstacle to perl on small devices like PDAs.

    g0n, backpropagated monk