agaffney has asked for the wisdom of the Perl Monks concerning the following question:

I am writing a Perl version of Gentoo Linux's Portage package management tool just for the heck of it (and I think I can do it better (yes, I am full of myself ;) )). I already have a good portion of the program done except for the main part: building the dependency tree.

My program can currently:
* read Portage's configuration files and environment variables
* parse an ebuild (file that describes the way a package is built and installed) to get the dependency information
* parse the dependency string taking into account USE flags (determine which optional components get built for certain packages)
* take a dependency string such as 'x11-base/xfree', '>=net-fs/samba-2.3', etc. and return a list of candidate versions in the Portage tree (directory structure containing ~80,000 ebuilds taking into account masked packages
* determine the highest version of a particular package from a list of available package versions
* determine if a certain version of a certain package is already installed

I plan on writing a function that will get the dependency information for a certain package, check the dependencies for those dependencies, and so on down the line. I'd have a structure like:

mainpackage-1.0
|--dependency1-1.3
|  ---dep1_of_dep1
|--dependency2-2.3
|  ---dep1_of_dep2
|  ---dep2_of_dep2
|  ---dep3_of_dep2
|--dependency3

where each dependency is a string in the form of 'x11-base/xfree', '>=net-fs/samba-2.3', etc. What I can't figure out is how to take that structure and turn it into a list that looks something like:

dep1_of_dep1
dependency1-1.3
dep1_of_dep2
dep2_of_dep2
dep3_of_dep2
dependency2-2.3
dependency3
mainpackage-1.0

Generating the above would be pretty simple, but it isn't so simple to take into account packages that have dependencies on different version ranges than another package (one package having '>somepackage-3.4' and another '<somepackage-4.5') and all that other fun dependency stuff. Can anyone give me a few pointers?

Replies are listed 'Best First'.
Re: building a dependency tree
by toma (Vicar) on May 12, 2004 at 05:00 UTC
      Okay, so how can I use CPAN to find a module that will wash the dishes for me? ;)
Re: building a dependency tree
by kvale (Monsignor) on May 12, 2004 at 02:58 UTC
    It seems to me that the problem of building a list from your tree is different than that of resolving different or conflicting dependencies of variuos software packages. So it would be best to split these into different subroutines or modules: the first would meld differing dependencies in the trees and the second would take those melded trees and turn them into lists.

    Melding different dependencies is really a policy decision. If there is an inconsistency, do we throw out a package, or do we install both versions? Given a range of possible versions do we always pick the most recent? The most stable? And so on. If you are creating a perl clone of the Portage apps, the policy is already set for you: look at those apps to see what is done in case of dependency conflict.

    As you say, the second part is fairly simple: just do a preorder traversal of the tree.

    -Mark

      I'm still not quite sure how to build or structure the tree, though. I was thinking of using a hash with each package name as a key as a way to avoid extra calls to the function to extract dependencies and to avoid getting a package's dependency tree twice (2 packages have a dependency on the same package):

      my $deps = { 'mainpackage-1.0' => [ 'dependency1-1.3', 'dependency2-2.3', 'dependency3'], 'dependency1-1.3' => [ 'dep1_of_dep1' ], 'dependency2-2.3' => [ 'dep1_of_dep2', 'dep2_of_dep2', 'dep3_of_dep2' ], 'dependency3' => undef };

      This way, I have easy access to each previously referenced packages's primary dependencies without another function call. I still can't figure out how to put it all together into a full dependency tree, though. I'm still not sure I'm going about this the right way.
        You can use a structure just like the one you show. You don't really need a data structure that is internally hierarchical. The one you have is an adjacency (dependency) list, which can store a tree (well, any graph) just fine. In fact, it's a little nicer since there may be duplicated dependencies, as you mention (your dependencies form a directed acyclic graph, and not necessarily a tree). Assuming that finding the dependencies is trivial, generate the adjacency list like this:
        my @pkgs = qw[ mainpackage-1.0 ]; my %deps; while (@pkgs) { my $pkg = shift @pkgs; next if exists $deps{$pkg}; my @deps = find_dependencies($pkg); $deps{$pkg} = \@deps; push @pkgs, @deps; }
        You can traverse it like a tree like this:
        sub traverse { my ($deps, $root, $depth) = @_; print " " x $depth, "$root\n"; traverse($deps, $_, $depth + 1) for @{ $deps->{$root} }; } traverse( \%deps, "mainpackage-1.0" );
        .. which would generate output like this:
        mainpackage-1.0 dependency1-1.3 dep1_of_dep1 dependency2-2.3 dep1_of_dep2 dep2_of_dep2 dep3_of_dep3 dependency3
        The output would contain duplicates if the directed graph is not a tree. But you can use an additional hash within your traversal to mark already-seen packages, say, if you were doing a postorder traversal to determine what order to install all the needed packages.

        Anyway, with the adjacency list data structure, all the dependency information about a package is in one place. You can do the same with restrictions, i.e. "somepackage < 4.0" and "somepackage > 3.0". Store all these restrictions in another similar hash indexed by package name. When you're done calculating dependencies, look at each package's list of restrictions and see if they're reconcilable.

        blokhead

Re: building a dependency tree
by Anonymous Monk on May 12, 2004 at 08:16 UTC
    Hi, I'll make this short because I just waisted 15 minutes trying to format something that simply desapeared (page has expired error !!!! grrr).

    If you are going to act on the data structure, you might want to look at a "real" build sytem where you can hook in configuration management. I have a make replacement written in perl (including the syntax) it's functional and functioning (used in real life) but I'd like to work on some Configuration management support in the system. If you want to chat a bit about this subject, send me a mail to nadim@khemir.net (slides about the system can be found at the Nordic perl workshop)

    If you go for a hand made solution you can use Data::TreeDumper to dump the tree in a "good" looking way.

    Cheers, Nadim.