in reply to Re^2: Getting all subpaths from a path
in thread Getting all subpaths from a path

I think the significant bit of information that was missing previously (the "X" in the XY Problem) is what you mentioned here: "I'm trying to create a Singularity recipes builder." By this I'm guessing you mean Singularity and its "Recipes" for building containers; more specifically, something you can execute in the %post section of a Singularity file (which gets executed with /bin/sh) to build the container?

By "recipes builder", do you mean you want to write a Perl script that will generate commands that can be executed by /bin/sh to reproduce a certain environment (directory structure, links, etc.)? In other words, you want to write a Perl script that will generate a sequence of mkdir -p commands, followed by cp commands, followed by ln -s commands, such that when Singularity builds the container and executes the script containing these commands, those dirs/links/files will be present in the generated squashfs image?

(By the way, why not use the built-in %files section?)

Note that the fact that I had to deduce all this means you need to describe your task better :-) Remember to explain the "X" you're trying to accomplish, plus sample input and the expected output for that input: something like a high-level SSCCE.

You haven't shown your input, which I am guessing is the filesystem that you want to mirror into the container? One way you could provide an SSCCE for us is to give us a list of commands to recreate the directory structure.
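As a sketch of what such a list of commands could look like (the file and directory names are made up for illustration, and everything is rooted in a scratch directory from mktemp so that nothing outside it is touched):

```shell
#!/bin/sh
# Hypothetical sample input: a tiny tree with one real file and one symlink.
root=$(mktemp -d)                      # scratch root, not a real install path
mkdir -p "$root/opt/tool/1.0/bin"      # directories first
touch "$root/opt/tool/1.0/bin/run"     # then the files
ln -s 1.0 "$root/opt/tool/current"     # then the links (relative target)
# The path a user would list goes through the symlink:
ls -l "$root/opt/tool/current/bin/run"
```

A handful of lines like these is enough for others to reproduce the situation and doubles as a test case.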

You also haven't shown your expected output, i.e. the /bin/sh script you want to produce.

Interesting: Note that both input and output are basically the same thing!

So if I'm correct with all my guesses so far, the problem can be more or less reduced to: a Perl script that will basically round-trip a /bin/sh script containing mkdir, cp, and ln commands.
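To illustrate the round-trip idea only, here is a rough sketch in plain /bin/sh rather than Perl. It assumes a readlink command is available, that no file name contains whitespace or shell metacharacters, and it uses touch as a stand-in for cp; the tree and the helper name emit_commands are hypothetical.

```shell
#!/bin/sh
# Build a small hypothetical tree, then print commands that would recreate it.
src=$(mktemp -d)
mkdir -p "$src/a/b"
touch "$src/a/b/file"
ln -s b "$src/a/link"

emit_commands() {
    # $1 = root of the tree to describe; paths are printed relative to it
    ( cd "$1" || exit 1
      find . -type d ! -path . | sort | while read -r d; do
          printf 'mkdir -p %s\n' "$d"
      done
      find . -type f | sort | while read -r f; do
          printf 'touch %s\n' "$f"      # stand-in for cp in this sketch
      done
      find . -type l | sort | while read -r l; do
          printf 'ln -s %s %s\n' "$(readlink "$l")" "$l"
      done )
}

script=$(emit_commands "$src")
printf '%s\n' "$script"

# Replaying the commands in an empty directory yields the same structure.
dst=$(mktemp -d)
( cd "$dst" && printf '%s\n' "$script" | sh )
```

The sketch ignores the hard part (links that depend on other links), which is exactly where the ordering of the ln commands starts to matter.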

However, since that's a lot of guessing, I'm going to stop here for now - please let us know if the above is correct or not, and if not, what it is you're actually trying to do. (Also, looking over choroba's sample code, it looks like a good starting point.)

Re^4: Getting all subpaths from a path
by ovedpo15 (Pilgrim) on Apr 02, 2021 at 15:21 UTC
    Yes, I'm trying to create those recipes on the fly. User gives me all the paths that he thinks are needed to run the tool inside the container (he gives a file that contains those paths and I read them into an array). With those paths I can build the recipe. In the %setup section I will create the directories, in the %files section I will copy files and in the %post section I will create the links. So I don't really want to create a shell script, I do want to build the recipe with Perl. But I didn't want to talk about Singularity because I guessed most of the people here are not familiar with it. So I tried to simplify it to creating a shell script (aka the recipe) that creates those directories, copies files and creates links.
    So, now that we are talking about recipes: the purpose of the Perl script is to build the recipe, based on all the paths that the user thinks are needed for running his tool in the container. So the input is really the paths, as I explained, and the output is the recipe (aka the shell script).
    So my question still remains. Given the paths, I want to build some structure from which I could easily extract all the files/links/directories and use them for creating the recipe file. If you think there is a better way of creating it, I'm all ears.
    choroba's answer is a good start but I had some questions that I commented under it.
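    To make the three-section layout concrete, here is a minimal sketch of such a recipe, written out by a small shell script. The base image and every path in it are placeholders, not taken from a real setup; it only assumes the usual definition-file behaviour that %setup runs on the host (with SINGULARITY_ROOTFS pointing at the container root) while %post runs inside the container.

```shell
#!/bin/sh
# Write a hypothetical recipe; the base image and all paths are placeholders.
recipe=$(mktemp)
cat > "$recipe" <<'EOF'
Bootstrap: docker
From: ubuntu:20.04

%setup
    mkdir -p "${SINGULARITY_ROOTFS}/opt/tool/1.0/bin"

%files
    /opt/tool/1.0/bin/run /opt/tool/1.0/bin/run

%post
    ln -s 1.0 /opt/tool/current
EOF
cat "$recipe"
```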

      Thanks for the clarification. It's important to know because it tells us the restrictions you're working under, i.e. why the "just use rsync/tar/shar" suggestions weren't what you were looking for (though they could still be used...). I think that's led to some confusion in this thread so far. Anyway:

      User gives me all the paths that he thinks are needed to run the tool inside the container (he gives a file that contains those paths and I read them into an array).

      An important question here is: Would it be correct to assume you have access to the filesystem where these files are located? In other words, the Perl script, the input list of files, and the files themselves are all on the same machine? It's also still unclear to me if you want to mirror the files exactly as they are on the host machine, or if you want to manipulate the paths in any way?

      Again, showing us with code is best, like choroba did in his Makefile. You also still haven't shown what the format of the list provided by the user looks like. Note that these things will also significantly benefit you in your development, since they are test cases at the same time. In other words, IMHO the better way to ask your questions is to add them as test cases to the SSCCE that choroba provided.

      I think your main concern seems to be this: if a user provides a path that includes a symlink, you want to make sure not to copy only that symlink, but also the target it points to, so that the symlink doesn't end up broken in the container. Another thing that is still unclear to me in this context is whether it is acceptable to you to rewrite any of the symlinks you encounter - for example, potential solutions could rewrite all relative symlinks to absolute ones, or symlink chains could be reduced by simply creating copies.
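      As an illustration of the first rewriting option only, here is a small sketch that turns one relative symlink into an absolute one. The layout is hypothetical and lives in a scratch directory; note that pwd -P also resolves any symlinks in the directory part of the target, which may or may not be what you want.

```shell
#!/bin/sh
root=$(mktemp -d)                      # scratch directory for the demo
mkdir -p "$root/foo" "$root/bar"
touch "$root/foo/one"
ln -s ../foo/one "$root/bar/two"       # relative symlink

link="$root/bar/two"
target=$(readlink "$link")
case $target in
    /*) ;;                             # already absolute, leave it alone
    *)  # resolve the target's directory relative to the link's directory
        dir=$(cd "$(dirname "$link")" && cd "$(dirname "$target")" && pwd -P)
        target=$dir/$(basename "$target") ;;
esac
ln -snf "$target" "$link"              # replace the link in place
readlink "$link"                       # now prints an absolute path
```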

      Anyway, what I've done here is construct an example that I think demonstrates what you're asking about.

      mkdir -p /tmp/bar /tmp/foo
      touch /tmp/foo/one
      ln -fns /tmp/bar /tmp/foo/quz
      ln -fns ../foo/quz /tmp/bar/baz
      ln -fns ../foo/one /tmp/bar/two
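      The layout above can be checked by hand. The following sketch rebuilds it under a scratch directory (an assumption of this sketch, so that the real /tmp/foo and /tmp/bar are left alone) and resolves one path through the whole chain with readlink -f, which is a GNU coreutils option and may not be available everywhere:

```shell
#!/bin/sh
root=$(mktemp -d)                       # stands in for /tmp in the example
mkdir -p "$root/bar" "$root/foo"
touch "$root/foo/one"
ln -fns "$root/bar" "$root/foo/quz"     # absolute link to a directory
ln -fns ../foo/quz "$root/bar/baz"      # relative link to the link above
ln -fns ../foo/one "$root/bar/two"      # relative link to the file

# bar/baz/two passes through baz -> quz -> bar, then two -> ../foo/one
resolved=$(readlink -f "$root/bar/baz/two")
echo "$resolved"
```

      Three symlinks are involved in reaching a single regular file, so all of them have to exist in the container.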

      In this example, the issue is that if the user were to specify only the path /tmp/bar/baz/two, then you need to figure out that all of /tmp/foo/{one,quz} and /tmp/bar/{two,baz} need to be reconstructed in order for the link to be valid. Here's my attempt at solving this; the tricky bit turned out to be figuring out the dependency chain for the symlinks. sub resolvesymlink is extracted from my script that I linked you to earlier (that includes tests so I'm fairly confident it's decent code, keeping in mind what I said). Note how this essentially does what I said above: round-trip the commands needed to recreate a directory structure.

      Disclaimer: I've so far only tested it for the above test case plus a few variations. Use at your own risk. Though I do hope it's a starting point.

      #!/usr/bin/env perl
      use warnings;
      use strict;
      use File::Basename 'fileparse';
      use Cwd qw/getcwd abs_path/;
      use File::Spec::Functions qw/ splitdir catdir catfile
          file_name_is_absolute rel2abs rootdir /;
      use String::ShellQuote 'shell_quote';
      use Graph;

      my @queue = @ARGV;

      # gather all dirs, files, and links
      my (%dirs,%files,%links);
      while ( my $targ = shift @queue ) {
          $targ = rel2abs($targ);
          die "does not exist: $targ" unless -e $targ;
          my @path = splitdir($targ);
          for my $i (1..$#path) {
              my $cur = catdir(@path[0..$i]);
              if ( -l $cur ) {
                  defined( $links{$cur} = readlink($cur) )
                      or die "readlink $cur: $!";
                  # enqueue everything in the link chain
                  # (excluding already seen symlinks)
                  push @queue, grep { !$links{$_} } resolvesymlink($cur);
              }
              elsif ( -f $cur ) { $files{abs_path($cur)}++ }
              elsif ( -d $cur ) { $dirs{abs_path($cur)}++ }
              else { warn "skipping $cur, unknown type" }
          }
      }

      # simplify the dirs to shorten the mkdir command
      my $dg = Graph->new;
      for my $d (keys %dirs) {
          my @s = splitdir($d);
          $dg->add_edge(catdir(@s[0..$_]), catdir(@s[0..$_-1])) for 1..$#s;
      }
      # exterior vertices = leaves of the tree
      my @dirs = grep { $_ ne rootdir } sort $dg->exterior_vertices;
      print "mkdir -p ",shell_quote(@dirs),"\n" if @dirs;

      # output the files
      print "touch ",shell_quote(sort keys %files),"\n" if %files;

      # determine dependencies in symlinks via a topological sort
      my $lg = Graph->new;
      for my $l (keys %links) {
          my @res = resolvesymlink($l);
          die "unexpected resolvesymlink($l)" if @res<2;
          $lg->add_edge($l, $res[1]); # link depends on its target
          my @s = splitdir($l);
          for my $i (reverse 1..$#s-1) {
              my $d = catdir(@s[0..$i]);
              # if there's a link in the paths, this link depends on it too
              $lg->add_edge($l, $d) if defined $links{$d};
          }
      }
      my @links = reverse grep { defined $links{$_} } $lg->topological_sort;
      print "ln -snf ",shell_quote($links{$_}, $_),"\n" for @links;

      # from https://bitbucket.org/haukex/htools/src/master/relink (a500e09)
      sub resolvesymlink {
          my $file = shift;
          die "not absolute: $file" unless file_name_is_absolute($file);
          my @files;
          my $origwd = getcwd;
          my $rv = eval { # in eval so orig working dir is always restored
              my $f = $file;
              while (1) {
                  my $dir;
                  ($f,$dir) = fileparse($f);
                  last unless -d $dir;
                  chdir $dir or die "chdir $dir: $!";
                  push @files, catfile(getcwd,$f);
                  last unless -l $f;
                  defined( $f = readlink $f )
                      or die "readlink $f (cwd=".getcwd."): $!";
              }
          1 };
          my $err = $@||'unknown error';
          chdir $origwd or die "chdir $origwd: $!";
          die $err unless $rv;
          return @files ? @files : ($file);
      }
      __END__

      mkdir -p /tmp/bar /tmp/foo
      touch /tmp/foo/one
      ln -snf /tmp/bar /tmp/foo/quz
      ln -snf ../foo/one /tmp/bar/two
      ln -snf ../foo/quz /tmp/bar/baz
      ln -snf ../foo/one /tmp/bar/baz/two

      Update: Added the if @dirs and if %files to the two prints.

        Would it be correct to assume you have access to the filesystem where these files are located? In other words, the Perl script, the input list of files, and the files themselves are all on the same machine?

        Yes.

        It's also still unclear to me if you want to mirror the files exactly as they are on the host machine, or if you want to manipulate the paths in any way?

        Exactly the same.

        Anyway, your code is exactly what I need, especially the first part ("gather all dirs, files, and links"). It's a good start. I have only one question for now. For that I will use the example I presented in the first post:
        I have a path: /usr/vsa/pkgs/python3/3.6.3a/bin/python3.6. There are two links in the path:
        /usr/vsa -> /root/site/tools/gauv
        /usr/vsa/pkgs/python3/3.6.3a -> 3.6.3
        With your code I get three links:
        ln -snf /root/site/tools/gauv /usr/vsa
        ln -snf 3.6.3 /usr/vsa/pkgs/python3/3.6.3a
        ln -snf 3.6.3 /root/site/tools/gauv/pkgs/python3/3.6.3a
        For the second link I get an error:
        ln: failed to create symbolic link '/usr/vsa/pkgs/python3/3.6.3a': No such file or directory
        That makes sense, since '/usr/vsa/pkgs/python3/3.6.3a' is not a realpath (even though I can access this path, since it has a link on the way). Commenting out this line lets the recipe build, since this link is unnecessary: the third one is the correct one (it does what the second one is trying to do). How can I "ignore"/"remove" those kinds of links from %links?