Yes, I'm trying to create those recipes on the fly. User gives me all the paths that he thinks are needed to run the tool inside the container (he gives a file that contains those paths and I read them into an array). With those paths I can build the recipe. In the %setup section I will create the directories, in the %files section I will copy files and in the %post section I will create the links. So I don't really want to create a shell script, I do want to build the recipe with Perl. But I didn't want to talk about Singularity because I guessed most of the people here are not familiar with it. So I tried to simplify it to creating a shell script (aka the recipe) that creates those directories, copies files and creates links.
So if we moved to talk about recipes, the purpose of the Perl script is to build the recipe, based on all the paths that users thinks are needed for running his tool in the container. So the input is really the paths, as I explained, and the output is the recipe (aka the shell script).
So my question is still remains. Given the paths, I want to build some structure that I could easily extract all the files/links/directories and use them for creating the recipe file. If you think there is a better way of creating it, I'm all ears.
choroba's answer is a good start but I had some questions that I commented under it.
| [reply] |
Thanks for the clarification. It's important to know because it tells us the restrictions you're working under, i.e. why the "just use rsync/tar/shar" suggestions weren't what you were looking for (though they could still be used...). I think that's led to some confusion in this thread so far. Anyway:
User gives me all the paths that he thinks are needed to run the tool inside the container (he gives a file that contains those paths and I read them into an array).
An important question here is: Would it be correct to assume you have access to the filesystem where these files are located? In other words, the Perl script, the input list of files, and the files themselves are all on the same machine? It's also still unclear to me if you want to mirror the files exactly as they are on the host machine, or if you want to manipulate the paths in any way?
Again, showing us with code is best, like choroba did in his Makefile. You also still haven't shown what format this list provided by the user looks like. Note that these things will also significantly benefit you in your development since they are at the same time test cases. In other words, the IMHO better way to ask the questions you asked is if you add them as test cases to the SSCCE that choroba provided.
I think your main concern seems to be this: if a user provides a path that includes a symlink, you want to make sure not to copy only that symlink, but also the target it points to, so that the symlink doesn't end up broken in the container. Another thing that is still unclear to me in this context is whether it is acceptable to you to rewrite any of the symlinks you encounter - for example, potential solutions could rewrite all relative symlinks to absolute ones, or symlink chains could be reduced by simply creating copies.
Anyway, what I've done here is construct an example that I think demonstrates what you're asking about.
mkdir -p /tmp/bar /tmp/foo
touch /tmp/foo/one
ln -fns /tmp/bar /tmp/foo/quz
ln -fns ../foo/quz /tmp/bar/baz
ln -fns ../foo/one /tmp/bar/two
In this example, the issue is that if the user were to specify only the path /tmp/bar/baz/two, then you need to figure out that all of /tmp/foo/{one,quz} and /tmp/bar/{two,baz} need to be reconstructed in order for the link to be valid. Here's my attempt at solving this; the tricky bit turned out to be figuring out the dependency chain for the symlinks. sub resolvesymlink is extracted from my script that I linked you to earlier (that includes tests so I'm fairly confident it's decent code, keeping in mind what I said). Note how this essentially does what I said above: round-trip the commands needed to recreate a directory structure.
Disclaimer: I've so far only tested it for the above test case plus a few variations. Use at your own risk. Though I do hope it's a starting point.
#!/usr/bin/env perl
use warnings;
use strict;
use File::Basename 'fileparse';
use Cwd qw/getcwd abs_path/;
use File::Spec::Functions qw/ splitdir catdir catfile
file_name_is_absolute rel2abs rootdir /;
use String::ShellQuote 'shell_quote';
use Graph;
my @queue = @ARGV;
# gather all dirs, files, and links
my (%dirs,%files,%links);
while ( my $targ = shift @queue ) {
$targ = rel2abs($targ);
die "does not exist: $targ" unless -e $targ;
my @path = splitdir($targ);
for my $i (1..$#path) {
my $cur = catdir(@path[0..$i]);
if ( -l $cur ) {
defined( $links{$cur} = readlink($cur) )
or die "readlink $cur: $!";
# enqueue everything in the link chain
# (excluding already seen symlinks)
push @queue, grep { !$links{$_} } resolvesymlink($cur);
}
elsif ( -f $cur ) { $files{abs_path($cur)}++ }
elsif ( -d $cur ) { $dirs{abs_path($cur)}++ }
else { warn "skipping $cur, unknown type" }
}
}
# simplify the dirs to shorten the mkdir command
my $dg = Graph->new;
for my $d (keys %dirs) {
my @s = splitdir($d);
$dg->add_edge(catdir(@s[0..$_]), catdir(@s[0..$_-1])) for 1..$#s;
}
# exterior vertices = leaves of the tree
my @dirs = grep { $_ ne rootdir } sort $dg->exterior_vertices;
print "mkdir -p ",shell_quote(@dirs),"\n" if @dirs;
# output the files
print "touch ",shell_quote(sort keys %files),"\n" if %files;
# determine dependencies in symlinks via a topological sort
my $lg = Graph->new;
for my $l (keys %links) {
my @res = resolvesymlink($l);
die "unexpected resolvesymlink($l)" if @res<2;
$lg->add_edge($l, $res[1]); # link depends on its target
my @s = splitdir($l);
for my $i (reverse 1..$#s-1) {
my $d = catdir(@s[0..$i]);
# if there's a link in the paths, this link depends on it too
$lg->add_edge($l, $d) if defined $links{$d};
}
}
my @links = reverse grep { defined $links{$_} } $lg->topological_sort;
print "ln -snf ",shell_quote($links{$_}, $_),"\n" for @links;
# from https://bitbucket.org/haukex/htools/src/master/relink (a500e09)
sub resolvesymlink {
my $file = shift;
die "not absolute: $file" unless file_name_is_absolute($file);
my @files;
my $origwd = getcwd;
my $rv = eval { # in eval so orig working dir is always restored
my $f = $file;
while (1) {
my $dir;
($f,$dir) = fileparse($f);
last unless -d $dir;
chdir $dir or die "chdir $dir: $!";
push @files, catfile(getcwd,$f);
last unless -l $f;
defined( $f = readlink $f )
or die "readlink $f (cwd=".getcwd."): $!";
}
1 }; my $err = $@||'unknown error';
chdir $origwd or die "chdir $origwd: $!";
die $err unless $rv;
return @files ? @files : ($file);
}
__END__
mkdir -p /tmp/bar /tmp/foo
touch /tmp/foo/one
ln -snf /tmp/bar /tmp/foo/quz
ln -snf ../foo/one /tmp/bar/two
ln -snf ../foo/quz /tmp/bar/baz
ln -snf ../foo/one /tmp/bar/baz/two
Update: Added the if @dirs and if %files to the two prints. | [reply] [d/l] [select] |
Would it be correct to assume you have access to the filesystem where these files are located? In other words, the Perl script, the input list of files, and the files themselves are all on the same machine?
Yes.
It's also still unclear to me if you want to mirror the files exactly as they are on the host machine, or if you want to manipulate the paths in any way?
Exactly the same.
Anyway, your code is exactly what I need, especially the first part ("gather all dirs, files, and links"). It's a good start. I have only one question for now. For that I will use the example I presented in the first post:
I have a path: /usr/vsa/pkgs/python3/3.6.3a/bin/python3.6. There are two links in the path:
/usr/vsa -> /root/site/tools/gauv
/usr/vsa/pkgs/python3/3.6.3a -> 3.6.3
With your code I get three links:
ln -snf /root/site/tools/gauv /usr/vsa
ln -snf 3.6.3 /usr/vsa/pkgs/python3/3.6.3a
ln -snf 3.6.3 /root/site/tools/gauv/pkgs/python3/3.6.3a
For the second link I get an error:
ln: failed to create symbolic link '/usr/vsa/pkgs/python3/3.6.3a': No
+such file or directory which is correct
Which makes sense since '/usr/vsa/pkgs/python3/3.6.3a' is not a realpath (even though I can access this path since it has a link on the way). Uncommenting this line works to build the recipe since this link is unnecessary because the third one is the correct one (it does what the second one wants to do). How can I "ignore"/"remove" those kind of links from %links? | [reply] [d/l] [select] |