in reply to Re^4: Getting all subpaths from a path
in thread Getting all subpaths from a path
Thanks for the clarification. It's important to know because it tells us the restrictions you're working under, i.e. why the "just use rsync/tar/shar" suggestions weren't what you were looking for (though they could still be used...). I think that's led to some confusion in this thread so far. Anyway:
User gives me all the paths that he thinks are needed to run the tool inside the container (he gives a file that contains those paths and I read them into an array).
An important question here is: Would it be correct to assume you have access to the filesystem where these files are located? In other words, the Perl script, the input list of files, and the files themselves are all on the same machine? It's also still unclear to me if you want to mirror the files exactly as they are on the host machine, or if you want to manipulate the paths in any way?
Again, showing us with code is best, like choroba did in his Makefile. You also still haven't shown what format this list provided by the user looks like. Note that these things will also significantly benefit you in your development since they are at the same time test cases. In other words, the IMHO better way to ask the questions you asked is if you add them as test cases to the SSCCE that choroba provided.
I think your main concern seems to be this: if a user provides a path that includes a symlink, you want to make sure not to copy only that symlink, but also the target it points to, so that the symlink doesn't end up broken in the container. Another thing that is still unclear to me in this context is whether it is acceptable to you to rewrite any of the symlinks you encounter - for example, potential solutions could rewrite all relative symlinks to absolute ones, or symlink chains could be reduced by simply creating copies.
Anyway, what I've done here is construct an example that I think demonstrates what you're asking about.
mkdir -p /tmp/bar /tmp/foo touch /tmp/foo/one ln -fns /tmp/bar /tmp/foo/quz ln -fns ../foo/quz /tmp/bar/baz ln -fns ../foo/one /tmp/bar/two
In this example, the issue is that if the user were to specify only the path /tmp/bar/baz/two, then you need to figure out that all of /tmp/foo/{one,quz} and /tmp/bar/{two,baz} need to be reconstructed in order for the link to be valid. Here's my attempt at solving this; the tricky bit turned out to be figuring out the dependency chain for the symlinks. sub resolvesymlink is extracted from my script that I linked you to earlier (that includes tests so I'm fairly confident it's decent code, keeping in mind what I said). Note how this essentially does what I said above: round-trip the commands needed to recreate a directory structure.
Disclaimer: I've so far only tested it for the above test case plus a few variations. Use at your own risk. Though I do hope it's a starting point.
#!/usr/bin/env perl use warnings; use strict; use File::Basename 'fileparse'; use Cwd qw/getcwd abs_path/; use File::Spec::Functions qw/ splitdir catdir catfile file_name_is_absolute rel2abs rootdir /; use String::ShellQuote 'shell_quote'; use Graph; my @queue = @ARGV; # gather all dirs, files, and links my (%dirs,%files,%links); while ( my $targ = shift @queue ) { $targ = rel2abs($targ); die "does not exist: $targ" unless -e $targ; my @path = splitdir($targ); for my $i (1..$#path) { my $cur = catdir(@path[0..$i]); if ( -l $cur ) { defined( $links{$cur} = readlink($cur) ) or die "readlink $cur: $!"; # enqueue everything in the link chain # (excluding already seen symlinks) push @queue, grep { !$links{$_} } resolvesymlink($cur); } elsif ( -f $cur ) { $files{abs_path($cur)}++ } elsif ( -d $cur ) { $dirs{abs_path($cur)}++ } else { warn "skipping $cur, unknown type" } } } # simplify the dirs to shorten the mkdir command my $dg = Graph->new; for my $d (keys %dirs) { my @s = splitdir($d); $dg->add_edge(catdir(@s[0..$_]), catdir(@s[0..$_-1])) for 1..$#s; } # exterior vertices = leaves of the tree my @dirs = grep { $_ ne rootdir } sort $dg->exterior_vertices; print "mkdir -p ",shell_quote(@dirs),"\n" if @dirs; # output the files print "touch ",shell_quote(sort keys %files),"\n" if %files; # determine dependencies in symlinks via a topological sort my $lg = Graph->new; for my $l (keys %links) { my @res = resolvesymlink($l); die "unexpected resolvesymlink($l)" if @res<2; $lg->add_edge($l, $res[1]); # link depends on its target my @s = splitdir($l); for my $i (reverse 1..$#s-1) { my $d = catdir(@s[0..$i]); # if there's a link in the paths, this link depends on it too $lg->add_edge($l, $d) if defined $links{$d}; } } my @links = reverse grep { defined $links{$_} } $lg->topological_sort; print "ln -snf ",shell_quote($links{$_}, $_),"\n" for @links; # from https://bitbucket.org/haukex/htools/src/master/relink (a500e09) sub resolvesymlink { my $file = shift; die "not absolute: $file" unless file_name_is_absolute($file); my @files; my $origwd = getcwd; my $rv = eval { # in eval so orig working dir is always restored my $f = $file; while (1) { my $dir; ($f,$dir) = fileparse($f); last unless -d $dir; chdir $dir or die "chdir $dir: $!"; push @files, catfile(getcwd,$f); last unless -l $f; defined( $f = readlink $f ) or die "readlink $f (cwd=".getcwd."): $!"; } 1 }; my $err = $@||'unknown error'; chdir $origwd or die "chdir $origwd: $!"; die $err unless $rv; return @files ? @files : ($file); } __END__ mkdir -p /tmp/bar /tmp/foo touch /tmp/foo/one ln -snf /tmp/bar /tmp/foo/quz ln -snf ../foo/one /tmp/bar/two ln -snf ../foo/quz /tmp/bar/baz ln -snf ../foo/one /tmp/bar/baz/two
Update: Added the if @dirs and if %files to the two prints.
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
Re^6: Getting all subpaths from a path
by ovedpo15 (Pilgrim) on Apr 02, 2021 at 20:27 UTC | |
by haukex (Archbishop) on Apr 03, 2021 at 08:13 UTC |