in reply to Getting all subpaths from a path

I'm confused about the path you're taking here (pun intended). Are you still working on Creating a bash script "on the fly" from a few days ago? If so, then I have to say I think you're creating yourself an XY Problem. I've already pointed you to:

Also, I'm not sure what you're testing with (index($link,"/") == -1)? Note links to directories don't need to end oncontain a slash; you won't know what the link is pointing to until you check the filesystem. (Update: Looking at your code again, I see a couple more ways in which certain symlinks would break it.)

But what if the link is relative (for example ../../abc)?

You're discovering one of the many intricacies of resolving symlinks. As explained in the documentation of File::Spec's canonpath: "If /foo on your system is a symlink to /bar/baz, then /foo/../quux is actually /bar/quux, not /quux as a naive ../-removal would give you."

Sorry, but based on my interpretation of everything above, I would strongly suggest that you first learn more about symlinks and handling filenames before trying to implement this. Even though I did a lot of research and testing when I wrote the relink script, and I trust my own code, I'm still not sure I've fully understood every little detail and edge case that could happen. It's not an easy topic.

But anyway, I suggest that even before that, maybe you could explain why the above solutions are not acceptable to you, and why you need to reinvent this particular wheel? I'm sure we could come up with a better solution for your actual problem.

If you're still working on Creating a bash script "on the fly", i.e. you're just trying to replicate a directory structure, then could you explain why a solution such as rsync, or simply copying over the links, is not sufficient? (You do know you can get most *NIX tools to not follow symlinks, right?)

Minor edits.

Replies are listed 'Best First'.
Re^2: Getting all subpaths from a path
by ovedpo15 (Pilgrim) on Mar 29, 2021 at 14:08 UTC
    Hi!
    You are probably right, it becomes an X-Y problem. I will try to do my best to explain the idea of what I'm trying to do and what lead me to opening the current question. As I explained in previous topic (https://www.perlmonks.org/?node_id=11130389), I'm trying to create a bash script "on the fly" output an array of paths that does the following three stages:
    1. Create the same directory hierarchy.
    2. Copy the files.
    3. Create the same links.
    For that I can do:
    1. I can use mkdir -p to create the full hierarchy based on the path.
    2. I can use scp/rsync for copying (as it's inside container).
    3. I can use ln -s to create the links.
    So I wanted to build a structure that will contain all the information (links, directories, files). I came up with the following structure:
    { "/": { "type": "dir", "files": [ { "usr": { "type": "dir", "files": [ { "vsa": { "type": "link-dir", "source": "/root/site/tools/gauv" } } ] } }, { "root": { "type": "dir", "files": [ { "site": { "type": "dir", "files": [ { "tools": { "type": "dir", "files": [ { "gauv": { "type": "dir", "files": [ { "pkgs": { "type": "dir", "files": [ { "python3": { "type": "dir", "files": [ { "3.6.3a": { "type": "dir-link", "source": "/usr/vsa/pkgs +/python3/3.6.3" } }, { "3.6.3": { "type": "dir", "links": [ { "lib": { "type": "dir", "links": [] } }, { "bin": { "type": "dir", "links": [] } } ] } } ] } } ] } } ] } } ] } } ] } } ] } } ] } }
    Which contains only the path /usr/vsa/pkgs/python3/3.6.3a/bin/python3.6 with it's links (as I described in the question). So I will parse each path and create this structure. Once I have this structure, I can extract all of the directories, files, and links (dir links and file links) into arrays and use them to build the bash script (write bash commands based on those paths into a file). That's the purpose of that whole idea.
    So my strategy was:
    1. Parse each path (by getting all subpaths and links) and insert into an array on all path.
    2. Check the type of each path in the array (link, file, directory) and insert into the structure.
    3. Extract arrays of dir paths, array of dir links, array of files, array of file links.
    4. Iterate over each array and create the bashe script.

    I'm having trouble with steps 1-2. In the current node I ask about step 1. I wanted to parse each path and split it into sub paths. Then check each subpath if it's a link and if so, I will insert the target of the link into the array, change all of the other subpaths (for example if I have (/a,/a/b/,/a/b/d) and /a/b->/e/f then it should be (/a,/a/b/,/e/f,/e/f/d)). I should also handle two special cases here:
    1. Relative links - not sure how to handle with it currently. I have tried to handle with only local relative links like /a/b/c -> /a/b/d but it's getting complicated.
    2. Recursive links - for example /a/b -> /c/d -> e/f ...). That's why I used while(1).


    To sum up, those are the big questions:
    1. What would be the best design strategy to implement here? Was my idea good?
    2. If so, Is my suggested structure good enough? How would you change it?


    Now, for what you suggested:
    1. The idea is to create a bash script that copies the environment into a container. rsync can help me here but I it can come in handy in step 4 (while copying files, instead of scp). I can't use rsync on the whole directory because it will then copy files that are not in the array of paths. Assume you have in the input array of path (/a/b/1.file, /a/b/2.file) and you also have 3.file inside /a/b. I don't want to copy it, only 1.file and 2.file so rsync on the whole directory won't work here. It can be used to copy files (which is the same as scp).
    2. I have tried now splitdir and you are right, it's better than splitting by "/". Is there a subroutine that can give me all the subpaths of a path?
    3. Yes it's a good subroutine but I can't use it yet because I need to parse each path - find out if it's a link. abs_path will give me the final path but I also want to have the recursive links (like I mentioned before /a/b -> /c/d -> e/f, in that case abs_path will just get /e/f and ignore /c/d).
    I hope this post will clarify some opened question. If not, I will be more than glad to answer more. I'm sorry if I didn't explain the question good enough before. Thanks for the help until now!


    Also, some more code that I wrote, while trying to make it work (just for reference. also really sorry it's messy and with bad variable names):
    foreach my $f (@arr) { if (-l $f) { print($f, " is a link to ",readlink($f) , "\n"); my @a = split("/",$f); my $result; my $counter = 0; my $last_files_block = $st{"/"}{"files"}; while (1) { unless ($counter < scalar(@a)) { last; } my $x = $a[$counter]; if ($x eq '') { $counter += 1; next; } if ($counter + 1 == scalar(@a)) { if (-f $f) { my $found = 0; foreach my $v (@{$last_files_block}) { if (defined($v->{$x})) { $found = 1; last; } } if ($found == 0) { my %vsaaa = ("type" => "link-file", "source" => re +adlink($f)); my %st1 = ($x => \%vsaaa ); push(@{$last_files_block}, \%st1); } my $last = $f; while (1) { my $c = readlink($last); if (-l $c) { $last = $c; if (index($c,"/") != -1) { push(@arr,$c); } else { my $found1 = 0; foreach my $v (@{$last_files_block}) { if (defined($v->{$x})) { $found = 1; last; } } if ($found1 == 0) { my %vsaaa = ("type" => "file"); my %st1 = ($x => \%vsaaa ); push(@{$last_files_block}, \%st1); } } } else { if (index($c,"/") != -1) { push(@arr,$c); } else { my $found1 = 0; foreach my $v (@{$last_files_block}) { if (defined($v->{$x})) { $found = 1; last; } } if ($found1 == 0) { my %vsaaa = ("type" => "file"); my %st1 = ($x => \%vsaaa ); push(@{$last_files_block}, \%st1); } } last; } } } if (-d $f) { my $found = 0; foreach my $v (@{$last_files_block}) { if (defined($v->{$x})) { $found = 1; last; } } if ($found == 0) { my $n = readlink($f); if (index($n,"/") == -1) { my $dirname = dirname($f); $n = "$dirname/$n"; #TODO: what if relativ +e? } my %vsaaa = ("type" => "dir-link", "source" => + $n); my %st1 = ($x => \%vsaaa ); push(@{$last_files_block}, \%st1); } } last; } my $found = 0; foreach my $v (@{$last_files_block}) { if (defined($v->{$x})) { $last_files_block = $v->{$x}{"files"}; $counter += 1; $found = 1; last; } } if ($found == 0) { my %vsaaa = ("type" => "dir", "files" => [] ); my %st1 = ($x => \%vsaaa ); push(@{$last_files_block}, \%st1); $last_files_block = $vsaaa{"files"}; $counter += 1; } } } elsif (-f $f) { print($f, " is a file\n"); my @a = split("/",$f); my $result; my $counter = 0; my $last_files_block = $st{"/"}{"files"}; while (1) { unless ($counter < scalar(@a)) { last; } my $x = $a[$counter]; if ($x eq '') { $counter += 1; next; } if ($counter + 1 == scalar(@a)) { my $found = 0; foreach my $v (@{$last_files_block}) { if (defined($v->{$x})) { $found = 1; last; } } if ($found == 0) { my %vsaaa = ("type" => "file"); my %st1 = ($x => \%vsaaa ); push(@{$last_files_block}, \%st1); } last; } my $found = 0; foreach my $v (@{$last_files_block}) { if (defined($v->{$x})) { $last_files_block = $v->{$x}{"files"}; $counter += 1; $found = 1; last; } } if ($found == 0) { my %vsaaa = ("type" => "dir", "files" => [] ); my %st1 = ($x => \%vsaaa ); push(@{$last_files_block}, \%st1); $last_files_block = $vsaaa{"files"}; $counter += 1; } } } elsif (-d $f) { print($f, " is a dir\n"); my @a = split("/",$f); my $result; my $counter = 0; my $last_files_block = $st{"/"}{"files"}; while (1) { unless ($counter < scalar(@a)) { last; } my $x = $a[$counter]; if ($x eq '') { $counter += 1; next; } my $found = 0; my $found_link = 0; foreach my $v (@{$last_files_block}) { if (defined($v->{$x})) { if ($v->{$x}{"type"} eq "dir-link" || $v->{$x}{"type"} + eq "link-file") { $found_link = 1; last; } $last_files_block = $v->{$x}{"files"}; $counter += 1; $found = 1; last; } } if ($found_link == 1) { last; } if ($found == 0) { my %vsaaa = ("type" => "dir", "files" => [] ); my %st1 = ($x => \%vsaaa ); push(@{$last_files_block}, \%st1); $last_files_block = $vsaaa{"files"}; $counter += 1; } } } else { #TODO: When can it happen other than path does not exist o +r permission denied? print($f, " is a special\n"); } }

      I think the significant bit of information that was missing previously (the "X" in the XY Problem) is what you mentioned here: "I'm trying to create a Singularity recipes builder." By this I'm guessing you mean Singularity, and their "Recipes" to build containers, more specifically, something you can execute in their Singularity file %post section (which gets executed with /bin/sh) to build the container?

      By "recipes builder", do you mean you want to write a Perl script that will generate commands that can be executed by /bin/sh to reproduce a certain environment (directory structure, links, etc.)? In other words, you want to write a Perl script that will generate a sequence of mkdir -p commands, followed by cp commands, followed by ln -s commands, such that when Singularity builds the container and executes the script containing these commands, those dirs/links/files will be present in the generated squashfs image?

      (By the way, why not use the built-in %files section?)

      Note that I had to deduce all this means you need to describe your task better :-) Remember to explain the "X" you're trying to accomplish, plus sample input, expected output for that input - something like a high-level SSCCE.

      You haven't shown your input, which I am guessing is the filesystem that you want to mirror into the container? One way you could provide an SSCCE for us is to give us a list of commands to recreate the directory structure.

      You also haven't shown your expected output, i.e. the /bin/sh script you want to produce.

      Interesting: Note that both input and output are basically the same thing!

      So if I'm correct with all my guesses so far, the problem can be more or less reduced to: a Perl script that will basically round-trip a /bin/sh script containing mkdir, cp, and ln commands.

      However, since that's a lot of guessing, I'm going to stop here for now - please let us know if the above is correct or not, and if not, what it is you're actually trying to do. (Also, looking over choroba's sample code, it looks like a good starting point.)

        Yes, I'm trying to create those recipes on the fly. User gives me all the paths that he thinks are needed to run the tool inside the container (he gives a file that contains those paths and I read them into an array). With those paths I can build the recipe. In the %setup section I will create the directories, in the %files section I will copy files and in the %post section I will create the links. So I don't really want to create a shell script, I do want to build the recipe with Perl. But I didn't want to talk about Singularity because I guessed most of the people here are not familiar with it. So I tried to simplify it to creating a shell script (aka the recipe) that creates those directories, copies files and creates links.
        So if we moved to talk about recipes, the purpose of the Perl script is to build the recipe, based on all the paths that users thinks are needed for running his tool in the container. So the input is really the paths, as I explained, and the output is the recipe (aka the shell script).
        So my question is still remains. Given the paths, I want to build some structure that I could easily extract all the files/links/directories and use them for creating the recipe file. If you think there is a better way of creating it, I'm all ears.
        choroba's answer is a good start but I had some questions that I commented under it.
      Can someone suggest on strategy on how to solve it? I tried some other similar things but it got too complicated and failed.
        You cannot create the links before both the source and target directories exist. So, postpone their creation to the end. The directories are processed from the shortest to the longest, so we always know the parent path exists. You should also add a check that a link doesn't point outside of the given directory tree.

        I created the following Makefile to experiment with your data:

        And this was the script 1.pl:

        #!/usr/bin/perl use warnings; use strict; use feature qw{ say }; use Path::Tiny qw{ path }; my $path = path(shift); my $paths = $path->visit( sub { $_[1]->{$_} = (-f) ? 'file' : (-l) ? [readlink] : (-d) ? 'dir' : 'unknown' }, { recurse => 1 } ); my @links; for my $found (sort { length $a <=> length $b } keys %$paths) { if ('file' eq $paths->{$found}) { say qq(cp '$found' "\$target/$found"); } elsif ('dir' eq $paths->{$found}) { say qq(mkdir "\$target/$found"); } elsif (ref [] eq ref $paths->{$found}) { my $to = path($paths->{$found}[0]); $to = $to->relative(path($found)->absolute->parent) if $to->is_absolute; my $source = path('$target', $found); push @links, qq(ln -s "$to" "$source"); } } say for @links;

        It's just a toy. Use some of the ShellQuote modules to fix the filenames; but for the example given, it works.

        map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]

        It seems like you are trying to re-invent 'tar'.

        Sorry, I've been quite busy, but I hope to get a chance to read your post tomorrow.