ovedpo15 has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks!
I'm trying to get the canonical path of a given path (without '.', '~', '..', ... - just the unique path, a string with '/'). I came across this question and solution. The suggested code:
foreach my $path ("/a/b/c/d/../../../e" , "/a/../b/./c//d") {
    my @c= reverse split m@/@, $path;
    my @c_new;
    while (@c) {
        my $component= shift @c;
        next unless length($component);
        if ($component eq ".") { next; }
        if ($component eq "..") {
            my $i=0;
            while ($c[$i] =~ m/^\.{0,2}$/) {
                $i++
            }
            splice(@c, $i, 1);
            next
        }
        push @c_new, $component;
    }
    print "/".join("/", reverse @c_new) ."\n";
}
It works well in most cases, but I had trouble with paths that start with ".." (for example "../script.sh"). For those it gives me: Use of uninitialized value within @c in pattern match (m//).
How can I fix this code to support those cases?
Please note that I don't want to get the absolute path of the given path - meaning, I don't want to resolve any links on the way. Just to get the canonical path of the given path - starting from root, without '..', '.', '~' symbols. For example, given "../softlink/script.sh" where softlink is a link to some other area "softlink -> /a/b/c" then I want to get: "/root/to/that/path/softlink/script.sh" (I didn't resolve the link)

Replies are listed 'Best First'.
Re: How to get the unique canonical path of a given path?
by hippo (Archbishop) on Jul 13, 2022 at 15:39 UTC
      Hi, thanks for the suggestion but not quite. I don't want to resolve any links in the path. I thought of adding:
      unless ($path =~ /^\//) { $path = catdir(getcwd,$path); }
      What do you think?
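A minimal sketch of that idea, assuming File::Spec and Cwd (both core modules). The helper name anchor_path and the optional $base argument are made up for illustration. It only prepends a base directory to relative paths and leaves '..' and symlinks alone:

```perl
use strict;
use warnings;
use Cwd qw( getcwd );
use File::Spec;

# Hypothetical helper: make a path absolute without resolving anything.
# $base defaults to the current working directory.
sub anchor_path {
    my ($path, $base) = @_;
    $base //= getcwd ();
    return $path if File::Spec->file_name_is_absolute ($path);
    return File::Spec->catdir ($base, $path);
}

print anchor_path ("../script.sh", "/home/user"), "\n";
print anchor_path ("/a/b/../c"), "\n";
```

The '..' components are still there afterwards, so a separate cleanup pass would still be needed.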

        I don’t think you’re going to find much off the shelf because the posix/*nix way of "canonical" path is to expand out and replace symlinks (see Pathname Resolution). I think the closest you might get would be rolling something using File::Spec and splitdir to unroll (maybe with glob for tilde handling beforehand), process for dot(s), then concatenate the remaining elements back.
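One way that recipe could look, as a rough sketch only. It assumes Unix-style paths, the helper name lexical_path is hypothetical, and the tilde handling is simplified to a leading '~' taken from $ENV{HOME} instead of glob ('~user' is not handled). Nothing here looks at the filesystem, so symlinks stay unresolved:

```perl
use strict;
use warnings;
use Cwd qw( getcwd );
use File::Spec;

# Hypothetical helper: collapse "." and ".." purely as text.
sub lexical_path {
    my ($path, $base) = @_;
    $base //= getcwd ();

    # Simplified tilde handling (an assumption): only a leading "~" or "~/"
    $path =~ s{^~(?=/|\z)}{$ENV{HOME}} if defined $ENV{HOME};

    # Relative paths are anchored at $base; absolute paths pass through
    $path = File::Spec->rel2abs ($path, $base);

    my @out;
    for my $part (File::Spec->splitdir ($path)) {
        next if $part eq "" || $part eq ".";   # empty parts come from the leading "/"
        if ($part eq "..") {
            pop @out;                          # at the root, ".." stays at the root
            next;
        }
        push @out, $part;
    }
    return "/" . join "/", @out;
}

print lexical_path ($_, "/home/user"), "\n"
    for qw( /a/b/c/d/../../../e /a/../b/./c//d ../softlink/script.sh );
```

With "/home/user" as the base, the three sample paths come out as /a/e, /b/c/d and /home/softlink/script.sh.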

        The cake is a lie.

        I don't want to resolve any links in the path.

        Well then, I don't understand what you mean because AFAICS realpath does not do that*. If you can provide an SSCCE (ideally in the form of a test) showing precisely what output you want for a given input and how realpath fails in that regard perhaps we can guide you further.

        * Update: yes, it does do that - see replies.


        🦛

Re: How to get the unique canonical path of a given path?
by Tux (Canon) on Jul 15, 2022 at 13:43 UTC
    • Resolving /../ may lead to illegal or unwanted locations. The part before the /../ might be a symbolic link pointing to somewhere completely outside of the path you are changing
    • Your solution, as mentioned by others, is not safe for a leading ./ or ../, or for a run of ../../.. that would go beyond the root
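To make the first point concrete, here is a small sketch that builds a throwaway directory tree with a symlink and compares the textual reading of "link/.." with what the system actually does. It assumes a platform with symlink support; the directory names are made up:

```perl
use strict;
use warnings;
use Cwd qw( abs_path );
use File::Temp qw( tempdir );

# Scratch tree:  $root/real/deep/   and   $root/link -> real/deep
my $root = abs_path (tempdir (CLEANUP => 1));
mkdir "$root/real"      or die "mkdir: $!";
mkdir "$root/real/deep" or die "mkdir: $!";
symlink "real/deep", "$root/link" or die "symlink: $!";

my $path = "$root/link/..";

# Textual reading: "link/.." cancels out, leaving $root
(my $lexical = $path) =~ s{/[^/]+/\.\.\z}{};

# What the system does: follow the link first, then go up one level
my $physical = abs_path ($path);

print "lexical : $lexical\n";
print "physical: $physical\n";
```

The two results differ: the textual version ends at the scratch root, the physical one at its "real" subdirectory.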

    With your code safeguarded against the above remarks, here is a comparison of the methods mentioned in this thread plus the one I would use: Cwd::abs_path. Note that abs_path returns undef for a non-existing path.

    #!/usr/bin/perl

    use 5.018003;
    use warnings;

    use Cwd qw( abs_path );
    use Path::Tiny;
    use File::Spec;

    my @pth = qw(
        /a/b/c/d/../../../e
        /a/../b/./c//d
        ../scripting
        ./tmp
        /tmp/../../../tmp
        );

    sub resolves {
        my ($p, $r) = @_;
        printf "%-20s -> %s\n", $p, $r // "$p does not resolve";
    } # resolves

    say "OP";
    foreach my $pth (@pth) {
        my @c = reverse split m{/+}, $pth; # /+ removes empty elements
        my @c_new;
        while (@c) {
            my $component = shift @c;
            next unless length ($component);
            $component eq "." and @c and next;
            if ($component eq ".." and @c) {
                my $i = 0;
                while ($i <= $#c && $c[$i] =~ m/^\.{0,2}$/) {
                    $i++;
                }
                splice @c, $i, 1;
                next;
            }
            push @c_new => $component;
        }
        @c = reverse @c_new;
        $c[0] =~ m/^\.\.?$/ or unshift @c => "";
        resolves $pth, join "/" => @c;
    }

    say "Cwd::abs_path";
    foreach my $pth (@pth) {
        resolves $pth, abs_path ($pth);
    }

    say "Path::Tiny::path";
    foreach my $pth (@pth) {
        resolves $pth, Path::Tiny::path ($pth);
    }

    say "File::Spec::canonpath";
    foreach my $pth (@pth) {
        resolves $pth, File::Spec->canonpath ($pth);
    }

    which produces

    OP
    /a/b/c/d/../../../e  -> /a/e
    /a/../b/./c//d       -> /b/c/d
    ../scripting         -> ../scripting
    ./tmp                -> ./tmp
    /tmp/../../../tmp    -> /tmp
    Cwd::abs_path
    /a/b/c/d/../../../e  -> /a/b/c/d/../../../e does not resolve
    /a/../b/./c//d       -> /a/../b/./c//d does not resolve
    ../scripting         -> /home/scripting
    ./tmp                -> /home/merijn/tmp
    /tmp/../../../tmp    -> /tmp
    Path::Tiny::path
    /a/b/c/d/../../../e  -> /a/b/c/d/../../../e
    /a/../b/./c//d       -> /a/../b/c/d
    ../scripting         -> ../scripting
    ./tmp                -> tmp
    /tmp/../../../tmp    -> /tmp/../../../tmp
    File::Spec::canonpath
    /a/b/c/d/../../../e  -> /a/b/c/d/../../../e
    /a/../b/./c//d       -> /a/../b/c/d
    ../scripting         -> ../scripting
    ./tmp                -> tmp
    /tmp/../../../tmp    -> /tmp/../../../tmp

    Enjoy, Have FUN! H.Merijn
Re: How to get the unique canonical path of a given path?
by rizzo (Curate) on Jul 13, 2022 at 23:05 UTC
Re: How to get the unique canonical path of a given path?
by ikegami (Patriarch) on Jul 14, 2022 at 15:00 UTC

    It works good for most cases but I had trouble with paths that start, for example, with ".." (like "../script.sh")

    What output do you want for ../script.sh?

Re: How to get the unique canonical path of a given path?
by tybalt89 (Monsignor) on Jul 16, 2022 at 14:28 UTC

    Are these answers correct?

    #!/usr/bin/perl

    use strict; # https://perlmonks.org/?node_id=11145493
    use warnings;

    for my $path ( qw(
        /a/b/c/d/../../../e
        /a/../b/./c//d
        ../invalid_do_not_change
        ./tmp
        /tmp/../../../tmp
        A//B
        A/B/
        A/./B
        A/foo/../B
        ) )
    {
        local $_ = $path;
        1 while s{
            /+(?=/) |                       # multiple ///
            ^/..(?=/) |                     # stay at root
            /\z |                           # remove trailing /
            (?<=/)(?!\.\./)[^/]+/\.\./ |    # remove 'name/../'
            (?<![^/])\./                    # remove ./
        }{}x;
        printf "%30s -> %s\n", $path, $_;
    }

    Outputs:

               /a/b/c/d/../../../e -> /a/e
                    /a/../b/./c//d -> /b/c/d
          ../invalid_do_not_change -> ../invalid_do_not_change
                             ./tmp -> tmp
                 /tmp/../../../tmp -> /tmp
                              A//B -> A/B
                              A/B/ -> A/B
                             A/./B -> A/B
                        A/foo/../B -> A/B

      Never mind; ignore my objection from the spoiler. I just tried cd /tmp/../../../tmp and found it did go to the /tmp directory. And perl -le 'use autodie; open my $fh, ">", "/tmp/../../../tmp/worked.txt"; print {$fh} "it worked";' works as expected as well, so apparently that weird notation is perfectly valid syntax. Sorry.

        It works because on a *nix system the root directory is pretty much defined by " .. is the same as . "
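A quick way to check that claim on a given machine is to compare the device and inode numbers of "/" and "/..". This is only a sketch and assumes a Unix-like system where stat reports meaningful inode numbers:

```perl
use strict;
use warnings;

# The first two fields returned by stat are the device and inode numbers
my ($dev_root, $ino_root) = stat "/"   or die "stat /: $!";
my ($dev_up,   $ino_up)   = stat "/.." or die "stat /..: $!";

# Same device and same inode means both names refer to one directory
my $same = ($dev_root == $dev_up && $ino_root == $ino_up) ? 1 : 0;

print "/ and /.. are ", ($same ? "the same directory" : "different directories"), "\n";
```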

      [Nothing to see here]

Re: How to get the unique canonical path of a given path?
by bliako (Abbot) on Jul 17, 2022 at 18:04 UTC

    You got a lot of good answers. I will copy-paste an idea from shell scripting (reminded to me by pryrt's answer): change-dir to the location and then ask for cwd. It implies that you can extract the dirname of said path (edit: just for cd'ing to it, so you don't need to resolve it, the system will (try)) and also (edit: most importantly) that you have the permissions to change-dir to it. Edit: Also, these paths must really exist for you to change-dir to them. Edit: so perhaps not very practical in some use cases. I use this to find the containing dir of a shell script in bash. But as I said, caveats exist. Oh! And it will be super slow compared to any programmatic way.

    1 min edit: If you chdir to a symlink dir, some systems' cwd will report the symlinked dir instead of resolving it, so ...
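A rough Perl sketch of the chdir-then-cwd idea, with the caveats above still applying. The helper name path_via_chdir is made up, and Cwd::getcwd generally reports the physical directory, so a symlink in the directory part will typically come back resolved:

```perl
use strict;
use warnings;
use Cwd qw( getcwd );
use File::Basename qw( fileparse );

# Hypothetical helper: chdir to the directory part, ask where we are, go back.
# Returns undef if the directory cannot be entered.
sub path_via_chdir {
    my ($path) = @_;
    my ($name, $dir) = fileparse ($path);   # "../script.sh" -> ("script.sh", "../")
    my $orig = getcwd ();
    chdir $dir or return;
    my $here = getcwd ();
    chdir $orig or die "cannot return to $orig: $!";
    return $here eq "/" ? "/$name" : "$here/$name";
}

my $result = path_via_chdir ("/tmp/../../../tmp/worked.txt");
print defined $result ? $result : "cannot chdir", "\n";
```

The file itself does not have to exist, only the directory that contains it.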

    bw, bliako

Re: How to get the unique canonical path of a given path?
by perlfan (Parson) on Jul 17, 2022 at 14:43 UTC
    readlink -f is the shell command (from GNU coreutils) to resolve this. This may help you discover the answer for Perl. Perl has readlink, but YMMV.
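For comparison, a small sketch of what Perl's built-in readlink does. It assumes a platform with symlink support; the file names are invented. The built-in reads a single link and does no canonicalization of the rest of the path:

```perl
use strict;
use warnings;
use File::Temp qw( tempdir );

my $dir = tempdir (CLEANUP => 1);
symlink "target.txt", "$dir/alias" or die "symlink: $!";

# readlink returns the stored link text, one level only;
# the target does not even have to exist
my $target = readlink "$dir/alias";
print "alias -> $target\n";

# On anything that is not a symlink it returns undef
my $plain = readlink $dir;
print "plain directory: ", (defined $plain ? $plain : "undef"), "\n";
```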
Re: How to get the unique canonical path of a given path?
by Anonymous Monk on Jul 14, 2022 at 15:05 UTC

    See also File::Spec->canonpath(). If you like what it does, fine. If not, the documentation lists some of the pitfalls involved in this operation. File::Spec is a core module.

      It doesn't.

      They don't remove '..' because it might change the meaning of the path in case of symlinks.
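A short sketch of that behaviour, using paths already seen in this thread and assuming a Unix-like system:

```perl
use strict;
use warnings;
use File::Spec;

# canonpath is a purely textual cleanup: it drops "." and doubled slashes
# but leaves ".." in place
for my $path (qw( /a/../b/./c//d ./tmp /a/b/c/d/../../../e )) {
    printf "%-22s -> %s\n", $path, File::Spec->canonpath ($path);
}
```

This matches the File::Spec::canonpath section of the comparison output earlier in the thread: "/a/../b/./c//d" becomes "/a/../b/c/d" and "./tmp" becomes "tmp".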