mikfire has asked for the wisdom of the Perl Monks concerning the following question:

I am attempting to translate a possibly relative path to an absolute path. I need to correctly collapse . and .. if they appear in the path I am given.

I have toyed with this an awful lot today and I arrived at something like:

    # NOTE: I have already prepended pwd if needed
    # remove .
    while( $path =~ m#/\./# ) {
        $path =~ s#/\.#/#;
    }
    # remove ..
    while( $path =~ m#/[^/]+?/\.\./# ) {
        $path =~ s#/[^/]+?/\.\.#/#;
    }

This was not satisfying - the m## and then an s### with exactly the same arguments was lacking. Remembering that s### returns true if something was replaced, I was able to reduce this to

    # remove .
    while ( $path =~ s#/\./#/# ) {;}
    # remove ..
    while ( $path =~ s#/[^/]+?/\.\./#/# ) {;}

But this still does not please me: I am using two loops when I feel I should only need one. Working a bit harder and unleashing the marvelous /e modifier, I was able to finally say
while ( $path =~ s#(/[^/]+)?/\.(\.)?/defined($2) ? "/" : "$1/"/e ) {;}
I am pleased but wonder what other ways the honored Monks have found to do this - my solution is likely to be a real CPU pig.

For reasons I cannot elucidate, this solution is right out

    unless ( chdir $path ) {
        die "Couldn't there ($path) from here\n";
    }
    return `pwd`;

To further the discussion, I will point out that using the /g modifier does not work - the regex engine does double quotish expansion only once and will not reapply the regex to the new string. A wise move, considering the evil deep recursions that could result.
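
One way to see the /g limitation for yourself: /g resumes scanning after the end of each match, so text that only becomes adjacent because of a replacement is never re-examined in the same pass. The sketch below is illustrative only. It uses a simplified variant of the ".." pattern above (the lookahead is my addition, and it makes no attempt to stop [^/]+ from matching ".." itself) and compares a single /g pass with a loop that repeats until nothing changes.

```perl
use strict;
use warnings;

# Simplified pattern for illustration; not a robust normalizer.
my $once = '/a/b/../../c';
$once =~ s#/[^/]+/\.\.(?=/|$)##g;          # one /g pass over the string
print "single /g pass: $once\n";

my $loop = '/a/b/../../c';
1 while $loop =~ s#/[^/]+/\.\.(?=/|$)##;   # repeat until no substitution happens
print "looped:         $loop\n";
```

The single pass leaves one "dir/.." pair behind because that pair only exists after the first replacement; the loop picks it up on the next iteration.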

Mik
Mik Firestone (perlus bigotous maximus )

Replies are listed 'Best First'.
Re: Normalized directory paths
by lhoward (Vicar) on May 16, 2000 at 22:13 UTC
    The File::PathConvert CPAN module will do what you need. It has several functions for manipulating relative and absolute paths, including rel2abs, which sounds like just what you need.
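
If File::PathConvert is not installed, the core File::Spec module has a method that is also called rel2abs. The sketch below uses File::Spec, not File::PathConvert, and assumes a Unix-like system; '/home/mik' is a made-up base directory. Note that File::Spec cleans up "." components but deliberately leaves ".." alone, so on its own it does not solve the whole problem.

```perl
use strict;
use warnings;
use File::Spec;

# '/home/mik' is a hypothetical base directory used only for illustration.
my $abs = File::Spec->rel2abs( 'lib/./Foo.pm', '/home/mik' );
print "$abs\n";

# The cleanup is purely textual and does not collapse "..",
# because doing so can be wrong when symbolic links are involved.
my $still_dotted = File::Spec->rel2abs( '../lib', '/home/mik' );
print "$still_dotted\n";
```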
Re: Normalized directory paths
by mikfire (Deacon) on May 16, 2000 at 21:37 UTC
    Damnit, where's perl -c when you need it? I fubar'd the last regex and it should look like
    while ( $path =~ s#(/[^/]+)?/\.(\.)?/#defined($2) ? "/" : "$1/"#e ) {;}
    mea maxima culpa
    Mik Firestone ( perlus bigotus maximus )
Re: Normalized directory paths
by chromatic (Archbishop) on May 16, 2000 at 23:50 UTC
    Here's a complete program, demonstrating the correct solution given above:
    #!/usr/bin/perl -w
    use strict;

    my $path = "/foo/mik/../mik/./../mik";
    # print "Path is --$path--\n";
    $path =~ s!/\.([^.])!$1!g;
    # print "Path is now --$path--\n";
    while ($path =~ m!/\.!) {
        $path =~ s!/[^\/]+/\.\.!!;
        # print "Path is now -+$path+-\n";
    }
    print "Ended up with =>>$path<<=\n";

    First, we get rid of /., because that won't take us anywhere. Next we loop while there's a forward slash followed by a dot. As /.. takes us up a directory, we look for a valid directory followed by that construct, and get rid of the whole thing. Eventually, the loop has to fail, and we must have come up with something.

    merlyn and turnstep are correct, though -- doing this sort of thing with regular expressions is pretty easy to break. Unless you have *complete* control of the directories being passed to your script and in the filesystem, don't use this.

      #!/usr/bin/perl -w
      use strict;

      my $path = "/foo/mik/../mik/./../mik";
      #print "Path is --$path--\n";
      $path =~ s!/\.([^.])!$1!g;
      #print "Path is now --$path--\n";
      while ($path =~ m!/\.!) {
          $path =~ s!/[^\/]+/\.\.!!;
          # print "Path is now -+$path+-\n";
      }

      Forgive me, but this breaks when $path = "/foo/mik/.hidden". The first regex results in
      Path is now --/foo/mikhidden--

      Mik

      The worst case is replacing /somedir/../. Unfortunately, the pattern [^/]+/../ does not work as it should. Imagine the source path is /../../ and you will see that this pattern matches, because [^/]+ is happy to match the first "..". To clean and normalize an input path completely, I suggest this procedure:

      $path='./anything/../.../something';  #example
      $path=~s!/+!/!g;                      #replace //// by single /
      $path=~s!^\./!!;                      #remove starting ./
      $path=~s!/\.(?=$|/)!!g;               #remove all /.
      1 while $path =~ s!/([^/]{3,}|[^/.][^/]*|\.[^/.])/\.\.(?=/|$)!!g;  #remove /something/..
      $path=~s!^([^/]{3,}|[^/.][^/]*|\.[^/.])/\.\./!!;  #remove starting something/../
      $path='.' if $path eq '';             #point current path if finally it is empty
      #at this place $path is normalized

      You can wrap this code in a function :) To clarify what the sequence ([^/]{3,}|[^/.][^/]*|\.[^/.]) is for: it is something special. It matches every name that does not contain a '/' character, except '..'. Testing for a single '.' is unnecessary because those were removed in an earlier step. Notice that this fragment does match '...' and longer runs of dots, because only the single and double dot are reserved. It also permits file names like '.something', commonly used for hidden files on Unix-like systems.
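
As a sketch of the "put it in a function" suggestion, the procedure above could be packaged like this. The name clean_path is hypothetical and the regexes are copied from the post unchanged; like the original, it is a purely textual transform that knows nothing about symbolic links.

```perl
use strict;
use warnings;

# clean_path is a hypothetical name; the steps are those of the procedure above.
sub clean_path {
    my ($path) = @_;
    my $name = qr{[^/]{3,}|[^/.][^/]*|\.[^/.]};     # any name except "." and ".."
    $path =~ s!/+!/!g;                              # collapse runs of slashes
    $path =~ s!^\./!!;                              # drop a leading ./
    $path =~ s!/\.(?=$|/)!!g;                       # drop every /. component
    1 while $path =~ s!/(?:$name)/\.\.(?=/|$)!!g;   # drop /name/..
    $path =~ s!^(?:$name)/\.\./!!;                  # drop a leading name/../
    $path = '.' if $path eq '';                     # empty result means "here"
    return $path;
}

print clean_path($_), "\n" for (
    './anything/../.../something',
    '/foo/mik/../mik/./../mik',
    '/foo/mik/.hidden',
);
```

Keeping the name pattern in one qr// object avoids repeating the long alternation in two places.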

      Thank you for the congratulations. I am happy if this piece of code helps someone :) I know, I reinvented the wheel :)

      This enters an infinite loop if you have something like this:
      $path = "/.../up/.../down/../allaround";
RE: Normalized directory paths
by merlyn (Sage) on May 17, 2000 at 02:20 UTC
    BEWARE ANY SOLUTION THAT USES REGEX IS WRONG in the face of symbolic links. If it isn't touching the filesystem, it's broke.

    Unless you were asking about URLs and not files.

    -- Randal L. Schwartz, Perl hacker

      Eh? What's that? Couldn't hear you in the back....
      Hmm. Good point. Yes, I am talking about the filesystem. Thanks for the observation - something else to think about. For the part of the process using this function, I am really just needing the textual transform. The filesystem walk happens later.

      Mik

        But you *can't* do the textual transformation WITHOUT doing the filesystem walk!

        That is, if your string is

        /a/b/c/../d/e/f
        and c is a symlink, YOU CANNOT reduce that string unless you do a readlink on c. I wrote some code to do it once. One of my UnixReview columns I think.
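
To make the symlink point concrete, here is a small sketch that builds a throwaway directory tree and compares the textual answer with what the filesystem reports. The layout and variable names are invented for illustration, it assumes a platform where symlink() works, and it uses Cwd::abs_path from the core distribution instead of the column code mentioned above.

```perl
use strict;
use warnings;
use Cwd qw(abs_path);
use File::Temp qw(tempdir);
use File::Path qw(make_path);

# Hypothetical layout inside a scratch directory:
#   $root/a/b/c  is a symlink to  $root/x/y
#   $root/x/d    is where a/b/c/../d really leads
my $root = abs_path( tempdir( CLEANUP => 1 ) );
make_path( "$root/a/b", "$root/x/y", "$root/x/d" );
symlink( "$root/x/y", "$root/a/b/c" ) or die "symlink failed: $!";

my $textual = "$root/a/b/d";                    # what collapsing "c/.." as text gives
my $actual  = abs_path("$root/a/b/c/../d");     # what the filesystem resolves

print "textual: $textual\n";
print "actual:  $actual\n";
```

The two results differ because ".." is looked up inside the directory the symlink points to, not inside the directory that contains the symlink.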
Re: Normalized directory paths
by ZZamboni (Curate) on May 16, 2000 at 21:53 UTC
    How about this:
    # Prepend $pwd as necessary
    @p=();
    foreach (split("/", $path)) {
        pop @p, next if $_ eq '..';
        next if $_ eq '.';
        push @p, $_;
    }
    $path=join("/", @p);

    It is clearer, in my opinion, but considerably slower:
    Benchmark: timing 1000 iterations of Mikfire, Zamboni...
       Mikfire:  1 wallclock secs ( 0.47 usr +  0.00 sys =  0.47 CPU)
       Zamboni:  4 wallclock secs ( 3.23 usr +  0.00 sys =  3.23 CPU)

    As given, your last solution produces double diagonals sometimes. For the string '/home/zamboni/tmp/../lib/./tex/..' it produces '/home/zamboni//lib//'. The solution is to match a second optional diagonal after the (\.)?, like this:
    while ( $path =~ s#(/[^/]+)?/\.(\.)?/?#defined($2) ? "/" : "$1/"#e ) {;}

    --ZZamboni
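
The split-and-stack idea can also be written as a standalone function. This is only a sketch: normalize_path is a hypothetical name, it assumes an absolute Unix-style path, and it treats ".." at the root as a no-op. Like every textual approach in this thread, it ignores symbolic links.

```perl
use strict;
use warnings;

# normalize_path is a hypothetical helper, not part of any module.
sub normalize_path {
    my ($path) = @_;
    my @parts;
    for my $part ( split m{/+}, $path ) {
        next if $part eq '' or $part eq '.';   # skip empty and "." components
        if ( $part eq '..' ) {
            pop @parts;                        # harmless when already at the root
            next;
        }
        push @parts, $part;
    }
    return '/' . join( '/', @parts );
}

print normalize_path($_), "\n" for (
    '/foo/mik/../mik/./../mik',
    '/foo/mik/.hidden',
    '/home/zamboni/tmp/../lib/./tex/..',
);
```

Because each component is compared as a whole string, names such as ".hidden" or "..." are never mistaken for "." or "..".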

Re: Normalized directory paths
by turnstep (Parson) on May 16, 2000 at 21:51 UTC

    I think you can do without the defined. Also, the 'g' modifier seems to work for me...

    Added later.. Well, you are right, the 'g' does not work. How about using
    1 while (blah blah);
    instead? I just hate seeing the construct:
    {;}
      No. That was already tried and, as I mentioned, will not work. In order to avoid some very nasty deep recursion problems, the regex does double-quotish expansion only once, at the beginning of the match process.
      Your solution breaks on something ( admittedly idiotic, but for what I am doing I need to be absolutely certain I parse the path correctly ) like this:
      /foo/mik/../mik/./../mik
      The regex you provide translates that to /foo/mik/mik, when the correct answer is /foo/mik
      Mik