Tanktalus has asked for the wisdom of the Perl Monks concerning the following question:

Following symbolic links (symlink/readlink) seems to me to be a hard problem. As in, if you don't watch out for all the minutiae, you can get hammered.

So, I'm hoping someone already has solved that problem in a way accessible in perl.

I'm not looking to follow a symlink all the way to its end - the filesystem does a superb job of that already. I'm actually looking, in my case, for a particular actual symlink. I want to evaluate each symlink in the chain to determine whether I want to archive the target file, or the symlink itself. The criterion is simple, and thus less interesting (if a symlink matches a particular regular expression, I want to go to the next one in the chain, which could be the target file; otherwise I want to keep the symlink as is). However, trying to determine the symlink chain is difficult.

For example, using a simple chain of "S" (for "Symlink"), we could have:

    foo -> S/foo
    S/foo -> S/foo          # that's not a circular loop - there's an S in S
    S/S/foo -> ../lib/foo   # keep the symlink because it doesn't start with "S/"
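
For concreteness, the selection rule described above might be sketched roughly as follows. This is only an illustration under assumptions: `pick_archive_path` is a made-up name, the regular expression is matched against the raw link text returned by `readlink`, and the cap of 40 hops is arbitrary.

```perl
use strict;
use warnings;
use File::Spec;
use File::Basename qw(dirname);

# Hypothetical helper: decide which path to archive for $path.
# $follow_re is matched against the raw link text (what readlink returns).
sub pick_archive_path {
    my ($path, $follow_re, $max_hops) = @_;
    $max_hops = 40 unless defined $max_hops;    # arbitrary cap (assumption)
    my $cur = File::Spec->rel2abs($path);
    for my $hop (0 .. $max_hops) {
        my $target = readlink $cur;
        return $cur unless defined $target;         # not a symlink: archive the file
        return $cur unless $target =~ $follow_re;   # keep this symlink as is
        # Relative link text is relative to the directory holding the link.
        $cur = File::Spec->rel2abs($target, dirname($cur));
    }
    die "gave up after $max_hops hops starting at $path\n";
}
```

With the chain above, `pick_archive_path('foo', qr{^S/})` would be expected to stop at the `S/S/foo` symlink, since its link text is `../lib/foo`.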

Note that the filesystem I'm looking at is NFS-mounted, and read-only. And the directory structure I'm looking at is multiple-GB in size, so copying the whole thing becomes somewhat prohibitive in time (not so much in space). Especially following the symlinks during such a copy - we start jumping around from NFS server to NFS server.

Has anyone had to deal with this type of thing before? Are there modules out there that I missed which would help me go through this? Or is there a simplification that I'm just not seeing?

Thanks,

Replies are listed 'Best First'.
Re: Following symlinks manually
by merlyn (Sage) on Jan 15, 2007 at 22:31 UTC
Re: Following symlinks manually
by sgifford (Prior) on Jan 15, 2007 at 23:44 UTC
    I see two hard parts here: resolving relative links and avoiding loops. For resolving relative links, File::Spec's rel2abs might help; notice you can give a base directory where the relative path should start from. For avoiding loops, many OSes just keep a count and give up after some number of symlinks; you could also track specific links in a hash and stop chasing links if you end up someplace you've been before.

    I'm not sure whether you care about parent directories being symlinks, too. If so, that's a third hard part, though it's certainly manageable.

    Here's some code to chase symlinks and print out what it finds, using rel2abs to deal with relative links and a hash to check for loops. I think it could be adapted to your needs.

    #!/usr/bin/perl
    use warnings;
    no warnings 'uninitialized';
    use strict;
    use File::Spec;       # rel2abs is called as a class method; nothing to import
    use File::Basename;

    chaselink($ARGV[0]);

    sub chaselink {
        my %seen = ();    # absolute paths already visited, for loop detection
        my $chase;
        $chase = sub {
            my($f,$d)=@_;
            print "\nChasing link '$f' in '$d'\n";
            my $l = readlink($f);
            if (!defined($l)) {
                print "$f is not a link.\n";
                return undef;
            }
            print "Relative link: $l from $d\n";
            # Resolve the link text against the directory holding the link
            my $abs = File::Spec->rel2abs($l,$d);
            print "Absolute link: $abs\n";
            if ($seen{$abs}) {
                print "Found loop, giving up\n";
                return undef;
            }
            $seen{$abs}=1;
            $chase->($abs,dirname($abs));
        };
        $chase->(@_);
    }

      This fragment follows the symlinks of $0 (the program name) to determine the home directory for the current program. While it's impossible for there to be a loop (the program got started, after all), the code does check for loops.

      I'm not happy about the `pwd` dependency, but 'use Cwd' would have been longer and because I put this code in my BEGIN {} block, I wasn't sure if I could use 'use'. Also, it assumes '/' is the path separator and that \r and \n don't appear in the program pathname.

      my $linkcount=50;
      (my $file=$0)=~s/.*\///;
      (my $HOME=$0)=~s/[^\/]*$//;
      $HOME||=`pwd`."/";
      $HOME=~s/[\r\n]//g;
      while (defined(my $l=readlink($HOME.$file))) {
          if ($linkcount--<0) {die("$0: symlink loop detected, dying\n");}
          ($file=$l)=~s/.*\///;
          if (substr($l,0,1) eq "/") {($HOME=$l)=~s/[^\/]*$//;next;}
          (my $npwd=$l)=~s/[^\/]*$//;
          $HOME.=$npwd;
      };
      print "The home directory for this program is $HOME\n";
Re: Following symlinks manually (historical code)
by shmem (Chancellor) on Jan 16, 2007 at 00:09 UTC

    This post is of little use and contains bad (style) obsolete code.

    Nearly 13 years ago, I had to find orphaned and looped symlinks in our cad lab network at the university, a heavily NFS infested environment with cross mounts between workstations and servers.

    I concocted the following, based on an example from the first Camel Book (the pink one; the example code must have been either merlyn's or Tom Christiansen's). It's perl4.

    It must be one of my very first perl scripts. I didn't bother to rewrite it; it has worked ever since. I keep it like that ol' rusty tool you just don't want to drop, despite the current screwdrivers being shinier.

    Sorry for posting my old cruft. For some obscure reason, it had to be done (?).

    update: added links

    --shmem

    _($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                                  /\_¯/(q    /
    ----------------------------  \__(m.====·.(_("always off the crowd"))."·
    ");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}
Re: Following symlinks manually
by kyle (Abbot) on Jan 15, 2007 at 22:39 UTC

    It seems pretty straightforward to me, so maybe I'm missing something. readlink returns undef when you point it at a non-link, so...

    sub chase_links {
        my ( $file ) = @_;
        while ( defined ( my $link = readlink $file ) ) {
            print "LINK: $file\n";
            $file = $link;
        }
        print "NOT LINK: $file\n";
    }

    That will list out the links until it gets to a real file. You can insert any logic you like into that loop to short circuit or what-have-you. Is this problem harder than I think?

    UPDATE: Oh, I guess it is harder than I thought. The above only works if all links are absolute or relative to the current directory. To work with scattered relative links, you'd have to chdir or track the base directories.

      Yes, much harder ;-)

      Try this at the shell:

      mkdir a
      mkdir b
      ln -s a S
      ln -s ../b S/S
      touch b/foo
      ln -s S/foo foo
      ln -s S/foo S/foo
      Now, take a look around for a bit. And then try your chase_links on foo.
      perl -e 'my $file = shift; while(defined(my $link = readlink $file)) { print "LINK: $file\n"; $file = $link; sleep 1; } print "NOT LINK: $file\n"' foo
      Note the sleep in there - I did that to make it easier to kill, 'cuz it's going to go on forever.

      One needs to keep careful track of current directories to figure out where the relative links are actually relative to.
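
A minimal sketch of that bookkeeping, assuming a Unix-like filesystem: each hop's link text is resolved against the directory that holds the link rather than against the current directory. The name `link_chain` and the cap of 40 hops are made up for illustration. Since `rel2abs` does not collapse `..`, the hash only catches loops that revisit the same path string, and the hop cap is there for the rest.

```perl
use strict;
use warnings;
use File::Spec;
use File::Basename qw(dirname);

# Hypothetical helper: return every path visited while chasing $path,
# one hop at a time, ending with the first path that is not a symlink.
sub link_chain {
    my ($path, $max_hops) = @_;
    $max_hops = 40 unless defined $max_hops;    # arbitrary cap (assumption)
    my @chain = (File::Spec->rel2abs($path));
    my %seen  = ($chain[0] => 1);
    while (defined(my $target = readlink $chain[-1])) {
        # A relative target is relative to the link's own directory, not the cwd.
        my $next = File::Spec->rel2abs($target, dirname($chain[-1]));
        die "symlink loop at $next\n" if $seen{$next}++;
        die "too many hops from $path\n" if @chain > $max_hops;
        push @chain, $next;
    }
    return @chain;
}
```

Run against the directory layout created above, `link_chain('foo')` should visit `foo`, then `S/foo`, then `S/S/foo` (as absolute paths) and stop there, because `S/S/foo` is the real file `b/foo`.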

Re: Following symlinks manually
by sgt (Deacon) on Jan 15, 2007 at 23:53 UTC

    if it is only for a "copy", maybe a scheme like this can be useful:

  • archive dir. struct without following symlinks (tar zcf ..., or pax)
  • list symlinks: if relative they are already copied
  • if not, if file just copy, if dir repeat the process

    For the general symlink problem, it might be useful to separate the directory part from the file part, and break the directory part into components. First we get the list of symlinks in your hierarchy (from the OS, find or whatever), and then process each element of the list:

  • for any symlink /a/b/.../y/z is it a dir?
  • if not then /a/b.../y is a dir!
  • now we study each path component starting with /a
  • is /a a symlink? if it is absolute you list, and keep going down the original path, if it is relative you might do the equivalent of the shell ( cd -P $dir || exit 1; echo $PWD) for a start
  • part of the problem is that some links might be stale or even incorrect when a pattern like ../dir/../dir2/.. is not OK, so an intermediary readlink helps

    just a thought, --stephan
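
The component-by-component idea could be sketched along these lines. This is an illustration only: `symlinked_components` is a hypothetical name, the volume part of the path is ignored (so it assumes a Unix-style filesystem), and it only reports which prefixes are symlinks without following them.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: walk $path one component at a time and report every
# prefix that is itself a symlink, as [prefix, raw link text] pairs.
sub symlinked_components {
    my ($path) = @_;
    my $abs = File::Spec->rel2abs($path);
    my ($vol, $dirs, $file) = File::Spec->splitpath($abs);   # $vol unused on Unix
    my @parts = grep { length } File::Spec->splitdir($dirs), $file;
    my @found;
    my $prefix = File::Spec->rootdir;
    for my $part (@parts) {
        $prefix = File::Spec->catfile($prefix, $part);
        # -l uses lstat, so it tests the last component of $prefix itself
        push @found, [ $prefix, readlink($prefix) ] if -l $prefix;
    }
    return @found;
}
```

Each reported pair can then be examined separately, for instance to decide whether the link text is absolute or relative before going further down the original path.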
Re: Following symlinks manually
by Tux (Canon) on Jan 15, 2007 at 23:47 UTC
    What is wrong with
    use Cwd qw( abs_path );
    $file = abs_path ($link);
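
For comparison, a small sketch of how this differs from reading a single hop: `abs_path` resolves the whole chain down to the final file, whereas `readlink` returns only the raw text of one link. The helper name `one_hop_vs_final` is made up for illustration.

```perl
use strict;
use warnings;
use Cwd qw(abs_path);

# Hypothetical helper: return the raw text of the first link (one hop)
# next to the fully resolved path that abs_path reports.
sub one_hop_vs_final {
    my ($link) = @_;
    my $hop   = readlink $link;     # undef if $link is not a symlink
    my $final = abs_path($link);    # every symlink along the way is resolved
    return ($hop, $final);
}
```

So `abs_path` is handy when only the end of the chain matters; the intermediate symlinks the original question asks about are not visible in its result.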