Following symlinks manually

Tanktalus has asked for the wisdom of the Perl Monks concerning the following question:

Following symbolic links (symlink/readlink) seems to me to be a hard problem. As in, if you don't watch out for all the minutiae, you can get hammered.

So, I'm hoping someone has already solved that problem in a way accessible from Perl.

I'm not looking to follow a symlink all the way to its end - the filesystem does a superb job of that already. In my case, I'm looking for one particular symlink within the chain. I want to evaluate each symlink in the chain to determine whether to archive the target file or the symlink itself. The criterion is simple, and thus less interesting: if a symlink matches a particular regular expression, I want to go to the next one in the chain (which could be the target file); otherwise I want to keep the symlink as is. However, determining the symlink chain itself is difficult.

For example, using a simple chain of "S" (for "Symlink"), we could have:

foo -> S/foo
S/foo -> S/foo        # that's not a circular loop - there's an S in S
S/S/foo -> ../lib/foo # keep the symlink because it doesn't start with "S/"

Note that the filesystem I'm looking at is NFS-mounted, and read-only. And the directory structure I'm looking at is multiple-GB in size, so copying the whole thing becomes somewhat prohibitive in time (not so much in space). Especially following the symlinks during such a copy - we start jumping around from NFS server to NFS server.

Has anyone had to deal with this type of thing before? Are there modules out there that I missed which would help me go through this? Or is there a simplification that I'm just not seeing?

Thanks,


Replies are listed 'Best First'.
Re: Following symlinks manually by merlyn (Sage) on Jan 15, 2007 at 22:31 UTC
You might be looking for my article on expanding symbolic links, at least as a way to look at the strategy of solving it. Perhaps a module has arisen since 1999 to solve it more directly.

-- Randal L. Schwartz, Perl hacker
Re: Following symlinks manually by sgifford (Prior) on Jan 15, 2007 at 23:44 UTC
I see two hard parts here: resolving relative links and avoiding loops.

For resolving relative links, File::Spec's `rel2abs` might help; notice you can give a base directory where the relative path should start from. For avoiding loops, many OSes just keep a count and give up after some number of symlinks; you could also track specific links in a hash and stop chasing links if you end up someplace you've been before.

I'm not sure whether you care about parent directories being symlinks, too. If so, that's a third hard part, though it's certainly manageable.

Here's some code to chase symlinks and print out what it finds, using `rel2abs` to deal with relative links and a hash to check for loops. I think it could be adapted to your needs.

    #!/usr/bin/perl
    use warnings;
    no warnings 'uninitialized';
    use strict;
    use File::Spec;        # rel2abs is called as a class method below
    use File::Basename;

    chaselink($ARGV[0]);

    sub chaselink {
        my %seen = ();     # absolute paths already visited
        my $chase;
        $chase = sub {
            my ($f, $d) = @_;
            print "\nChasing link '$f' in '$d'\n";
            my $l = readlink($f);
            if (!defined($l)) {
                print "$f is not a link.\n";
                return undef;
            }
            print "Relative link: $l from $d\n";
            # With an undefined $d (the first call), rel2abs uses the cwd.
            my $a = File::Spec->rel2abs($l, $d);
            print "Absolute link: $a\n";
            if ($seen{$a}) {
                print "Found loop, giving up\n";
                return undef;
            }
            $seen{$a} = 1;
            $chase->($a, dirname($a));
        };
        $chase->(@_);
    }

-- sgifford's Web page
Re^2: Following symlinks manually by Anonymous Monk on Jul 07, 2010 at 21:57 UTC
This fragment follows the symlinks of $0 (the program name) to determine the home directory for the current program. While it's impossible for there to be a loop (the program got started, after all), the code does check for loops. I'm not happy about the `pwd` dependency, but 'use Cwd' would have been longer, and because I put this code in my BEGIN {} block, I wasn't sure if I could use 'use'. Also, it assumes '/' is the path separator and that \r and \n don't appear in the program pathname.

    my $linkcount = 50;
    (my $file = $0) =~ s/.*\///;       # file name without its directory
    (my $HOME = $0) =~ s/[^\/]*$//;    # directory part, trailing slash kept
    $HOME ||= `pwd` . "/";
    $HOME =~ s/[\r\n]//g;
    while (defined(my $l = readlink($HOME . $file))) {
        if ($linkcount-- < 0) { die("$0: symlink loop detected, dying\n"); }
        ($file = $l) =~ s/.*\///;
        # An absolute target replaces the directory outright.
        if (substr($l, 0, 1) eq "/") { ($HOME = $l) =~ s/[^\/]*$//; next; }
        # A relative target is appended to the current directory.
        (my $npwd = $l) =~ s/[^\/]*$//;
        $HOME .= $npwd;
    }
    print "The home directory for this program is $HOME\n";
Re: Following symlinks manually (historical code) by shmem (Chancellor) on Jan 16, 2007 at 00:09 UTC
This post is of little use and contains bad (style) obsolete code.

Nearly 13 years ago, I had to find orphaned and looped symlinks in our cad lab network at the university, a heavily NFS infested environment with cross mounts between workstations and servers. I concocted the following, based on an example from the first Camel Book (the pink one; the example code must have been either merlyn's or Tom Christiansen's).

It's perl4. It must be one of my very first perl scripts; I didn't bother to rewrite it, and it has worked ever since. I keep it like that ol' rusty tool you just don't want to drop, despite the current screwdrivers being shinier.

Read more... (8 kB)

Sorry for posting my old cruft. For some obscure reason, it had to be done (?).

update: added links

--shmem _($_=" "x(1<<5)."?\n".q·/)Oo. G°\ / /\_¯/(q / ---------------------------- \__(m.====·.(_("always off the crowd"))."· ");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}
Re: Following symlinks manually by kyle (Abbot) on Jan 15, 2007 at 22:39 UTC
It seems pretty straightforward to me, so maybe I'm missing something. readlink returns `undef` when you point it at a non-link, so...

    sub chase_links {
        my ( $file ) = @_;
        while ( defined ( my $link = readlink $file ) ) {
            print "LINK: $file\n";
            $file = $link;
        }
        print "NOT LINK: $file\n";
    }

That will list out the links until it gets to a real file. You can insert any logic you like into that loop to short circuit or what-have-you. Is this problem harder than I think?

UPDATE: Oh, I guess it is harder than I thought. The above only works if all links are absolute or relative to the current directory. To work with scattered relative links, you'd have to chdir or track the base directories.
Re^2: Following symlinks manually by Tanktalus (Canon) on Jan 15, 2007 at 22:47 UTC
Yes, much harder ;-) Try this at the shell:

    mkdir a
    mkdir b
    ln -s a S
    ln -s ../b S/S
    touch b/foo
    ln -s S/foo foo
    ln -s S/foo S/foo

Now, take a look around for a bit. And then try your chase_links on foo.

    perl -e 'my $file = shift; while(defined(my $link = readlink $file)) { print "LINK: $file\n"; $file = $link; sleep 1; } print "NOT LINK: $file\n"' foo

Note the sleep in there - I did that to make it easier to kill, 'cuz it's going to go on forever. One needs to keep careful track of current directories to figure out where the relative links are actually relative to.
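The directory bookkeeping described here can be sketched in a few lines. This is only a minimal illustration, not code from the thread: `link_chain` is a hypothetical helper, the `$follow` callback stands in for the regular-expression test from the original question, and the 50-hop limit is an arbitrary loop guard. Each relative target is resolved against the directory that holds the link, and ".." is deliberately left uncollapsed so the filesystem resolves it.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;
use File::Basename qw(dirname);

# Hypothetical helper: return every path visited while chasing $path, one
# readlink at a time. $follow receives the raw link text and returns true
# if the chase should continue past that link.
sub link_chain {
    my ($path, $follow, $max_hops) = @_;
    $max_hops = 50 unless defined $max_hops;    # arbitrary loop guard
    my @chain = ($path);
    while (defined(my $target = readlink $path)) {
        last unless $follow->($target);         # keep this symlink as is
        die "too many symlinks while chasing $chain[0]\n" if $max_hops-- <= 0;
        # Resolve against the link's own directory, not the cwd.
        # rel2abs returns an absolute $target unchanged (apart from cleanup).
        $path = File::Spec->rel2abs($target, dirname($path));
        push @chain, $path;
    }
    return @chain;
}

# Usage: follow only links whose text starts with "S/", as in the question.
if (@ARGV) {
    print "$_\n" for link_chain($ARGV[0], sub { $_[0] =~ m{^S/} });
}
```

Run against the a/b/S setup above with `foo` as the argument, this should stop after three entries (foo, then S/foo and S/S/foo as absolute paths) instead of looping, because the second hop is resolved relative to S rather than to the starting directory.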
Re^3: Following symlinks manually by davidrw (Prior) on Jan 15, 2007 at 23:06 UTC
A CPAN search on 'symbolic links' found File::Spec::Link in the File::Copy::Link distro:

    perl -MFile::Spec::Link -le 'print File::Spec::Link->resolve_all(shift)' foo

It returns `b/foo` for Tanktalus's example setup.
Re^4: Following symlinks manually by Tanktalus (Canon) on Jan 16, 2007 at 00:20 UTC
Re: Following symlinks manually by sgt (Deacon) on Jan 15, 2007 at 23:53 UTC
If it is only for a "copy", maybe a scheme like this can be useful:

- archive the directory structure without following symlinks (tar zcf ..., or pax)
- list the symlinks: if relative, they are already copied; if not, then if it is a file just copy it, and if it is a dir repeat the process

For the general symlink problem, it might be useful to separate the dir part from the file part, and break the dir part into components. First we get the list of symlinks in your hierarchy (from the OS, find or whatever), and then process each element of the list:

- for any symlink /a/b/.../y/z: is it a dir? If not, then /a/b.../y is a dir!
- now we study each path component starting with /a: is /a a symlink? If it is absolute you list it, and keep going down the original path; if it is relative you might do the equivalent of the shell ( cd -P $dir || exit 1; echo $PWD ) for a start

Part of the problem is that some links might be stale or even incorrect when a pattern like ../dir/../dir2/.. is not OK, so an intermediary readlink helps.

just a thought, --stephan
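The component-by-component inspection suggested above might look roughly like the following. It is a sketch under assumptions: `symlinked_components` is a hypothetical name, the input is taken to be an absolute Unix-style path, and it only reports which prefixes are symlinks, leaving the decision about each one to the caller.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: walk an absolute path one component at a time and
# return [prefix, raw link text] pairs for every prefix that is a symlink.
sub symlinked_components {
    my ($abs_path) = @_;
    my @found;
    my $prefix = File::Spec->rootdir;
    # splitdir yields an empty leading element for absolute paths; skip it.
    for my $part (grep { length } File::Spec->splitdir($abs_path)) {
        $prefix = File::Spec->catfile($prefix, $part);
        my $target = readlink $prefix;    # undef unless $prefix is a symlink
        push @found, [$prefix, $target] if defined $target;
    }
    return @found;
}

# Usage: print each symlinked component of the given absolute path.
if (@ARGV) {
    printf "%s -> %s\n", @$_ for symlinked_components($ARGV[0]);
}
```

Because each prefix is built textually, earlier symlinked components are traversed by the filesystem while later ones are tested, so a stale link in the middle simply ends the list of hits for the remaining components.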
Re: Following symlinks manually by Tux (Canon) on Jan 15, 2007 at 23:47 UTC
What is wrong with

    use Cwd qw( abs_path );
    $file = abs_path ($link);
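One possible answer, going by the original question: `abs_path` resolves every symlink in one go, so the intermediate links that the question wants to examine never become visible. The sketch below contrasts the two; `one_hop` is a hypothetical helper introduced only for illustration.

```perl
use strict;
use warnings;
use Cwd qw(abs_path);
use File::Basename qw(dirname);
use File::Spec;

# Hypothetical helper: follow exactly one symlink. Returns nothing (undef in
# scalar context) when $path is not a symlink.
sub one_hop {
    my ($path) = @_;
    my $target = readlink $path;
    return unless defined $target;
    # A relative target is relative to the directory holding the link.
    return File::Spec->rel2abs($target, dirname($path));
}

# Usage: show the next link in the chain beside the fully resolved path.
if (@ARGV) {
    my $hop   = one_hop($ARGV[0]);
    my $final = abs_path($ARGV[0]);
    print "one hop:  ", (defined $hop   ? $hop   : '(not a symlink)'), "\n";
    print "abs_path: ", (defined $final ? $final : '(unresolvable)'),  "\n";
}
```

For archiving purposes the two answers differ whenever the chain has more than one link: `abs_path` names the final file, while a per-hop step lets the caller stop at whichever symlink should be kept.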