stat to identify files?

Skeeve has asked for the wisdom of the Perl Monks concerning the following question:

I want to find certain files on my system (OS X) and create symbolic links to them. For this I use File::Find. I want to make sure I don't link to the same file twice, so I don't follow links (follow => 0). But what if I want to rerun the process later and don't want to recreate all the links that are already present? Is it sufficient to collect, in a first run, the device numbers and inodes of the existing links in my target directory like this:

my %stat;
find ( sub {
        my($dev,$ino)= stat $File::Find::name;
        ++$stat{$dev}->{$ino};
}, $targetdirectory);

and then later check like this:

find( {
    wanted => sub {
        :
        :
        # check whether or not we already know of this file
        my($dev,$ino)= stat $File::Find::name;
        return if $stat{$dev}->{$ino}++;
        :
        :
    },
    no_chdir => 1,
    follow => 0,
}, '/');
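For reference, the two passes described above can be put together into one self-contained sketch. The directory layout, the file names, and the rule of naming each link after the file's basename are assumptions made for illustration; everything is built in a temporary directory. Note that stat follows symbolic links, so the first pass records the identity of each link's target, and a dangling link yields nothing to record.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Find;

# Hypothetical layout, built in a temp directory for illustration only.
my $root    = tempdir(CLEANUP => 1);
my $source  = "$root/source";
my $targets = "$root/links";
mkdir $_ or die "mkdir $_: $!" for $source, $targets;

# Two real files; one of them is already linked from the target directory.
for my $name (qw(a.txt b.txt)) {
    open my $fh, '>', "$source/$name" or die "open: $!";
    print {$fh} "content of $name\n";
    close $fh or die "close: $!";
}
symlink "$source/a.txt", "$targets/a.txt" or die "symlink: $!";

# Pass 1: record dev/inode of everything the existing links point to.
my %seen;
find({
    no_chdir => 1,
    wanted   => sub {
        return unless -l $File::Find::name;
        my ($dev, $ino) = stat $File::Find::name;   # follows the symlink
        return unless defined $ino;                 # dangling link
        $seen{$dev}{$ino}++;
    },
}, $targets);

# Pass 2: walk the source tree and link only files not seen before.
my @created;
find({
    no_chdir => 1,
    follow   => 0,
    wanted   => sub {
        return unless -f $File::Find::name && !-l $File::Find::name;
        my ($dev, $ino) = stat $File::Find::name;
        return if $seen{$dev}{$ino}++;
        my ($base) = $File::Find::name =~ m{([^/]+)\z};
        symlink $File::Find::name, "$targets/$base" or die "symlink: $!";
        push @created, $base;
    },
}, $source);

print "created: @created\n";
```

Only b.txt gets a new link here, because a.txt is already known through the link found in the first pass. Hard links to the same file share a dev/inode pair as well, so they would be skipped in the same way.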

s$$([},&%#}/&/]+}%&{})*;#$&&s&&$^X.($'^"%]=\&(|?*{%
+.+=%;.#_}\&"^"-+%*).}%:##%}={~=~:.")&e&&s""`$''`"e


Replies are listed 'Best First'.
Re: stat to identify files? by Fletch (Bishop) on Jun 11, 2008 at 12:19 UTC
A given device/inode tuple should be unique for a given configuration, but keep in mind that if you're dealing with transient filesystems (e.g. sshfs), the tuples may not stay consistent across remounts (both device and inode could change for the same path).

## Stat something on an sshfs-mounted volume
$ stat -s /Volumes/foo/Makefile.am
st_dev=754974730 st_ino=4 st_mode=0100664 st_nlink=1 st_uid=501 st_gid=501 st_rdev=0 st_size=41056 st_atime=1206379971 st_mtime=1206379971 st_ctime=0 st_birthtime=0 st_blksize=65536 st_blocks=88 st_flags=0
$ umount /Volumes/foo
## convenience wrapper to call sshfs with some extra options
$ sshfsmount somehost:foo foo
ICON: foo => /Users/fletch/lib/icons/icns/gir.icns
$ stat -s /Volumes/foo/Makefile.am
st_dev=754974731 st_ino=3 st_mode=0100664 st_nlink=1 st_uid=501 st_gid=501 st_rdev=0 st_size=41056 st_atime=1206379971 st_mtime=1206379971 st_ctime=0 st_birthtime=0 st_blksize=65536 st_blocks=88 st_flags=0

So long as you're only comparing them inside a run and your filesystem's not dropping out from underneath you, you should be OK, but don't depend on it across remounts / reboots. In that case I'd move to something more intrinsic to the file (say Digest::SHA1) that could be recalculated (but that's going to entail more work and IO than just using the paths and stat metainfo).

The cake is a lie. The cake is a lie. The cake is a lie.
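A minimal sketch of the digest idea, assuming the core module Digest::SHA (asked for SHA-1) in place of the Digest::SHA1 module named above. The helper name content_id and the sample files are hypothetical.

```perl
use strict;
use warnings;
use Digest::SHA;
use File::Temp qw(tempdir);

# Hypothetical helper: hex SHA-1 digest of a file's contents.
sub content_id {
    my ($path) = @_;
    my $sha = Digest::SHA->new(1);   # 1 selects SHA-1
    $sha->addfile($path, 'b');       # 'b' reads the file in binary mode
    return $sha->hexdigest;
}

# Sample files for illustration: "one" and "two" have identical contents.
my $dir = tempdir(CLEANUP => 1);
my %content = (
    one   => "same bytes\n",
    two   => "same bytes\n",
    three => "other bytes\n",
);
for my $name (keys %content) {
    open my $fh, '>', "$dir/$name" or die "open $dir/$name: $!";
    print {$fh} $content{$name};
    close $fh or die "close $dir/$name: $!";
}

# Keep the first file seen for each distinct digest.
my %seen;
my @unique = grep { !$seen{ content_id("$dir/$_") }++ } sort keys %content;
print "@unique\n";
```

A digest identifies content rather than a particular file, so two separate files with the same bytes collapse into one entry (here "two" is dropped in favour of "one"). Whether that is acceptable depends on what the links are for, and every file has to be read in full to compute it.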
Re^2: stat to identify files? by Skeeve (Parson) on Jun 11, 2008 at 17:13 UTC
Thanks for the hint!
Re: stat to identify files? by tachyon-II (Chaplain) on Jun 11, 2008 at 13:02 UTC
Why not just check if the link exists and create it if it does not?
Re^2: stat to identify files? by Skeeve (Parson) on Jun 11, 2008 at 17:12 UTC
That's what I'm trying to do, checking whether or not the link exists. Can you please elaborate? I don't understand what you want to say.
Re^3: stat to identify files? by alexm (Chaplain) on Jun 11, 2008 at 18:19 UTC
See `-l` under -X to check whether a filename is a symlink, and use readlink to get the symlink value.
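A short sketch of that check, using the `-l` file test and readlink. The helper name ensure_link, and its choice to replace a link that points somewhere else, are assumptions for illustration and not something stated in the thread.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: create $link pointing at $target unless a symlink
# with exactly that value is already there.
# Returns 1 if a link was created, 0 if it was already in place.
sub ensure_link {
    my ($target, $link) = @_;
    if (-l $link) {
        my $current = readlink $link;
        return 0 if defined $current && $current eq $target;
        unlink $link or die "unlink $link: $!";   # points elsewhere: replace
    }
    elsif (-e $link) {
        die "$link exists and is not a symlink\n";
    }
    symlink $target, $link or die "symlink $link: $!";
    return 1;
}

# Illustration in a temp directory: the second call finds the link in place.
my $dir    = tempdir(CLEANUP => 1);
my $target = "$dir/file.txt";
open my $fh, '>', $target or die "open $target: $!";
close $fh or die "close $target: $!";

my $first  = ensure_link($target, "$dir/link");
my $second = ensure_link($target, "$dir/link");
print "first=$first second=$second\n";
```

This compares path strings, so it only answers "does this link name already point at this path". Two different paths that reach the same file would still be treated as different, which is the case the dev/inode comparison in the question is meant to catch.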
Re^4: stat to identify files? by Skeeve (Parson) on Jun 11, 2008 at 19:14 UTC
Re: stat to identify files? by Anonymous Monk on Jun 11, 2008 at 10:04 UTC
Probably :) http://en.wikipedia.org/wiki/Inode
Re: stat to identify files? by scorpio17 (Canon) on Jun 11, 2008 at 21:39 UTC
You could use DB_File to store every filename (with complete path) that you link to. This gives you a persistent hash that you can check against (if the path/file is not already in the hash, then you have never linked to it before, etc.). This should work fine unless you have many millions of files to keep track of.
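A rough sketch of such a persistent hash, assuming DB_File is available (it needs Berkeley DB on the system). The database location, the helper name remember, and the sample paths are hypothetical.

```perl
use strict;
use warnings;
use Fcntl;        # O_RDWR, O_CREAT
use DB_File;
use File::Temp qw(tempdir);

my $dir    = tempdir(CLEANUP => 1);
my $dbfile = "$dir/linked.db";   # hypothetical location of the database

# Hypothetical helper: record the given paths on disk and return only
# those that were not recorded by an earlier call.
sub remember {
    my (@paths) = @_;
    tie my %linked, 'DB_File', $dbfile, O_RDWR|O_CREAT, 0644, $DB_HASH
        or die "Cannot tie $dbfile: $!";
    my @new = grep { !$linked{$_}++ } @paths;
    untie %linked;
    return @new;
}

# Each call re-opens the file, standing in for a separate run of the script.
my @first  = remember('/tmp/a', '/tmp/b');
my @second = remember('/tmp/b', '/tmp/c');
print "first: @first\n";
print "second: @second\n";
```

Because the keys are path strings, a link that is later deleted by hand stays recorded and would not be recreated unless its key is removed as well.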
Re^2: stat to identify files? by Skeeve (Parson) on Jun 12, 2008 at 06:06 UTC
That's not needed. Thanks anyway. I can create the hash as described above. This also has the advantage that, should a link be removed, it will be recreated later, in the second run. The question was merely about the devno/inode tuple.