Monks not familiar with "hard links" would need to understand the following details:

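Hard links are mainly a unix/linux concept (including macosx). Some other file systems (e.g. NTFS on MS-Windows) support them as well, but you can't count on them being available everywhere.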
Hard links only work within a given disk volume (you can't have a hard link on one disk that points to a file on another disk).
Hard links are normally made only to data files. Most systems refuse to hard-link a directory, and the behavior for other file types (e.g. devices, symbolic links) varies from one system to another.
Creating one or more hard links to a given file is really just a matter of having more directory entries describing/pointing to that file.
Once a hard link is created, you can't really identify it as such (i.e. as anything other than a plain data file). You can figure out when a given file has more than one directory entry describing/pointing to it (checking the link count shown by "ls -l"), and you can figure out which directory entries point to the same file (checking for matching inode numbers with "ls -i") **, but all entries have "equal status" -- the original directory entry is simply equivalent to (i.e. one of) the hard links.
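For a script, the same information that "ls -l" and "ls -i" show is available from Perl's built-in lstat. Here is a minimal sketch, assuming a unix-like file system where device and inode numbers are meaningful; the sub names link_info and same_file are made up for illustration and are not part of any module.

```perl
use strict;
use warnings;

# Return (device, inode, link count) for a path, or an empty list on failure.
# lstat is used so that a symlink is reported as itself, not as its target.
sub link_info {
    my ($path) = @_;
    my @st = lstat($path) or return;
    return @st[0, 1, 3];    # dev, ino, nlink
}

# True if both paths are directory entries for the same underlying file,
# i.e. they sit on the same device and share an inode number.
sub same_file {
    my ($path_a, $path_b) = @_;
    my ($dev_a, $ino_a) = link_info($path_a) or return 0;
    my ($dev_b, $ino_b) = link_info($path_b) or return 0;
    return ($dev_a == $dev_b && $ino_a == $ino_b) ? 1 : 0;
}
```

A link count greater than 1 only says that other directory entries exist somewhere on the volume; it does not say where. Comparing the device numbers of two paths is also a cheap way to tell in advance whether a hard link between them can work at all.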

With those details in mind, I suspect that if you run your script repeatedly in succession on the same path, it will find/rename/replace/delete the same set of duplicate files, more or less identically, on each run.

There's nothing in the File::Find::Duplicates man page about how it determines files to be duplicates, and there is no reason to expect that it knows or cares about existing hard links (since these are not mentioned in the docs, and are OS-dependent anyway). So, existing hard links will probably look like duplicates, and will be (re)replaced on every run.
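If that turns out to be the case, one possible workaround is to filter each reported set of duplicates before touching anything. The sketch below is only an assumption about how such a filter could look: distinct_files is a hypothetical helper, and it expects a plain list of paths that were reported as having identical content, however your script obtains that list.

```perl
use strict;
use warnings;

# Given a list of paths reported as identical, keep only one path per
# underlying file (device + inode pair). Paths that are already hard links
# to an earlier entry are dropped, so they are not re-linked on every run.
# stat follows symlinks, so a symlink to an already-seen file is also
# treated as the same file and left alone.
sub distinct_files {
    my @paths = @_;
    my (%seen, @distinct);
    for my $path (@paths) {
        my ($dev, $ino) = stat($path) or next;    # skip unreadable paths
        push @distinct, $path unless $seen{"$dev:$ino"}++;
    }
    return @distinct;
}
```

If the returned list has only one element, every path in the set already refers to the same file and there is nothing left to replace.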

For that matter, I wonder what that module would do if you were to replace duplicate files with symbolic links instead of hard ones. I think the *n*x notion of "symlinks" ports to MS-Windows as "short-cuts", so this may be somewhat more portable, but you'd have to look at the sources for F::F::Dups to see whether it picks up on the difference between a data file and any sort of link.

In any case, I prefer symlinks -- there tends to be less confusion when it comes to figuring out actual vs. apparent disk space usage.

And that brings up another point you might want to test with your script: does F::F::Dups know enough to leave symlinks alone, or does it follow them when looking for dups? If the latter, you can get into various kinds of trouble, like trying to create hard links to files on different volumes (won't work) or even deleting the target of a symlink while leaving the symlink itself as the "unique version" -- which then becomes a stale link with no existing data file as the target. Note that a symlink can have a directory as its target (as well as files/directories on different disks), so if your script runs on a tree like this:

  toplevel/
     secondlevel_1/
        thirdlevel_1/
        thirdlevel_2/
            file1.dat
            file2.dat
     secondlevel_2  -> secondlevel_1/thirdlevel_2   # directory symlink

will there be an apparent duplication of file1.dat and file2.dat under two different paths? If so, what is the likelihood that your script will have (or cause) some trouble?
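One way to test this independently of F::F::Dups is to walk the tree yourself and see which paths show up. The following is a rough sketch using the core File::Find module, which does not follow symlinks unless you ask it to via its "follow" options; the sub name regular_files is made up for illustration, and this says nothing about how F::F::Dups itself traverses directories.

```perl
use strict;
use warnings;
use File::Find;

# Collect plain data files under $top, ignoring symlinks entirely.
# File::Find does not descend into symlinked directories by default, and
# the explicit -l test keeps symlinks to files out of the list as well.
sub regular_files {
    my ($top) = @_;
    my @files;
    find(
        {
            no_chdir => 1,    # $_ holds the full path inside "wanted"
            wanted   => sub {
                return if -l $_;        # symlink (to a file or a directory)
                return unless -f $_;    # keep plain data files only
                push @files, $File::Find::name;
            },
        },
        $top
    );
    @files = sort @files;
    return @files;
}
```

Run on the tree above, this should list file1.dat and file2.dat once each, under secondlevel_1/thirdlevel_2 only. If the list your script works from also shows them under secondlevel_2, then something is following the directory symlink.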

** FOOTNOTE (UPDATE) ** Please note the very informative reply provided below by MidLifeXis. As he points out, my references to "ls -l" and "ls -i" should not be taken as implementation ideas for detecting hard links in a Perl script. I mentioned these uses of "ls" merely as the easiest way for a person to look into the behavior of hard links.

In reply to Re: Replace duplicate files with hardlinks by graff
in thread Replace duplicate files with hardlinks by bruno
