in reply to Re^3: Replace duplicate files with hardlinks
in thread Replace duplicate files with hardlinks

That's not quite how it works with softlinks. When you perform a file test on a softlink, think of it as performing the test on the file the link ultimately points to (whether that is a plain file, a directory, a special file, etc.).

The only file test that applies to the softlink's own inode is the -l operator; its purpose is to find out whether a given name is a softlink.
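The same semantics can be seen at the shell level, where the ordinary file tests dereference a symlink and only the symlink test itself looks at the link. This is a minimal sketch with throwaway paths, not part of the original code:

```shell
# All names here are examples created on the fly in a temp directory.
dir=$(mktemp -d)
echo "hello" > "$dir/target.txt"
ln -s target.txt "$dir/link.txt"   # soft link to the file above

# -f follows the link and tests the target (a regular file):
[ -f "$dir/link.txt" ] && echo "-f: sees the target (regular file)"
# -L (the shell analogue of Perl's -l) tests the link itself:
[ -L "$dir/link.txt" ] && echo "-L: this name is a softlink"
[ -L "$dir/target.txt" ] || echo "-L: the target itself is not a softlink"
```

Running it prints all three messages, confirming that only the link test distinguishes the softlink from its target.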

Oops, my bad! Thanks for clarifying. I got that false information from a web page (or I may have misunderstood it). I'll update the code to test each path with -l and skip those for which it returns true.

I would recommend creating softlinks instead of hardlinks: a softlink makes the relationship visible, since it is obvious which name is the link and which is the original. But then you need to decide which inode among the duplicates becomes the softlinks' target.

Exactly. And since a priori I don't know which of the duplicates should count as the original, I'd rather just create a hard link. The problem with soft links is that you can get into trouble later if you delete the original file: every link that pointed to it breaks. With hard links this is not an issue, since the actual data is not lost until the last link pointing to it is removed.
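The difference is easy to demonstrate in a throwaway directory; the file names below are made-up examples:

```shell
dir2=$(mktemp -d)
echo "payload" > "$dir2/orig.txt"
ln "$dir2/orig.txt" "$dir2/hard.txt"   # hard link: second name for the same inode
ln -s orig.txt "$dir2/soft.txt"        # soft link: points at the *name* orig.txt
rm "$dir2/orig.txt"                    # delete the "original" name

cat "$dir2/hard.txt"                   # still prints "payload": data survives
cat "$dir2/soft.txt" 2>/dev/null || echo "soft link is dangling"
```

After the `rm`, the hard link still reaches the data, while the soft link dangles because the name it recorded no longer exists.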

I like how creating hard links seamlessly reduces disk usage while changing (nearly) nothing else. The only drawback I see is that if you change a file, all the linked names change with it, and there may be situations in which you would not want that. But the same is true of soft links.
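That shared-content behavior follows from the names sharing one inode; a quick sketch (again with made-up file names):

```shell
dir3=$(mktemp -d)
echo "v1" > "$dir3/a.txt"
ln "$dir3/a.txt" "$dir3/b.txt"   # a.txt and b.txt now share an inode

echo "v2" > "$dir3/a.txt"        # rewrite through one name (truncate in place)
cat "$dir3/b.txt"                # prints "v2": the other name sees the change
```

Note the caveat this implies for the deduplicating script: a tool that *replaces* a file with a new one (rather than rewriting it in place) would break the link instead, so whether edits propagate depends on how the editing program saves.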

Maybe it would be nice to take an argument from the command line letting the user choose whether to create a soft link, create a hard link, delete the duplicates, or just report their existence.
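One way such a mode switch could look, sketched in shell; the `--mode` idea, the `handle_duplicate` function, and the `$dup`/`$orig` parameter names are all hypothetical, not taken from the original script:

```shell
# Hypothetical dispatch on a user-chosen mode (default: just report).
mode=${1:-report}

handle_duplicate() {
    dup=$1     # path of the duplicate to be replaced
    orig=$2    # path of the copy we keep
    case $mode in
        hard)   ln -f "$orig" "$dup" ;;    # replace duplicate with a hard link
        soft)   ln -sf "$orig" "$dup" ;;   # replace it with a soft link
        delete) rm "$dup" ;;               # just remove the duplicate
        report) echo "duplicate: $dup (of $orig)" ;;
    esac
}
```

In `soft` mode the link target is recorded exactly as given, so `$orig` would need to be an absolute path (or correctly relative to the duplicate's directory) for the link to resolve.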
