Note that du is much smarter than a simple
File::Find routine. du knows it
has seen a file before if you have multiple links to a file.
The File::Find solution would include the size
multiple times.
Well, that would be pretty easy to fix, wouldn't it? You just have to stat each file and cache its inode number (together with the device number, if the tree spans more than one filesystem), so you can tell whether you have already seen the file before adding its size to the total.
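Something along these lines, as a rough sketch rather than a drop-in du replacement. The sub name total_size is made up for illustration, and it assumes you only want the byte sizes of regular files (du reports disk blocks and also counts directories):

```perl
use strict;
use warnings;
use File::Find;

# Sum the sizes of regular files under $dir, counting each
# hard-linked file only once. total_size is a hypothetical name.
sub total_size {
    my ($dir) = @_;
    my %seen;          # "dev:ino" => number of times encountered
    my $total = 0;
    find(sub {
        # lstat so that symlinks are not followed
        my ($dev, $ino, $size) = (lstat $_)[0, 1, 7];
        return unless defined $ino && -f _;   # regular files only
        return if $seen{"$dev:$ino"}++;       # already counted this inode
        $total += $size;
    }, $dir);
    return $total;
}
```

Calling total_size('.') then gives the total for the current directory. Keying on both device and inode matters because inode numbers are only unique within one filesystem.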
The only question, depending on the number of files and the number of inodes on the filesystem, is which would use less memory: a hash or a bit vector?
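For comparison, here is a minimal sketch of the bit vector variant using Perl's built-in vec. The name seen_before is hypothetical, and it assumes a single filesystem, since one bit string cannot tell devices apart:

```perl
use strict;
use warnings;

my $seen = '';   # bit string, one bit per inode number

# Return 1 if $ino was already marked, otherwise mark it and return 0.
# seen_before is a hypothetical helper name.
sub seen_before {
    my ($ino) = @_;
    return 1 if vec($seen, $ino, 1);
    vec($seen, $ino, 1) = 1;   # extends the string as needed
    return 0;
}
```

The string grows to roughly the highest inode number seen divided by 8 bytes, regardless of how many files you visit. So it probably wins when inode numbers are small and dense, while the hash, whose cost grows with the number of files, is likely better when inode numbers are large or sparse.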