in reply to List Duplicate Files in a given directory

Just a side note.

Calculating the MD5 digest (or any other checksum) of a file can take quite a bit of time, especially for large files, because the whole file has to be read.

And there is no point in computing the MD5 digests of two files to see whether they're identical if their sizes differ. Finding the size of a file is, of course, much faster, since it only needs the file's metadata, not its contents.

So I would suggest, as a possible performance enhancement, computing the MD5 digest only for files that have the same size as at least one other file.
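A minimal sketch of the size-first approach, written in Python rather than Perl purely for illustration; the function names md5_of and find_duplicates are hypothetical, and walking subdirectories recursively with os.walk is an assumption (the original task may only want the top-level directory). Files are bucketed by size first, and only buckets with two or more members are hashed.

```python
import hashlib
import os
from collections import defaultdict


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(directory):
    """Return groups (sorted lists of paths) of files with identical MD5.

    Assumption: recurse into subdirectories and ignore symlinks.
    """
    # Step 1: bucket regular files by size (cheap metadata lookup only).
    by_size = defaultdict(list)
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    # Step 2: hash only files whose size is shared with another file.
    duplicates = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate; skip hashing
        by_digest = defaultdict(list)
        for path in paths:
            by_digest[md5_of(path)].append(path)
        duplicates.extend(sorted(g) for g in by_digest.values() if len(g) > 1)
    return duplicates
```

Strictly speaking, equal MD5 digests only make identical content overwhelmingly likely; a byte-by-byte comparison of each candidate group would confirm it.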


Re^2: List Duplicate Files in a given directory
by choroba (Cardinal) on Jul 31, 2017 at 11:08 UTC
    Exactly. That's also what my solution does.
