So I have a mass of MP3 files in an unorganized collection of subdirectories. I know that I have multiple copies of the same songs in different places, usually with the same file name, but not always. My goal is to find duplicate copies of the same file in a directory hierarchy. The easy version would just compare file names, whereas the harder version would do a byte-for-byte comparison of the contents. I'm not even sure that a byte-for-byte comparison would work with MP3s (two copies of the same song could differ in their tags and whatnot), but hey, why not try?
My first thought then was that I would read the file structure into a hash and look for duplicates... but I'm not sure how to go about doing this intelligently. Can anyone help?
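
To make the idea concrete, here is a rough sketch of the "read it into a hash" approach, written in Python purely as an illustration. It is an assumption about how one might do it, not a definitive answer: the names `file_digest` and `find_duplicates` are made up for this example, and SHA-256 is just one reasonable choice of checksum. Files are first bucketed by size, and only files that share a size are hashed, so most of the collection never has to be read in full. It only catches exact copies; two rips of the same song with different tags would not match.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root, suffix=".mp3"):
    """Return groups of paths under root whose size and digest both match."""
    # Pass 1: bucket candidate files by size (cheap, no file reads).
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(suffix):
                path = os.path.join(dirpath, name)
                by_size[os.path.getsize(path)].append(path)

    # Pass 2: hash only the files that share a size with another file.
    by_digest = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue
        for path in paths:
            by_digest[(size, file_digest(path))].append(path)

    # Keep only the buckets that actually contain more than one file.
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]
```

Calling `find_duplicates("/path/to/music")` (a placeholder path) would return a list of lists, each inner list holding the paths that appear to be the same file. A matching digest makes identical content overwhelmingly likely, but anyone who wants certainty before deleting anything could follow up with a direct comparison such as `filecmp.cmp(a, b, shallow=False)`.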