Roy Johnson,
Great idea. I would just expand it a little more. I would have a structure that looked like this:
my %mp3 = ( byname => {}, bymd5 => {} );
As you said, each key in the second-level hashes would map to an array reference listing the matching files. The difference here is that you also get a list of files that share a name across different directories but may not be the same song. Those can cause problems when you try to merge the directories. I would suggest the following modules:
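
For what it's worth, a rough sketch of filling in that structure could look like the following. It assumes the core modules File::Find, File::Basename, and Digest::MD5; the sub names index_mp3s and duplicates are made up for illustration, and comparing file names case-insensitively is my assumption, not a requirement.

```perl
use strict;
use warnings;
use File::Find;
use File::Basename qw(basename);
use Digest::MD5;

# Walk the given directories and index every .mp3 file two ways:
# by (lowercased) file name and by the MD5 digest of the whole file.
sub index_mp3s {
    my @dirs = @_;
    my %mp3 = ( byname => {}, bymd5 => {} );
    find(
        {
            no_chdir => 1,
            wanted   => sub {
                my $path = $File::Find::name;
                return unless -f $path && $path =~ /\.mp3\z/i;
                open my $fh, '<', $path or return;    # skip unreadable files
                binmode $fh;
                my $md5 = Digest::MD5->new->addfile($fh)->hexdigest;
                close $fh;
                my $name = lc basename($path);
                push @{ $mp3{byname}{$name} }, $path;
                push @{ $mp3{bymd5}{$md5} },   $path;
            },
        },
        @dirs
    );
    return \%mp3;
}

# Keep only the entries of one index that list more than one file.
sub duplicates {
    my ($index) = @_;
    my %dups;
    for my $key ( keys %$index ) {
        $dups{$key} = $index->{$key} if @{ $index->{$key} } > 1;
    }
    return \%dups;
}
```

Calling duplicates($mp3->{bymd5}) would give the byte-for-byte identical files, while duplicates($mp3->{byname}) gives the name clashes that need a look before merging.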

Cheers - L~R

Re: Re: Re: Finding Redundant Files
by waswas-fng (Curate) on Feb 06, 2004 at 23:29 UTC
    Because tags are stored inside the mp3 files, MD5 checksums will not help if you can't already check for duplicates via the tags. I.e., if one file has the title tag "Yellow Sub" and another has "Yellow submarine", an MD5 hash will show the two files as different even if the audio data portion of the mp3 is exactly the same. I would suggest using tag matching for exact duplicates, and maybe a hash table keyed on soundex (or some variant) of each tag to get a list of possible dups that you can weed through by hand.
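
    A minimal sketch of the soundex idea, assuming the title tags have already been read into a hash of path => title (how you extract them is up to you). The sub names soundex_key and group_by_soundex are hypothetical, and the soundex here is a hand-rolled version of the classic four-character code applied to the whole title rather than any particular module's implementation.

```perl
use strict;
use warnings;

# Classic four-character Soundex, computed over all letters of the string.
sub soundex_key {
    my ($text) = @_;
    ( my $letters = uc $text ) =~ s/[^A-Z]//g;    # keep letters only
    return '' unless length $letters;
    # Map each letter A..Z to its Soundex digit (0 = vowel-like, ignored).
    ( my $digits = $letters ) =~ tr/A-Z/01230120022455012623010202/;
    my $key  = substr( $letters, 0, 1 );
    my $prev = substr( $digits,  0, 1 );
    for my $i ( 1 .. length($letters) - 1 ) {
        my $ch = substr( $letters, $i, 1 );
        my $d  = substr( $digits,  $i, 1 );
        $key .= $d if $d ne '0' && $d ne $prev;
        # H and W do not separate two letters with the same code.
        $prev = $d unless $ch eq 'H' || $ch eq 'W';
    }
    return substr( $key . '000', 0, 4 );
}

# Group file paths by the soundex key of their title tag.
sub group_by_soundex {
    my (%title_for) = @_;    # path => title tag
    my %groups;
    for my $path ( sort keys %title_for ) {
        push @{ $groups{ soundex_key( $title_for{$path} ) } }, $path;
    }
    return \%groups;
}
```

    With this, "Yellow Sub" and "Yellow submarine" land in the same bucket. The key is coarse, so unrelated titles will sometimes share a bucket too, which is why the result is only a list of candidates to check by hand.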


    -Waswas