ashaman has asked for the wisdom of the Perl Monks concerning the following question:

last time i posted was in late june. i would try and make excuses such as my computer broke, i got grounded, i was busy with school, some girl distracted me from what i should be doing, etc--all of which actually happened--but the truth of the matter is, i got discouraged. comparing two lists. it was supposed to just clean up my mp3 directory a little, get rid of the files that weren't on my play list. in a small scale test, it worked fine. when i put it into practice, it reduced my beloved collection of mp3s from about 1200 to about 36. it was by far one of the most terrifying moments in my life. in a great sense of self doubt, i haven't done any perl since. indeed, i only started trying to rebuild my mp3 collection a few days ago. but i'm taking a c++ class at my high school, and i guess i felt that it was time to get back on the horse and figure out why my script did what it did. and so, i come now to ask for help and guidance in finding the problem, because i sure as hell have no idea what went wrong.

Re: it's been awhile . . .
by tadman (Prior) on Oct 11, 2001 at 05:16 UTC
    When you're writing something that could do some lasting damage, it's always best to test it first. I've learned to be a bit paranoid, especially when running my own code, so more often than not I end up doing something like:
    my $still_testing = 1;
    my $talkative = 1;
    foreach my $to_delete (@liberation_pool) {
        print "Deleting $to_delete\n" if $talkative;
        unlink ($to_delete) unless $still_testing;
    }
    Look over the output and make sure things add up. As in, it's not deleting your entire MP3 directory, formatting your filesystem, or stealing food from your fridge. A tiny slip of logic and you could be facing a re-install. Laziness is a virtue, except when you're too lazy to test it properly and it bites you.

    Another thing is to not actually delete them, but to move them to some sort of 'orphan' folder (fixing name collisions, of course) which can be discarded at your leisure, much like the trash can/recycle bin. Use rename instead of unlink, for example. rename is dangerous too, since you can rename every single file in a directory to a single file, which 'mv' prevents you from doing in most cases.
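
    As a rough sketch of that 'orphan' folder idea, something like the following could work. The sub names free_name and orphan and the folder name 'orphans' are made up for illustration; move comes from File::Copy. There is still a small window between the -e check and the move, so treat it as a starting point rather than a finished tool.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);
use File::Copy qw(move);
use File::Path qw(make_path);
use File::Spec;

# Hypothetical helper: pick a path inside $orphan_dir that does not exist
# yet, appending -1, -2, ... before the extension when the name is taken.
sub free_name {
    my ($orphan_dir, $file) = @_;
    my ($base, undef, $ext) = fileparse($file, qr/\.[^.]*/);
    my $dest = File::Spec->catfile($orphan_dir, "$base$ext");
    my $n = 0;
    while (-e $dest) {
        $n++;
        $dest = File::Spec->catfile($orphan_dir, "$base-$n$ext");
    }
    return $dest;
}

# Hypothetical helper: move one file into the orphan folder instead of
# deleting it, and return the path it ended up at.
sub orphan {
    my ($orphan_dir, $file) = @_;
    make_path($orphan_dir) unless -d $orphan_dir;
    my $dest = free_name($orphan_dir, $file);
    move($file, $dest) or die "move $file -> $dest: $!\n";
    return $dest;
}

# Usage (assumed names):
# orphan('orphans', $_) for @liberation_pool;
```

    Because nothing is overwritten, two files with the same name both survive the trip, and emptying the folder later is a separate, deliberate step.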

    There are probably better test techniques than the simple 'printf debugging' described here. Just make sure you can turn on the safety when testing something that could be very dangerous.
Re: it's been awhile . . .
by George_Sherston (Vicar) on Oct 11, 2001 at 05:26 UTC
    Oy vey, how painful.

    I can't see why the code wouldn't do what it says on the can. My guess is that somewhere in there, perhaps at /#/ there's a mismatch between your playlist and the test criteria you apply to it, and @playlist ends up with the wrong stuff in it. To find out what's going wrong, I'd take the first half of your code and do
    open(PLAYLIST, "<playlist.m3u");
    opendir(FILES, 'd:\my files');
    my @playlist = ();
    open TEMP, ">playlist.txt";
    while(<PLAYLIST>) {
        if(not(/#/)) {print TEMP "$_\n"}
    }
    Then examine playlist.txt at your leisure and see whether it is what you'd expect it to be. Of course, before you run this... make sure you don't have a treasured file called playlist.txt that will get overwritten... that would be just too cruelly ironic.

    Please do post again and say how you get on and ask any other questions that occur to you. Perl is your friend, really, despite appearances.

    § George Sherston
Re: it's been awhile . . .
by blackmateria (Chaplain) on Oct 11, 2001 at 18:34 UTC
    Geez, that's rough. I think everyone's lost huge amounts of important files at one time or another; I know I've lost a couple of projects myself (no mp3's though :) Looks like you're on Windows from the "C:\my files." The thing on Windows is, you have to remember to do case-insensitive matching. Your script has this line:
    if(not(/#/)) {$playlist{$_} = 1}
    You need to normalize the filenames before you put them in the hash. You can do this by making them all lowercase. It wouldn't be a bad idea to hang onto the original name too. The hash value seems like a good place to put it:
    if(not(/#/)) {$playlist{lc $_} = $_}
    Be sure to do a case-insensitive match when searching for mp3 files with readdir:
    my @filelist = grep {/\.mp3$/i} readdir FILES;
    While we're at it, better make those filenames lowercase too. That means @filelist should become %filelist:
    my %filelist = map {(lc $_, $_)} grep {/\.mp3$/i} readdir FILES;
    The "map" just changes each filename (from the grep) to a pair of (lowercase filename, filename), which then goes into the hash %filelist. Now, just create a list of the differences:
    my %remove;
    for (keys %filelist) {
      $remove{$_} = 1 unless $playlist{$_};
    }
    Now "keys %remove" is the list of files you want to delete (WARNING: not really. It doesn't include the folder name!) Here's a script I made to put all this stuff together. It doesn't skip lines with # though, and it doesn't have pike's advice about trimming whitespace. It's a start though.
    use strict ;
    use warnings ;

    my $path = '\\\\sephiroth\\pub\\music\\sdb\\ginsu gnives\\' ;
    my $playlist = '.\\playlist' ;

    opendir (my $DIR, $path) || die "opendir (\"$path\"): $!\n" ;
    my %files = map {(lc $_, "$path$_")} grep {/\.mp3$/i && !-d "$path$_"} readdir $DIR ;
    closedir ($DIR) ;

    open (my $PLAYLIST, "<$playlist") || die "open (\"$playlist\"): $!\n" ;
    my %playlist = map {y/\r\n//d; (lc $_, 1)} <$PLAYLIST> ;
    close ($PLAYLIST) ;

    print join ("\n\t", "\%files =", map {"$_ => $files{$_}"} keys %files), "\n" ;
    print join ("\n\t", "\%playlist =", keys %playlist), "\n" ;

    my %remove ;
    for (keys %files) {
      $remove{$_} = 1 unless $playlist{$_} ;
    }

    print join ("\n\t", "\%remove =", keys %remove), "\n\n" ;
    print "unlink $files{$_}\n" for keys %remove ;
    #unlink $files{$_} for keys %remove ;
    Hope that helps. One last thing, though, I have an mp3 called "Seinfeld- Soup Nazi.mp3." Note that there's no space before the first dash. If I put this in the list with the correct spacing (say I hand-typed my list), this mp3 would get deleted even though I definitely want to keep it. What I'm saying is that this whole thing with exact filename matches is maybe not quite the right way to go about deleting files. It could be made to work though, and it has the advantage of not requiring much perl knowledge :). Anyway, glad to hear you haven't given up on perl for good, despite the traumatic loss of mp3's. Perl is a lot easier to learn than C++, IMHO. (I program in both on a daily basis.) Keep the faith!
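
    To make the spacing problem concrete, here is a minimal sketch of a more forgiving comparison. The match_key sub is a hypothetical helper, not part of the script above, and the rules it applies (lowercase, trim, ignore spacing around dashes) are just one guess at what "close enough" should mean. The two sample lists are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: reduce a filename to a forgiving comparison key.
sub match_key {
    my ($name) = @_;
    my $key = lc $name;
    $key =~ s/[\r\n]+//g;       # drop line endings left over from the playlist
    $key =~ s/^\s+|\s+$//g;     # trim leading and trailing whitespace
    $key =~ s/\s*-\s*/-/g;      # ignore spacing around dashes
    $key =~ s/\s+/ /g;          # collapse runs of whitespace
    return $key;
}

# Sample data (assumed): a hand-typed playlist entry and the real filenames.
my %playlist = map { (match_key($_) => 1) } ('Seinfeld - Soup Nazi.mp3');
my %files    = map { (match_key($_) => $_) } ('Seinfeld- Soup Nazi.mp3', 'Other.MP3');

# Files whose key is not on the playlist are only candidates for removal.
my @candidates = sort map { $files{$_} } grep { !$playlist{$_} } keys %files;
print "would remove: $_\n" for @candidates;
```

    With this data only Other.MP3 is reported, since both spellings of the Seinfeld file reduce to the same key. The flip side is that two genuinely different files can also collapse to one key, so the result is best treated as a list to review, not a list to unlink.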
Re: it's been awhile . . .
by blakem (Monsignor) on Oct 11, 2001 at 04:55 UTC
    Sorry to hear it...

    Tough to help you out w/o seeing the code you actually ran, any idea at this point where the script you used is? Was it exactly the code you originally asked about, or did you modify it from there?

    -Blake

      naah, crap, forgot to mention, it's a test version, only prints what it should delete. but . . . i'm sure you could figure that out . . .
        what the--?! bloody hell, it posted it . . . *grumbles* ok, apparently i accidentally wrote a reply to my post instead of actually posting it and then replying, my bad . . . uhh, that is a reply to this: ok, this should be roughly like what i used. unfortunately, when my other computer died, it took all my files with it, so i don't have the actual script i used. but when tested, this one makes the same mistake. fortunately, i now have two 40GB hds, so i actually have room for back up. :)
        use strict;
        open(PLAYLIST, "<test.m3u");
        my $dir = 'c:\my files';
        opendir(FILES, "$dir");
        open (TEST, ">test.txt");
        my %playlist;
        my @filelist = grep {/\.mp3$/} readdir FILES;
        my $song;
        while(<PLAYLIST>) {
            chomp;
            if(not(/#/)) {$playlist{$_} = 1}
        }
        foreach $song(@filelist) {
            if(!exists $playlist{$song}) {
                print TEST $dir . "\\" . $song . "\n"
            }
        }
        closedir FILES;
        close PLAYLIST;
        btw, it's kinda been awhile since i used perl, and i don't remember exactly what {$playlist{$_} = 1} does, or how it does it. could someone refresh my memory? yeah, i know, i really should start commenting things
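
        For anyone else wondering about that idiom, here is a small sketch of what {$playlist{$_} = 1} amounts to: the hash is used as a set, each playlist line becomes a key, and the 1 is only a placeholder. The song names below are made up, and the loop variables stand in for $_ in the script.

```perl
use strict;
use warnings;

my %playlist;

# Each playlist line becomes a hash key; the value 1 is just a placeholder.
for my $line ("song one.mp3", "#EXTINF:123,Some Artist", "song two.mp3") {
    next if $line =~ /#/;       # same effect as if(not(/#/)) in the script
    $playlist{$line} = 1;
}

# exists does an exact string lookup against the keys.
for my $song ("song one.mp3", "Song Two.mp3", "song three.mp3") {
    my $status = exists $playlist{$song} ? "on the playlist" : "NOT on the playlist";
    print "$song: $status\n";
}
```

        Only "song one.mp3" is found here. "Song Two.mp3" misses because of the capital letters, which shows how any difference between a playlist line and a directory entry (case, a leading path, stray whitespace) makes a file look like it is not on the list.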
Re: it's been awhile . . .
by grinder (Bishop) on Oct 11, 2001 at 19:26 UTC

    That sucks!

    I have the following script that I use to comb through a filesystem and kill off the duplicates. I'm pretty confident it works correctly. I staked my mp3 collection on it after all. (Weeded out about 3Gb of duplicates in ~55Gb).

    #! /usr/bin/perl -w

    use strict;
    use File::Find;
    use Digest::MD5;

    my %digest;
    my $total_bytes = 0;
    my $dups = 0;

    sub wanted {
        return unless -f $_;
        if( !open IN, $_ ) {
            print "Cannot open $_ for input: $!\n";
            return;
        }
        my $md5 = Digest::MD5->new;
        my $d = $md5->addfile( *IN )->digest;
        close IN;
        my $bytes = -s $_;
        return unless $bytes;

        if( defined $digest{$d} ) {
            print "$bytes\t$digest{$d}\t$File::Find::name\n";
            unlink $_;
            $total_bytes += $bytes;
            ++$dups;
        }
        else {
            $digest{$d} = $File::Find::name;
        }
    }

    foreach my $d ( @ARGV ) {
        print "=== directory $d\n";
        find \&wanted, $d;
    }

    printf "Statistics:
    Duplicates: %12d
    Bytes: %12d
    KBytes: %12d
    MBytes: %12d
    GBytes: %12d
    ",
        $dups,
        $total_bytes,
        $total_bytes / (1024**1),
        $total_bytes / (1024**2),
        $total_bytes / (1024**3);

    __END__

    hope this helps!

    --
    g r i n d e r
Re: it's been awhile . . .
by archen (Pilgrim) on Oct 11, 2001 at 19:33 UTC
    I think this might also stress the importance of backups. I know how you feel because I've done stuff like this too, and it really sucks. But what if your hard drive had simply failed? I think if I attempted something like this I would have run a test on a smaller data set that was quarantined somewhere, maybe on a different disk partition if needed. Most of the scripts I make are cheesy utility scripts, but as soon as they start changing any data I become rather paranoid. Of course I say that now, but I'm sure sometime in the future I'll blow up my computer anyway....

    Well I guess hindsight is 20/20, but maybe a better solution would be to move anything not on your list to a specific directory that you manually sort through. It might be a pain, but a bit more safe.