Beefy Boxes and Bandwidth Generously Provided by pair Networks
Problems? Is your data what you think it is?
 
PerlMonks  

Distribute MP3 recordings optimally to CD-Rs

by Brutha (Friar)
on Sep 30, 2005 at 11:34 UTC ( [id://496388]=CUFP: print w/replies, xml ) Need Help??

After collecting lots of MP3 recordings of radio audio-plays, I think it is time to clean up my harddisk and burn them to CD-Rs. Of course, I want to fill each CD as much as possible. Task for the Perl programm is, to tell me which files I have to burn together on one CD-R.

The 'du' command gives me a list of file/directory sizes, which I clean up manually, that is drop subdirectories etc. This list is read into a list called @files, which contains pairs of size and name. I sort this one, for the biggest directories first. Now I call my little subroutine TrySolution, which finds out the combinations of files sums up to the closest value of space available on a CD-R. The solution is thrown out of the @files list and TrySolution is started again for the next solution as long as entries are left in @files.

Sounds easy, but do you remember all those nifty algorithms from your university times and do they fit? I didn't, but this was a nice exercise. The solution is still not optimal, but reasonably fast, e.g. you could stop the recursive call, if you can exclude impossible branches. And it was fun.

But when I finished and was proud about the fantastic combinations found, I suddenly recognized, that I asked the wrong question. I asked for filling each CD to a maximum, I should have asked to fille as little CDs as possible. But that is another question ;-)

Read below for the main parts of my script:

# Global variables my @files; # Array of (Size,Name) my $nr_entries = 0; # Anzahl gefundener Dateien my @opt_solution_sizes; # sizes of current opt. combination my @opt_solution_idx; # indices of current opt. combination my $opt_solution_tot; # size of current opt. combination # Read files and sort biggest first open F, "<$filename" or die "Cannot open $filename\n$!"; while (my $line = <F>) { chomp $line; my ($size, $name) = split /\s+/, $line, 2; push @files, [$size, $name]; $nr_entries++; } close F; @files = sort {$$b[0] <=> $$a[0]} @files; sub TrySolution { my $stufe = shift; my @solution_sizes = @{ $_[0] }; my @solution_idx = @{ $_[1] }; my $solution_tot = sum(@solution_sizes); if ( $solution_tot > $opt_solution_tot ) { # new Optimum @opt_solution_sizes = @solution_sizes; @opt_solution_idx = @solution_idx; $opt_solution_tot = sum(@opt_solution_sizes); } if ($stufe==$nr_entries) { # last entry return; } # retry, for all entries, that can give possible solutions for my $try ( $stufe..($nr_entries-1) ) { my $try_size = $files[$try]->[0]; if ( $solution_tot+$try_size < $options{cdsize}) { # If possible, then try with extended set TrySolution ( $try+1, [@solution_sizes, $try_size], [@solution_idx, $try] ); } } } # Main while (@files) { my @new_set = (); @opt_solution_sizes = (); @opt_solution_idx = (); $opt_solution_tot = 0; TrySolution (0, [], [] ); print "Optimal solution is $opt_solution_tot with ", commify_series (@opt_solution_sizes), "\n"; # see Perl Cookbook for commify_series, nice string out of list @new_set = (); foreach my $idx ( reverse @opt_solution_idx) { # start from end, if array is cut in the middle unshift @new_set, splice @files, $idx, 1; $nr_entries--; # global counter, left input list entries } # Now output result print "\nNext Set is:\n", '-' x 10, "\n"; for (my $i=0; $i < @new_set; $i++) { printf "%2d) [%3d] %s\n", $i, $new_set[$i]->[0], $new_set[$i]->[1]; } print "\n"; }

And it came to pass that in time the Great God Om spake unto Brutha, the Chosen One: "Psst!"
(Terry Pratchett, Small Gods)

Replies are listed 'Best First'.
Re: Distribute MP3 recordings optimally to CD-Rs
by Limbic~Region (Chancellor) on Sep 30, 2005 at 13:15 UTC
      Search is your friend, if you know how to tell what you search for. It is an art for itself to create proper titles and feeding proper keywords. I was not able to find that thread, thank you for pointing out.

      And it came to pass that in time the Great God Om spake unto Brutha, the Chosen One: "Psst!"
      (Terry Pratchett, Small Gods)

Re: Distribute MP3 recordings optimally to CD-Rs
by pboin (Deacon) on Sep 30, 2005 at 12:18 UTC
Re: Distribute MP3 recordings optimally to CD-Rs
by revdiablo (Prior) on Sep 30, 2005 at 14:22 UTC

    As pboin points out, this can be approached in terms of the knapsack problem. But there is a specialization of that, called the bin packing problem, which is simpler and also makes sense here. You can read about it at http://en.wikipedia.org/wiki/Bin_packing.

    Since I do this fairly frequently, I wrote a module called Algorithm::BinPack to help me solve it. Even if you don't use my module, you might want to take a look at it -- the algorithm it uses has an intuitive simplicity that I really like. (And no, I didn't invent the algorithm, I just codified it into a module.)

Re: Distribute MP3 recordings optimally to CD-Rs
by eric256 (Parson) on Sep 30, 2005 at 14:58 UTC

    It seems this problem would be best solved by sorting your data biggest to smallest. Then just go over the list picking the next file that still fits in the space you have left. That should be pretty straight forward and fast. /me begins working on a general solution.


    ___________
    Eric Hodges

      ... which is one of the classical "good try" solutions to the knapsack problem.

      --MidLifeXis

      That's basically the algorithm I used in Algorithm::BinPack. Sort by biggest-first, then iterate on each item. If it doesn't fit in the first bin, try the next bin. Rinse, repeat.

        Don't know if this is relevant but do you need to allow space for overheads like directories etc?

        Is that part of the rough 700Mb per CD?

        Does that overhead change with number of files? ie can you fit 2 x 349 Mb files (698Mb) but only 195 x 3.49 Mb (680Mb).

        Doesn't really change the principal behind the algorithm.

        Although not directly relevent to computer files there is also the issue of retrieval. The best packing method my be less efficient if the files you most want to retrieve are scattered. This is part of programming for loading a lorry for a drop load. You need to make the best packing but also take into account the best route for the drop and how they will get stuff off.
      That was my manual solution, but it is not the best. Think of having sizes of 5,4,3,2 to fit on a 10 unit disk. After (5,4) nothing will fit, but (5,3,2) is better. That is the reason, why I try to check all solutions.

      And it came to pass that in time the Great God Om spake unto Brutha, the Chosen One: "Psst!"
      (Terry Pratchett, Small Gods)

Re: Distribute MP3 recordings optimally to CD-Rs
by Aristotle (Chancellor) on Oct 08, 2005 at 20:09 UTC

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: CUFP [id://496388]
Approved by ghenry
Front-paged by grinder
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others pondering the Monastery: (3)
As of 2024-04-18 23:16 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found