in reply to My brain hurts! Recursive directory algorithm..

It looks like you are splitting up files to put on a fixed-size medium (DVD perhaps?). How important is it that your solution be optimal (i.e., minimizing the number of filesets)? If that is not an important constraint, you could build buckets on a first-come-first-served basis quite easily using File::Find, something like this (warning: untested code):
use File::Find;
use File::stat;

our @current_bucket;
our $current_size = 0;
our $BUCKET_SIZE  = 4_000_000_000;

sub add_to_bucket {
    return unless -f $_;    # only count plain files
    my $size = stat($_)->size();
    if ($size + $current_size > $BUCKET_SIZE) {
        # bucket is full -- process it and start a new one
        process_bucket(\@current_bucket);
        @current_bucket = ();
        $current_size   = 0;
    }
    $current_size += $size;
    push @current_bucket, $File::Find::name;
}

sub process_bucket {
    my $bucket = shift;
    # here do things like compress the list to contain
    # parent directories, etc., if you really need to,
    # then print out the list.
}

find(\&add_to_bucket, "/data");
process_bucket(\@current_bucket) if @current_bucket;    # don't forget the last, partly filled bucket
That should help you structure your problem....

If it is important that your solution be optimal, be warned that it is a hard algorithmic problem (it's a form of the partition problem). You could still use File::Find to collect the files, but the bucket forming and processing would need to be much more complex (and probably not worth it, though not knowing your precise needs I cannot say for sure...).
Best of luck..

--JAS

Re: Re: My brain hurts! Recursive directory algorithm..
by waswas-fng (Curate) on Jul 25, 2002 at 19:36 UTC
    Thanks so much, I see the light now. This logic along with Mike's (RMGir) Algorithm::Bucketizer insight will get me all the way home.

    I plan on building the total size for each of the "project_1" trees and pushing them into Algorithm::Bucketizer if they are < 4.2 gig. If not, I will jump down one tree level from "project_1" and use a method similar to the one you show to create a bucket-like object that I can then insert into Algorithm::Bucketizer until I am out of data in the subtree.


    After all that, Algorithm::Bucketizer can optimize the distribution over the buckets with:
    $b->optimize(algorithm => "brute_force");


    With a smaller number of items in the buckets, brute forcing the 0-1 knapsack problem should not be too time consuming (thank god it is not fractional =).
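
    For reference, here is a rough, untested sketch of how those pieces might fit together; the /data/project_* layout, the 4.2 gig limit, and skipping over-sized trees (which would be split one level down first, as described above) are assumptions:

    use File::Find;
    use Algorithm::Bucketizer;

    my $LIMIT = 4_200_000_000;    # assumed ~4.2 gig per bucket
    my $b     = Algorithm::Bucketizer->new(bucketsize => $LIMIT);

    # assumed layout: each project lives under /data/project_*
    for my $tree (glob "/data/project_*") {
        my $total = 0;
        find(sub { $total += -s $_ if -f $_ }, $tree);

        # trees larger than one bucket would need to be split
        # one level down first before being added
        $b->add_item($tree, $total) if $total <= $LIMIT;
    }

    # redistribute the items more tightly across buckets
    $b->optimize(algorithm => "brute_force");

    for my $bucket ($b->buckets()) {
        printf "Bucket %d:\n", $bucket->serial();
        print "  $_\n" for $bucket->items();
    }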

    I think this will cover all of my requirements.


    Thanks so much JAS and Mike for pointing me in the right direction! =)

    -Waswas