I have been struggling with this for a few days now. Every time I think I have a workable algorithm, I end up spotting a huge flaw. It screams recursion, but I just can't seem to figure out a recursive solution.
Where each "project_1"-level directory contains unstructured data files spread across arbitrarily nested, unstructured subdirectories.
Where the largest single file is < 2 GB.
Where the total file size for a "project_1"-like directory can be anywhere from < 50 KB to > 10 GB.
Where the "project_1"-like directory names do not have to be unique, but each contains unique data and a unique subdirectory structure.
Here is the problem:
Get a set of unique file lists where each list's total size is greater than 3.5 GB and less than 4.2 GB, keeping all of a "project_1" together in the same list of filenames unless it alone exceeds 4.2 GB, at which point it should be broken up into as few lists as possible.
The lists returned would look like:

List 1: all of the files in
/data/client1/project_type_1/project_1
/data/client1/project_type_1/project_2
/data/client2/project_type_1/project_5
/data/client2/project_type_2/project_6
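For what it's worth, here is the closest I have come: a greedy first-fit-decreasing sketch in Python. The names pack, place, project_files, MIN_SIZE, and MAX_SIZE are all mine, and it only enforces the 4.2 GB ceiling, not the 3.5 GB floor on the leftover list, which is where my attempts keep falling apart.

```python
import os

MAX_SIZE = 4.2 * 1024**3   # upper bound per list
MIN_SIZE = 3.5 * 1024**3   # lower bound -- NOT enforced below; the
                           # leftover list can fall short of it

def project_files(project_path):
    """All (path, size) pairs anywhere under one project directory."""
    pairs = []
    for dirpath, _, filenames in os.walk(project_path):
        for name in filenames:
            p = os.path.join(dirpath, name)
            pairs.append((p, os.path.getsize(p)))
    return pairs

def pack(projects):
    """projects: {project_path: total_size}, e.g. the dict from the
    sketch above. Returns lists of file paths, packed greedily."""
    bins = []  # each bin is [running_total, [file paths]]

    def place(files, size):
        # first-fit: drop into the first list that still has room
        for b in bins:
            if b[0] + size <= MAX_SIZE:
                b[0] += size
                b[1].extend(files)
                return
        bins.append([size, list(files)])

    # largest projects first (first-fit-decreasing)
    for path, size in sorted(projects.items(), key=lambda kv: -kv[1]):
        files = project_files(path)
        if size <= MAX_SIZE:
            place([f for f, _ in files], size)  # keep the project intact
        else:
            # oversized project: split into chunks no bigger than MAX_SIZE
            chunk, chunk_size = [], 0
            for f, s in sorted(files, key=lambda fs: -fs[1]):
                if chunk_size + s > MAX_SIZE:
                    place(chunk, chunk_size)
                    chunk, chunk_size = [], 0
                chunk.append(f)
                chunk_size += s
            if chunk:
                place(chunk, chunk_size)

    return [b[1] for b in bins]
```

Sorting projects largest-first before placing them (first-fit-decreasing) tends to minimize the number of lists, but it leaves the last, partially filled list free to land under 3.5 GB.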