The standard NP-Complete routine I use (when lacking a problem or domain specific optimization) is a recursive solution that tries to cover all combinatorial possibilities, but will short circuit (prune the tree) in the case where the fitness function would never succeed from that point on.
I have something similar with a variation of fitness() except that the function does all the heavy-lifting of driving the moves:
Thanks for all the suggestions so far - I'm investigating them all.
I'm looking for an error of less than 0.001 from the actual average for all buckets (unless the bucket has < 3 or 4 allocations) and the timeframe is less than 3 seconds on a new linux blade.