Glad to have helped.

You may realise, but it's worth stating in case you don't, the algorithm I used is in no way guaranteed to find the optimum solution. Only a very good one very quickly.

The problem with finding a more optimum solution is that there is no discernible pattern in the relationship nor ordering of the averages generated from even the order subsets of the set.

That last statement is best illustrated by calculating all the averages of all the possible subsets of a small set:

c:\test>858276-2 { 1 => 1, "1 2" => "1.5", "1 2 3" => 2, "1 2 3 4" => "2.5", "1 2 3 4 5" => 3, "1 2 3 5" => "2.75", "1 2 4" => "2.33333333333333", "1 2 4 5" => 3, "1 2 5" => "2.66666666666667", "1 3" => 2, "1 3 4" => "2.66666666666667", "1 3 4 5" => "3.25", "1 3 5" => 3, "1 4" => "2.5", "1 4 5" => "3.33333333333333", "1 5" => 3, 2 => 2, "2 3" => "2.5", "2 3 4" => 3, "2 3 4 5" => "3.5", "2 3 5" => "3.33333333333333", "2 4" => 3, "2 4 5" => "3.66666666666667", "2 5" => "3.5", 3 => 3, "3 4" => "3.5", "3 4 5" => 4, "3 5" => 4, 4 => 4, "4 5" => "4.5", 5 => 5, }

This lack of a pattern makes it impossible to come up with heuristics that are guaranteed to converge toward an optimal solution.

And given that for a set of 100,000 items, there are 9.99×1030102(*) possible subsets to consider, without some way to compare two partial solutions and rate one "better" than the other as a basis for further exploration, there is no way to construct even a Genetic Algorithm, (which are often useful in NP complete problems), for this problem.

Whilst it will be possible to construct heuristics that will sometimes give better results than my algorithm for particular datasets; I don't believe it is possible to come up with one that will guarantee to produce a better results for all datasets. Even finding the occasional improvements is likely to take 2 or 3 magnitudes greater resources.

(*)A number that is so enormous that few people will be able to comprehend just how big it truly is. It is so big, that it defies any attempts to try and "humanise" it. For example:

If the 10 billion stars in the Universe each had 10 earth-like planets; and each planet had 10 billion people; and each person had Google's resources, (say) 1 million cpus each; and each cpu had been testing one subset per nanosecond since the birth of the universe; then it would still require 1030050 more 'lives of the universe' to complete the problem.

It doesn't really help much does it :)


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
RIP an inspiration; A true Folk's Guy

In reply to Re^3: Creating a sub-list constrained by an average while maintaining maximum number of original elements by BrowserUk
in thread Creating a sub-list constrained by an average while maintaining maximum number of original elements by Jack B. Nymbol

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.