in reply to Re^2: algorithm for 'best subsets'
in thread algorithm for 'best subsets'

Exactly. The test case should show that the functionality worked, and that different methods found the same subsets. It should not try to break your machine.

In practice, I can slice the %Items and the %Keywords up a bit, and do smaller overlapping datasets. I can also farm the work out to multiple machines on these slices.

More on what that is, tonight; I've been stuck in meetings today so haven't had the chance to really explain what all this is about.

--
[ e d @ h a l l e y . c c ]

Replies are listed 'Best First'.
Re^4: algorithm for 'best subsets'
by Limbic~Region (Chancellor) on Mar 03, 2005 at 21:27 UTC
    halley,
    For the record, my solution has a very small memory footprint even for really large test cases but it can be really slow to finish. A half hour with this test case for a target size of 3.

    Cheers - L~R

      As I think I've said elsewhere, Limbic~Region's solution has a time complexity that goes up as O(K!/(N!(K-N)!)), that is as "K choose N", where K is the number of keywords and N the size of your tuples. However, I think that the memory requirements of his solution are only O(K*I), where I is the number of items.

      Contrast my solution, which while it wins on this set of data (5 seconds for what takes L~R's 30 minutes), has a time complexity of at least O(I*L!/(N!(L-N)!)), and so bogs down if the size of the sets increases, (just change rand(8) to rand(80) and watch it fall over) and has a memory complexity of something like that as well. (where "L" is the average size of a set)

      -- @/=map{[/./g]}qw/.h_nJ Xapou cets krht ele_ r_ra/; map{y/X_/\n /;print}map{pop@$_}@/for@/
Re^4: algorithm for 'best subsets'
by BrowserUk (Patriarch) on Mar 03, 2005 at 22:04 UTC

    A clear indication of what output/format/datastructure you are expecting would be cool.


    Examine what is said, not who speaks.
    Silence betokens consent.
    Love the truth but pardon error.