Given the sets:
ABCD
ABCE
ABD
####
ABCD
ABE
ABC
EF
Using the "store everything in the powerset approach" (or, at least allocating space for it), we need to store 2^6 elements (A,B,C,D,E,F) or 64 bytes of data. We're going to assume that we know in advance that there are only 6 possible elements.
For sake of argument, let's assume a more compact efficient approach along the lines of the original approaches using hash tables. We'd build up a structure like this, in some arbitrary order:
#first set
ABCD
BCD
ACD
ABD
ABC
AB
AC
BC
AD
BD
CD
A
B
C
D
#second set
ABE
AE
BE
E
#third set
#generates nothing
#fourth set
EF
F
This only requires 22 bytes of data, a 65% memory savings.
##
##
#first set generates normally
ABCD #ABCD is the first set, store internally
BCD
ACD
ABD
ABC
AB
AC
BC
AD
BD
CD
A
B
C
D
#second set
is ABE a subset of ABCD?
NO -> print #ABE is the first set, store internally
is AB a subset of ABCD?
YES -> skip
is AE a subset of ABCD?
NO -> print
is BE a subset of ABCD?
NO -> print
is A a subset of ABCD?
YES -> skip
is B a subset of ABCD?
YES -> skip
is E a subset of ABCD?
YES -> skip
#third set
is ABC a subset of ABCD?
YES -> skip
#generates nothing, stores nothing
#fourth set
is EF a subset of ABCD?
NO -> is EF a subset of ABE?
NO -> print
is E a subset of ABCD?
NO -> is EF a subset of ABE?
YES -> skip
is F a subset of ABCD?
NO -> is F a subset of ABE?
NO -> print
internally, we only need to store ABCD, ABE, and EF for 3 bytes of data stored - a 95.3% storage savings.
##
##
So if we added on ABF to the list, it would check as follows:
is ABF a subset of EF?
NO -> ABE < ABF, therefore ABF cannot be a subset of anything. Stop now and print ABE
##
##
So instead of:
my @previous = qw(EF ABE ABCD)
I'd have:
my @previous = (
[qw(ABE ABCD)], #A slot
[qw(ABE ABCD)], #B slot
[qw(ABC)], #C slot
[qw(ABCD)], #D slot
[qw(EF ABE)], #E slot
[qw(EF)] #F slot
);
##
##
[qw(ABE ABCD)], #A slot
[qw(ABE ABCD)], #B slot
[qw(EF)] #F slot
##
##
(total number of sets) * (average length of sets) bytes of data.
Let's say we have 10,000,000 sets with an average set length of 24. That's only going to require 240,000,000 bytes to store (240 megabytes), which is slightly less than 1% of the total ram of the "store it all" approach.
##
##
To demonstrate the optimized version:
#first set generates normally
ABCD #ABCD is the first set, store internally
BCD
ACD
ABD
ABC
AB
AC
BC
AD
BD
CD
A
B
C
D
#second set ABE -> look at slots A, B, E. All are the same, so arbitrarily check against "A" (ABCD)
is ABE a subset of ABCD?
NO -> print #ABE is the first set, store internally
is AB a subset of ABCD?
YES -> skip
is AE a subset of ABCD?
NO -> print
is BE a subset of ABCD?
NO -> print
is A a subset of ABCD?
YES -> skip
is B a subset of ABCD?
YES -> skip
is E a subset of ABCD?
YES -> skip
#third set
is ABC a subset of ABCD?
YES -> skip
#generates nothing, stores nothing
#fourth set -> look at slots "E", "F". "F" is the smallest (empty), so we need to do -no- checks.
EF
E
F
We store 9 bytes and do 10 checks (determining "no checks" counts as a check). Note - the original
powerset version would also require 10 checks, so we're optimal here.