I don't know how it scales though yet.
Projecting from:
[ 0:53:04.64] c:\test>845979 1000
Generate fake paragraphs
Sort
Count reversed hash
Solving
999/1000
[0 2 3 4 7 8 9 10 11 12 13 1
[4 6 7 8 10 11 12 13 14 15 1
[0 1 3 4 6 7 8 10 11 12 13 1
[0 5 6 7 8 9 10 12 14 15 18
11s
[ 1:11:24.79] c:\test>845979 2000
Generate fake paragraphs
Sort
Count reversed hash
Solving
1999/2000
[0 1 4 5 6 7 10 12 13 14 15 16 20
[1 3 5 6 7 9 11 13 14 16 17 18 19
[1 3 4 5 6 7 8 9 10 11 13 14 15 1
[1 2 6 7 8 9 10 12 14 19 20 21 22
50s
[ 1:12:25.86] c:\test>845979 4000
Generate fake paragraphs
Sort
Count reversed hash
Solving
3999/4000
[0 1 5 9 10 11 13 14 15 16 18 19 2
[0 1 4 8 9 11 13 14 15 16 18 19 20
[1 4 5 7 8 9 10 11 12 15 16 17 18
[2 3 4 5 6 7 8 9 11 13 17 18 19 21
220s
Each doubling of the number of paras, multiplies the time taken by 4.4. If we double 1000 9 times we get 512,000 (close enough). So, 11 * 4.4^9 gives: 6799340 seconds == 79 days.
Though you will run out of memory long before then on commodity hardware.
Using a testcase generator along the lines of yours, my code gives:
Note Like yours, I don't time the discovery phase. Just the solution phase.
[ 1:31:22.89]C:\test>845818 -W=200 -P=1000 > nul
Words: 200
Paras: 1000
Took 0.000000 seconds
All the words are covered by 9 paras:
[ 1:31:38.04]C:\test>845818 -W=200 -P=1000 > nul
Words: 200
Paras: 1000
Took 0.008884907 seconds
All the words are covered by 10 paras:
[ 1:32:07.65]C:\test>845818 -W=200 -P=2000 > nul
Words: 200
Paras: 2000
Took 0.020982027 seconds
All the words are covered by 9 paras:
[ 1:32:30.77]C:\test>845818 -W=200 -P=4000 > nul
Words: 200
Paras: 4000
Took 0.046000004 seconds
All the words are covered by 10 paras:
[ 1:32:53.62]C:\test>845818 -W=200 -P=8000 > nul
Words: 200
Paras: 8000
Took 0.248465061 seconds
All the words are covered by 10 paras:
[ 1:33:32.98]C:\test>845818 -W=200 -P=16000 > nul
Words: 200
Paras: 16000
Took 1.472499847 seconds
All the words are covered by 10 paras:
[ 1:34:44.75]C:\test>845818 -W=200 -P=32000 > nul
Words: 200
Paras: 32000
Took 6.217499971 seconds
All the words are covered by 10 paras:
[ 1:36:55.87]C:\test>845818 -W=200 -P=64000 > nul
Words: 200
Paras: 64000
Took 25.427500010 seconds
All the words are covered by 9 paras:
It requires 4 more doublings to get from 64000 to 512,000. And 25.4 / 6.2 = 4.0. So starting with 25.4 * 4.0^4 = 6502.4 = 1.806 hours.
To confirm the projection, I'll leave mine running on a 512,000 paras run while I sleep and post the results tomorrow.
Update: It completed in 27 minutes and used apeek of 500MB:: [ 1:45:29.21]C:\test>845818 -W=200 -P=512000 > nul
Words: 200
Paras: 512000
Took 1657.894999981 seconds
All the words are covered by 11 paras:
The code incorporating the testcase generator: #! perl -slw
use strict;
use Time::HiRes qw[ time ];
sub uniq{ my %x; @x{@_} = (); keys %x }
$|++;
our $W //= 200;
our $P //= 1000;
my @idx = ( uniq( map int( rand 60000 ), 1 .. 2*$W ) )[ 0 .. $W-1 ];
print STDERR "Words: ", ~~@idx;
my %paras;
for my $p ( 1 .. $P ) {
local $_ = join ' ', uniq map $idx[ rand @idx ], 0 .. rand( $W/2 )
+;
$paras{ $p } //= '';
for my $i ( 0 .. $#idx ) {
my $word = $idx[ $i ];
if( m[\b\Q$word]i ) {
vec( $paras{ $p }, $i, 1 ) = 1;
}
}
}
print STDERR "Paras: ", scalar keys %paras;
my $start = time;
my @parasByWordCount = map{
unpack 'x[N]A*', $_
} sort map {
pack 'NA*', unpack( '%32b*', $paras{ $_ } ), $_
} keys %paras;
my @set = shift @parasByWordCount;
my $mask = $paras{ $set[ 0 ] };
while( @parasByWordCount ) {
my $next = shift @parasByWordCount;
if( ( $mask | $paras{ $next } ) ne $mask ) {
## $Next contains new paras, so add to set
push @set, $next;
$mask |= $paras{ $next }
}
## otherwise just discard it
}
printf STDERR "Took %.9f seconds\n", time() - $start;
printf STDERR "All the words are covered by %d paras:\n", scalar @set
+;
printf "%3d : %s\n", $_, unpack 'b*', $paras{ $_ } for sort{ $a<=>$b }
+ @set;
printf "All : %s\n", unpack 'b*', $mask;
__END__
[ 5:17:45.68]C:\test>845818 -W=200 -P=600000 > nul
Words: 200
Paras: 600000
Took 71.555999994 seconds
All the words are covered by 200 paras:
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
|