To me, it's fairly apparent that Anonymous Monk was thinking of getting a list of prime numbers, putting them in a flat text file, and reading that file into memory whenever the list of primes is needed.
How I store the primes would depend on what queries one would want to perform. For instance, if I wanted to answer queries "give me the Nth prime", I'd store the primes as k bytes each (k depends on the number of primes I store), the Nth prime is then stored on in the k bytes starting from offset (k-1)*N. That would, after opening the file, require a single read().
If I want to know whether a certain number is prime (or to give me a list of all primes between two numbers), I'd spend less than half a bit on each number. For instance, I might spend a byte to encode the primeness of a sequence of 30 numbers (a number N can only be prime if N % 30 is one of 1, 7, 11, 13, 17, 19, 23 or 29 (the exceptions are 2, 3 and 5)). So, I only need to store the primeness of all numbers of the form 30k+1, 30k+7, 30k+11, 30k+13, 30k+17, 30k+19, 30k+23 and 30k+29. Ergo, to determine whether a number N is prime, use the following steps:
- If N equals 2, 3 or 5, N is prime.
- If N mod 30 isn't 1, 7, 11, 13, 17, 19, 23 or 29, the number isn't prime.
- Otherwise, look up the corresponding bit in the floor(N/30)th byte to see whether this number is prime.
This algorithm also needs a single read(). To get a list, just read the appropriate bytes and look at the individual bits. Using this encoding, 1 Gb of disk space (which in modern computer can easily be read into memory), I can encode the primeness of all numbers up to 32,212,254,720.
There are more compact encodings of course. | [reply] |
it's fairly apparent that Anonymous Monk was thinking of getting a list of prime numbers, putting them in a flat text file, and reading that file into memory whenever the list of primes is needed.
Actually, that can be even faster than my lookup version. Provided that you store them in a fixed record format file--binary or ascii--then getting the Nth prime is just a seek and small read away.
The following uses a simple flat ascii file with the primes padded to 9 bytes (+CRLF). Lookup times for individual primes are consistantly under half a millisecond:
C:\test>primes-a 2
3
Took: 0.000457048416137695 seconds
C:\test>primes-a 15e6
275604541
Took: 0.000396966934204102 seconds
C:\test>primes-a 12345678
224284387
Took: 0.000416040420532227 seconds
Teh downside is the size of the datafile:
C:\test>dir data\primes.all
2006-02-04 15:50 165,000,000 primes.all
The program:
#! perl -slw
use strict;
use Time::HiRes qw[time];;
my $start = time;
open P, '<:raw', 'data\primes.all';
seek P, 11 * ( $ARGV[ 0 ] - 1 ), 0;
print scalar <P>;
close P;
print "Took: ", time() - $start, ' seconds';;
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
| [reply] [d/l] [select] |