Using a fairly highly tuned C implementation of a Sieve of Atkin prime generator, finding the 15 millionth prime takes just over one second. Using my load and lookup Perl takes just under to do the same thing.
I think it doubtful that a pure perl implementation could achieve anywhere near this performance, as the C implementation is using hand-coded optimisations for 32-bit math--unwinding loops; using lookups instead of division; tailoring to L1 cache sizes etc. But you might prove me wrong :)
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
| [reply] |
Yes - primegen is very impressive .... and so is your technique for storing and retrieving them :-)
But how does that relate to the statement I was addressing - namely "Considering that lists of prime numbers are easily available, I'm not so sure whether a sieve beats reading them from disk" ? To me, it's fairly apparent that Anonymous Monk was thinking of getting a list of prime numbers, putting them in a flat text file, and reading that file into memory whenever the list of primes is needed. I was simply clarifying for him that the sieve is faster.
I guess it's possible that I'm selling Anonymous Monk short .... perhaps the storage/retrieval technique that he envisaged was more sophisticated and efficient than I gave him credit for - something like yours, for example. (If so ... my apologies to Anonymous Monk.)
Cheers, Rob
| [reply] |
To me, it's fairly apparent that Anonymous Monk was thinking of getting a list of prime numbers, putting them in a flat text file, and reading that file into memory whenever the list of primes is needed.
How I store the primes would depend on what queries one would want to perform. For instance, if I wanted to answer queries "give me the Nth prime", I'd store the primes as k bytes each (k depends on the number of primes I store), the Nth prime is then stored on in the k bytes starting from offset (k-1)*N. That would, after opening the file, require a single read().
If I want to know whether a certain number is prime (or to give me a list of all primes between two numbers), I'd spend less than half a bit on each number. For instance, I might spend a byte to encode the primeness of a sequence of 30 numbers (a number N can only be prime if N % 30 is one of 1, 7, 11, 13, 17, 19, 23 or 29 (the exceptions are 2, 3 and 5)). So, I only need to store the primeness of all numbers of the form 30k+1, 30k+7, 30k+11, 30k+13, 30k+17, 30k+19, 30k+23 and 30k+29. Ergo, to determine whether a number N is prime, use the following steps:
- If N equals 2, 3 or 5, N is prime.
- If N mod 30 isn't 1, 7, 11, 13, 17, 19, 23 or 29, the number isn't prime.
- Otherwise, look up the corresponding bit in the floor(N/30)th byte to see whether this number is prime.
This algorithm also needs a single read(). To get a list, just read the appropriate bytes and look at the individual bits. Using this encoding, 1 Gb of disk space (which in modern computer can easily be read into memory), I can encode the primeness of all numbers up to 32,212,254,720.
There are more compact encodings of course.
| [reply] |
it's fairly apparent that Anonymous Monk was thinking of getting a list of prime numbers, putting them in a flat text file, and reading that file into memory whenever the list of primes is needed.
Actually, that can be even faster than my lookup version. Provided that you store them in a fixed record format file--binary or ascii--then getting the Nth prime is just a seek and small read away.
The following uses a simple flat ascii file with the primes padded to 9 bytes (+CRLF). Lookup times for individual primes are consistantly under half a millisecond:
C:\test>primes-a 2
3
Took: 0.000457048416137695 seconds
C:\test>primes-a 15e6
275604541
Took: 0.000396966934204102 seconds
C:\test>primes-a 12345678
224284387
Took: 0.000416040420532227 seconds
Teh downside is the size of the datafile:
C:\test>dir data\primes.all
2006-02-04 15:50 165,000,000 primes.all
The program:
#! perl -slw
use strict;
use Time::HiRes qw[time];;
my $start = time;
open P, '<:raw', 'data\primes.all';
seek P, 11 * ( $ARGV[ 0 ] - 1 ), 0;
print scalar <P>;
close P;
print "Took: ", time() - $start, ' seconds';;
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
| [reply] [d/l] [select] |