|The stupid question is the question not asked|
Recamán's sequence and memory usageby Athanasius (Archbishop)
|on Jul 13, 2015 at 08:41 UTC||Need Help??|
I was looking at The On-Line Encyclopedia of Integer Sequences (OEIS) (you know, as one does), and on the Welcome page I found a list of Some Famous Sequences, of which the first is Recamán’s sequence, defined as follows:
What makes this sequence interesting is N. J. A. Sloane’s conjecture that every number eventually appears.
Coding the sequence is simplicity itself; the challenge is to test Sloane’s conjecture by keeping track of the numbers that have not yet appeared in the series. My initial, naïve approach was to use a sieve, à la Eratosthenes:
But this turned out to be far too memory-hungry: for values of MAX of the order of twenty million, RAM usage on my 3GB system approaches 100%, thrashing sets in, and the script (along with the rest of Windows) grinds to a shuddering halt.
Surely, I thought, there must be a memory-efficient way to represent a sieve? And of course there is, and of course it was already implemented on CPAN. A little searching led to the Set::IntSpan module which stores runs of consecutive integers as spans, allowing large (even infinite) collections of integers to be represented very economically.
Calculation of successive terms in the Recamán sequence is noticeably slower using Set::IntSpan for lookup than it is using a hash. But, as the adage says, it’s better to be late than be dead on time. (This was the slogan of an Australian safe driving ad campaign some years ago.) For the record: I also looked at Set::IntSpan::Fast and Set::IntSpan::Fast::XS. The latter failed to install on my system, and the former actually ran slower than Set::IntSpan for this use-case.
Turns out that Set::IntSpan not only solves the memory problem, it also makes it possible to dispense with an upper bound for the sieve. How, then, to display progressive results? Well, the OEIS has a couple of additional series related to Recamán’s:
So I recast the script to output successive values of these two series:
Here is the new script:
This script was developed under Windows 8.1, 64-bit, using Strawberry Perl:
The two signal handlers allow the script to be interrupted as follows:
My takeaways from this meditation?
First, we all know that micro-optimisation is pointless until you have first selected the best algorithm(s). But optimising an algorithm may actually consist in optimising its underlying data structures. Obvious? Yes, but still worth a reminder now and then.
Second, CPAN is awesome! But you knew that already. :-)