Recamán's sequence and memory usage

Esteemed Monks,

I was looking at The On-Line Encyclopedia of Integer Sequences (OEIS) (you know, as one does), and on the Welcome page I found a list of Some Famous Sequences, of which the first is Recamán’s sequence, defined as follows:

R(0)  = 0;
for n > 0,
    R(n) = R(n-1) - n if positive and not already in the sequence, oth
+erwise
    R(n) = R(n-1) + n.
[download]

What makes this sequence interesting is N. J. A. Sloane’s conjecture that every number eventually appears.

Coding the sequence is simplicity itself; the challenge is to test Sloane’s conjecture by keeping track of the numbers that have not yet appeared in the series. My initial, naïve approach was to use a sieve, à la Eratosthenes:

use strict;
use warnings;
use constant MAX => 1e6;

$| = 1;

my %seq = (0 => 0);
my $r0  =  0;

for my $n (1 .. MAX)
{
    my $r = $r0 - $n;
       $r = $r0 + $n if $r < 0 || exists $seq{$r};

    $seq{$r} = $n;
    $r0      = $r;
}

for (0 .. MAX)
{
    unless (exists $seq{$_})
    {
        printf "For MAX = %.1e, the first missing number is %d\n", MAX
+, $_;
        last;
    }
}
[download]

But this turned out to be far too memory-hungry: for values of MAX of the order of twenty million, RAM usage on my 3GB system approaches 100%, thrashing sets in, and the script (along with the rest of Windows) grinds to a shuddering halt.

Surely, I thought, there must be a memory-efficient way to represent a sieve? And of course there is, and of course it was already implemented on CPAN. A little searching led to the Set::IntSpan module which stores runs of consecutive integers as spans, allowing large (even infinite) collections of integers to be represented very economically.

Calculation of successive terms in the Recamán sequence is noticeably slower using Set::IntSpan for lookup than it is using a hash. But, as the adage says, it’s better to be late than be dead on time. (This was the slogan of an Australian safe driving ad campaign some years ago.) For the record: I also looked at Set::IntSpan::Fast and Set::IntSpan::Fast::XS. The latter failed to install on my system, and the former actually ran slower than Set::IntSpan for this use-case.

Turns out that Set::IntSpan not only solves the memory problem, it also makes it possible to dispense with an upper bound for the sieve. How, then, to display progressive results? Well, the OEIS has a couple of additional series related to Recamán’s:

A064228: values of R(n) that take a record number of steps to appear: 1, 2, 4, 19, ...
A064227: the values of n corresponding to the values in A064228: 1, 4, 131, 99734, ...

So I recast the script to output successive values of these two series:

14:20 >perl recaman.pl
1 <-- 1
2 <-- 4
4 <-- 131
19 <-- 99734
...
[download]

Here is the new script:

use strict;
use warnings;
use sigtrap handler => \&int_handler,   'INT',
            handler => \&break_handler, 'BREAK';
use Set::IntSpan;
use Time::HiRes qw(gettimeofday tv_interval);

   $|       = 1;
my $t0      = [gettimeofday];
my $min0    = 1;
my $n       = 0;
my $r0      = 0;
my $missing = Set::IntSpan->new( '1-)' );

print "$min0 <-- ";

while (++$n)
{
    my $r = $r0 - $n;
       $r = $r0 + $n if $r < 0 || !$missing->member($r);

    $missing->remove($r);

    if ((my $min1 = $missing->min) > $min0)
    {
        print "$n\n$min1 <-- ";
        $min0 = $min1;
    }

    $r0 = $r;
}

sub int_handler
{
    printf "\nn = %d, elapsed time: %.1fs\n", $n, tv_interval($t0);
}

sub break_handler
{
    int_handler();
    exit 0;
}
[download]

This script was developed under Windows 8.1, 64-bit, using Strawberry Perl:

14:20 >perl -v

This is perl 5, version 22, subversion 0 (v5.22.0) built for MSWin32-x
+64-multi-thread
[download]

The two signal handlers allow the script to be interrupted as follows:

Control-C causes the script to display the current value of $n and the total running time of the script so far.
Control-Break causes the script to display the same information and then exit.

My takeaways from this meditation?

First, we all know that micro-optimisation is pointless until you have first selected the best algorithm(s). But optimising an algorithm may actually consist in optimising its underlying data structures. Obvious? Yes, but still worth a reminder now and then.

Second, is awesome! But you knew that already. :-)

Cheers,

Athanasius <°(((>< contra mundum Iustus alius egestas vitae, eros Piratica,

Comment on Recamán's sequence and memory usage Select or Download Code

Replies are listed 'Best First'.

Re: Recamán's sequence and memory usage
by hdb (Monsignor) on Jul 13, 2015 at 14:14 UTC

Athanasius, thanks for your interesting write-up. Here is a little version using vec for storing which numbers have been seen already. It runs quite happily until it hits integer overflow... It only prints a line when the maximum observed or the minimum unobserved increases.

use strict;
use warnings;

my $min = 2; # smallest unused number
my $max = 3; # largest number already visited
my $n   = 2;
my $r   = 3;
my $s;
vec($s,$_,1)=1 for (0,1,3);
my $flag = 1;
while(++$n){
    $r += ($r-$n>0) && not( vec($s,$r-$n,1) ) ? -$n : $n;
    vec($s,$r,1)=1;
    $min++, $flag=1 while vec($s,$min,1);
    $max = $r, $flag=1 if $r > $max;
    print "$n\t$max\t$min\t$r\t".length($s)."\n" if $flag;
    $flag = 0;
}
[download]

[reply]
[d/l]

Re^2: Recamán's sequence and memory usage

by Athanasius (Archbishop) on Jul 14, 2015 at 06:23 UTC

Hello hdb,

Brilliant use of vec, producing fast and very clever code. Bravo!

But — running your script on my system, I didn’t get anywhere near integer overflow before Windows effectively froze, with Task Manager showing memory at 97% and disk at 100%. (I didn’t time it, but it can’t have taken more than an hour or two to get to this point.) The console read:

2162984298      13994685375     1355    13994685375     1749335672
[download]

So then I ran my script to see if it would get that far. 14 hours and 37 minutes later (!) it had reached:

n = 2177458553, elapsed time: 52649.9s
[download]

with Task Manager showing memory at 60% and disk at 0%.

:-)

Cheers,

Athanasius <°(((>< contra mundum Iustus alius egestas vitae, eros Piratica,

[reply]
[d/l]
[select]

Re: Recamán's sequence and memory usage
by QM (Parson) on Jul 13, 2015 at 15:58 UTC

Just wondering how 10⁷³ could be checked, that seems like an awfully big number?

-QM
--
Quantum Mechanics: The dreams stuff is made of

[reply]

Re^2: Recamán's sequence and memory usage

by Athanasius (Archbishop) on Jul 14, 2015 at 03:01 UTC

Indeed!

Date	Terms Computed	Smallest Missing Number	Author
6 Nov, 2001	1 × 10¹⁵	852,655	Allan Wilks
13 Jun, 2006	1 × 10²⁵	852,655	Benjamin Chaffin
28 Mar, 2008	7.78 × 10³⁷	852,655	Benjamin Chaffin
22 Mar, 2010	4.28 × 10⁷³	852,655	Benjamin Chaffin

I don’t know how it’s done, but it’s certainly a slow process! :-O

Cheers,

Athanasius <°(((>< contra mundum Iustus alius egestas vitae, eros Piratica,

[reply]

Re: Recamán's sequence and memory usage
by anonymized user 468275 (Curate) on Jul 14, 2015 at 18:25 UTC

1) remove consecutive entries between a(0)=0 and m-1 where m is the first missing value. This prevents the sieve from growing beyond resources to support it.

2) use an array instead of a hash -- if the index is numeric, it's an array.

3) calculate the rate of growth of the remaining sieve to test for convergence.

One world, one people

[reply]

Re^2: Recamán's sequence and memory usage

by hdb (Monsignor) on Jul 15, 2015 at 11:25 UTC

From my experiments so far with n up to 2e10:

The first missing value is tiny compared to the maximum observed number, so not much can be saved here.
The observed numbers only cover 1/8 of the whole range, so we are dealing with a sparsely populated array in your proposal. This might not be the best way to store the sequence.
The ratio of the maximum number to the sequence number is around 8:1 so not much to be gained here either.


Just another Perl shrine
	PerlMonks