http://qs1969.pair.com?node_id=1134465

Esteemed Monks,

I was looking at The On-Line Encyclopedia of Integer Sequences (OEIS) (you know, as one does), and on the Welcome page I found a list of Some Famous Sequences, of which the first is Recamán’s sequence, defined as follows:

R(0) = 0; for n > 0, R(n) = R(n-1) - n if positive and not already in the sequence, oth +erwise R(n) = R(n-1) + n.

What makes this sequence interesting is N. J. A. Sloane’s conjecture that every number eventually appears.

Coding the sequence is simplicity itself; the challenge is to test Sloane’s conjecture by keeping track of the numbers that have not yet appeared in the series. My initial, naïve approach was to use a sieve, à la Eratosthenes:

use strict; use warnings; use constant MAX => 1e6; $| = 1; my %seq = (0 => 0); my $r0 = 0; for my $n (1 .. MAX) { my $r = $r0 - $n; $r = $r0 + $n if $r < 0 || exists $seq{$r}; $seq{$r} = $n; $r0 = $r; } for (0 .. MAX) { unless (exists $seq{$_}) { printf "For MAX = %.1e, the first missing number is %d\n", MAX +, $_; last; } }

But this turned out to be far too memory-hungry: for values of MAX of the order of twenty million, RAM usage on my 3GB system approaches 100%, thrashing sets in, and the script (along with the rest of Windows) grinds to a shuddering halt.

Surely, I thought, there must be a memory-efficient way to represent a sieve? And of course there is, and of course it was already implemented on CPAN. A little searching led to the Set::IntSpan module which stores runs of consecutive integers as spans, allowing large (even infinite) collections of integers to be represented very economically.

Calculation of successive terms in the Recamán sequence is noticeably slower using Set::IntSpan for lookup than it is using a hash. But, as the adage says, it’s better to be late than be dead on time. (This was the slogan of an Australian safe driving ad campaign some years ago.) For the record: I also looked at Set::IntSpan::Fast and Set::IntSpan::Fast::XS. The latter failed to install on my system, and the former actually ran slower than Set::IntSpan for this use-case.

Turns out that Set::IntSpan not only solves the memory problem, it also makes it possible to dispense with an upper bound for the sieve. How, then, to display progressive results? Well, the OEIS has a couple of additional series related to Recamán’s:

So I recast the script to output successive values of these two series:

14:20 >perl recaman.pl 1 <-- 1 2 <-- 4 4 <-- 131 19 <-- 99734 ...

Here is the new script:

use strict; use warnings; use sigtrap handler => \&int_handler, 'INT', handler => \&break_handler, 'BREAK'; use Set::IntSpan; use Time::HiRes qw(gettimeofday tv_interval); $| = 1; my $t0 = [gettimeofday]; my $min0 = 1; my $n = 0; my $r0 = 0; my $missing = Set::IntSpan->new( '1-)' ); print "$min0 <-- "; while (++$n) { my $r = $r0 - $n; $r = $r0 + $n if $r < 0 || !$missing->member($r); $missing->remove($r); if ((my $min1 = $missing->min) > $min0) { print "$n\n$min1 <-- "; $min0 = $min1; } $r0 = $r; } sub int_handler { printf "\nn = %d, elapsed time: %.1fs\n", $n, tv_interval($t0); } sub break_handler { int_handler(); exit 0; }

This script was developed under Windows 8.1, 64-bit, using Strawberry Perl:

14:20 >perl -v This is perl 5, version 22, subversion 0 (v5.22.0) built for MSWin32-x +64-multi-thread

The two signal handlers allow the script to be interrupted as follows:

My takeaways from this meditation?

First, we all know that micro-optimisation is pointless until you have first selected the best algorithm(s). But optimising an algorithm may actually consist in optimising its underlying data structures. Obvious? Yes, but still worth a reminder now and then.

Second, is awesome! But you knew that already. :-)

Cheers,

Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

Replies are listed 'Best First'.
Re: Recamán's sequence and memory usage
by hdb (Monsignor) on Jul 13, 2015 at 14:14 UTC

    Athanasius, thanks for your interesting write-up. Here is a little version using vec for storing which numbers have been seen already. It runs quite happily until it hits integer overflow... It only prints a line when the maximum observed or the minimum unobserved increases.

    use strict; use warnings; my $min = 2; # smallest unused number my $max = 3; # largest number already visited my $n = 2; my $r = 3; my $s; vec($s,$_,1)=1 for (0,1,3); my $flag = 1; while(++$n){ $r += ($r-$n>0) && not( vec($s,$r-$n,1) ) ? -$n : $n; vec($s,$r,1)=1; $min++, $flag=1 while vec($s,$min,1); $max = $r, $flag=1 if $r > $max; print "$n\t$max\t$min\t$r\t".length($s)."\n" if $flag; $flag = 0; }

      Hello hdb,

      Brilliant use of vec, producing fast and very clever code. Bravo!

      But — running your script on my system, I didn’t get anywhere near integer overflow before Windows effectively froze, with Task Manager showing memory at 97% and disk at 100%. (I didn’t time it, but it can’t have taken more than an hour or two to get to this point.) The console read:

      2162984298 13994685375 1355 13994685375 1749335672

      So then I ran my script to see if it would get that far. 14 hours and 37 minutes later (!) it had reached:

      n = 2177458553, elapsed time: 52649.9s

      with Task Manager showing memory at 60% and disk at 0%.

      :-)

      Cheers,

      Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

Re: Recamán's sequence and memory usage
by QM (Parson) on Jul 13, 2015 at 15:58 UTC
    Fun!

    Just wondering how 1073 could be checked, that seems like an awfully big number?

    -QM
    --
    Quantum Mechanics: The dreams stuff is made of

      Indeed!

      Date Terms Computed Smallest Missing Number Author
      6 Nov, 2001 1 × 1015 852,655 Allan Wilks
      13 Jun, 2006 1 × 1025 852,655 Benjamin Chaffin
      28 Mar, 2008 7.78 × 1037 852,655 Benjamin Chaffin
      22 Mar, 2010 4.28 × 1073 852,655 Benjamin Chaffin

      I don’t know how it’s done, but it’s certainly a slow process! :-O

      Cheers,

      Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

Re: Recamán's sequence and memory usage
by anonymized user 468275 (Curate) on Jul 14, 2015 at 18:25 UTC
    potential optimisations:

    1) remove consecutive entries between a(0)=0 and m-1 where m is the first missing value. This prevents the sieve from growing beyond resources to support it.

    2) use an array instead of a hash -- if the index is numeric, it's an array.

    3) calculate the rate of growth of the remaining sieve to test for convergence.

    One world, one people

      From my experiments so far with n up to 2e10:

      1. The first missing value is tiny compared to the maximum observed number, so not much can be saved here.
      2. The observed numbers only cover 1/8 of the whole range, so we are dealing with a sparsely populated array in your proposal. This might not be the best way to store the sequence.
      3. The ratio of the maximum number to the sequence number is around 8:1 so not much to be gained here either.

        In that case it looks like the range of the missing values is divergent. So although the conjecture looks false, it also looks tricky to disprove!

        One world, one people