I would guess that's a swapping problem (you might be able to verify this with
/usr/bin/time -l). I can think of several ways to cut down on memory:
(1) shuffle an array of indices rather than the data itself; (2) store the
indices in a string, and access them using vec; (3) use PDL, which has compact
numerical arrays.
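
As a rough sketch of options (1) and (2) combined, something like the following might work. It assumes the indices fit in 32 bits; the helper name shuffled_index_string and the sample @data are made up for illustration. The indices live in a single string at 4 bytes each, which should be considerably smaller than a regular Perl array of integers, and the Fisher-Yates shuffle runs in place on that string.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a string holding a random permutation of
# 0 .. $n-1, packed as 32-bit unsigned integers (4 bytes per index).
sub shuffled_index_string {
    my ($n) = @_;
    my $indices = '';
    return $indices if $n < 1;

    # Assigning to the last element first extends the string to full size.
    vec($indices, $n - 1, 32) = 0;
    vec($indices, $_, 32) = $_ for 0 .. $n - 1;

    # Fisher-Yates shuffle, swapping packed elements in place.
    for (my $i = $n - 1; $i > 0; $i--) {
        my $j   = int rand($i + 1);
        my $tmp = vec($indices, $i, 32);
        vec($indices, $i, 32) = vec($indices, $j, 32);
        vec($indices, $j, 32) = $tmp;
    }
    return $indices;
}

# Made-up sample data; the real array stays where it is and is never copied.
my @data    = map { "record $_" } 0 .. 9;
my $indices = shuffled_index_string(scalar @data);

# Walk the data in shuffled order by looking up each packed index.
for my $k (0 .. $#data) {
    print $data[ vec($indices, $k, 32) ], "\n";
}
```

Whether this actually cures the swapping depends on how big the data array itself is; if the data alone does not fit in memory, shuffling indices only helps so much.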