in reply to Sparse arrays?

Explanation for observed behaviour:

  1. Perl arrays take the form of a C pointer to a pointer array.
  2. Each element in the pointer array refers to a single scalar value, or the value NULL.
  3. Arrays are automatically extended during assignment when they are indexed with an index that does not exist.
  4. Array extension initializes the newly created elements to the value NULL.
  5. From Perl, array elements with the value NULL appear to be the undef() value.
  6. The undef() value does not have any qualifiers, and is reused. All simple undef() values referenced from Perl can refer to the same scalar data structure internally, minimizing memory usage.
  7. The special scalar data structure used to indicate the undef() value is marked as an 'immortal scalar'.
  8. Immortal scalars are given artificially large reference counts to minimize the code path that normally decrements the reference count, and if the reference count reaches zero, deallocates the scalar. Immortals live even if the reference count reaches zero, so by setting the reference count to a very large value each time it reaches zero, the reference decrement code is simplified on the critical path.

Summary:

Perl supports sparse arrays of a sort. Arrays that are extended, but not explicitly initialized do not require nearly as much memory as arrays that are extended and explicitly initialized. On an architecture with 32-bit C pointers, an array of length 1000000, that has not been explicitly initialized, will require only 4000000 bytes of RAM.