in reply to Efficient array element deletion

Every time push is forced to allocate more memory, it needs to copy the entire array. This can be avoided by preallocating enough memory.

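my $count = @array;
$#array = $count*2 - 1;   # preallocate room for the pushes
$#array = $count - 1;     # shrink back so no undef elements are left
my $value;
for (1 .. $count) {
    push @array, $value if ($value = shift @array) !~ /^\#/;
}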
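For what it's worth, here is a minimal, self-contained sketch of that snippet wrapped in a sub. The name drop_comments and the sample data are made up for illustration, and it assumes the goal is to remove elements that start with '#'. It pre-extends the array and then shrinks it back before looping; as far as I know that leaves the extra slots allocated without leaving undef elements in the array, but treat that as an assumption rather than a guarantee.

```perl
use strict;
use warnings;

# Hypothetical helper: removes elements starting with '#' from the array
# referenced by $array, keeping the remaining elements in order.
sub drop_comments {
    my ($array) = @_;
    my $count = @$array;

    # Pre-extend to twice the size, then shrink back. The assumption here
    # is that shrinking keeps the larger buffer allocated for later pushes.
    $#$array = $count * 2 - 1;
    $#$array = $count - 1;

    for (1 .. $count) {
        my $value = shift @$array;
        push @$array, $value if $value !~ /^\#/;
    }
    return $array;
}

my @array = ('a', '#b', 'c', '#d', 'e');
drop_comments(\@array);
print "@array\n";    # prints "a c e"
```

Each element is shifted off the front exactly once and pushed back only if it is kept, so after $count iterations the array holds just the survivors in their original order.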

In terms of scalability,

Re^2: Efficient array element deletion
by kennethk (Abbot) on Dec 04, 2008 at 23:39 UTC

    From Shift, Pop, Unshift and Push with Impunity!:

    One consequence of perl's list implementation is that queues implemented using perl lists end up "creeping forward" through the preallocated array space leading to reallocations even though the queue itself may never contain many elements. In comparison, a stack implemented with a perl list will only require reallocations as the list grows larger. However, perl is smartly coded because the use of lists as queues was anticipated. Consequently, these queue-type reallocations have a negligible impact on performance. In benchmarked tests, queue access of a list (using repeated push/shift operations) is nearly as fast as stack access to a list (using repeated push/pop operations).

    I read this to mean that while a naive implementation would have yielded O(N^2), perl is smart enough that the exponent drops (closer) to O(N). Is this incorrect?

    Also, it seems like O(N^2) on splice is a worst case, where best case (either all or no deletions) would be O(N), leading me to think it'd be closer to O(N log N) in practice.

    The crux of my question though was supposed to be about the constant in front of the memory term, particularly as all scale equivalently in memory.

      Also, it seems like O(N^2) on splice is a worst case, where best case (either all or no deletions) would be O(N), leading me to think it'd be closer to O(N log N) in practice.

      I tried all N=16 inputs:

      0 elements were shifted 1 times
      16 elements were shifted 16 times
      31 elements were shifted 120 times
      45 elements were shifted 560 times
      58 elements were shifted 1820 times
      70 elements were shifted 4368 times
      81 elements were shifted 8008 times
      91 elements were shifted 11440 times
      100 elements were shifted 12870 times
      108 elements were shifted 11440 times
      115 elements were shifted 8008 times
      121 elements were shifted 4368 times
      126 elements were shifted 1820 times
      130 elements were shifted 560 times
      133 elements were shifted 120 times
      135 elements were shifted 16 times
      136 elements were shifted 1 times
      98 elements were shifted on average
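The average in that listing can be double-checked from the listing itself. The sketch below only re-does the arithmetic, reading each row as "this many elements shifted, for this many of the inputs"; it is not the script that produced the numbers, and the names are made up.

```perl
use strict;
use warnings;

# Each pair is [elements shifted, number of inputs with that count],
# copied from the listing above.
my @histogram = (
    [0, 1], [16, 16], [31, 120], [45, 560], [58, 1820], [70, 4368],
    [81, 8008], [91, 11440], [100, 12870], [108, 11440], [115, 8008],
    [121, 4368], [126, 1820], [130, 560], [133, 120], [135, 16], [136, 1],
);

# Returns the number of inputs covered and the average number of
# elements shifted per input.
sub weighted_average {
    my (@rows) = @_;
    my ($inputs, $shifted) = (0, 0);
    for my $row (@rows) {
        my ($elements, $times) = @$row;
        $inputs  += $times;
        $shifted += $elements * $times;
    }
    return ($inputs, $shifted / $inputs);
}

my ($inputs, $average) = weighted_average(@histogram);
printf "%d inputs, %.2f elements shifted on average\n", $inputs, $average;
# prints "65536 inputs, 98.00 elements shifted on average"
```

The input count comes out as 2**16, which matches "all N=16 inputs" if every element is independently either kept or deleted.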

      The average result is 98, which is about twice O(N log N). So,
      Average case
      = O({loop body cost}*N + {element shift cost}*N log N)
      = O(N + N log N)
      = O(N log N)

      The thing is, the worst case is also in the same order, so
      Worst case
      = O(N log N)

      I accept your better average case, and I propose a better worst case than we both thought.

      I read this to mean that while a naive implementation would have yielded O(N^2), perl is smart enough that the exponent drops (closer) to O(N). Is this incorrect?

      A naïve implementation of push would take O(N) for every element pushed. Currently, it takes O(1) for most pushes, and O(N) on occasion.

      @a = qw( a b c );
      +---+---+---+---+
      | a | b | c | / |     / = allocated, but unused.
      +---+---+---+---+

      push @a, 'd';
      +---+---+---+---+
      | a | b | c | d |
      +---+---+---+---+

      push @a, 'e';
      +---+---+---+---+---+---+---+---+---+---+---+---+
      | a | b | c | d | e | / | / | / | / | / | / | / |
      +---+---+---+---+---+---+---+---+---+---+---+---+
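To put rough numbers on "O(1) for most pushes, and O(N) on occasion", here is a toy model of the diagram. It does not inspect perl's real allocator: the growth rule (new capacity = 2 * old + 4) is purely an assumption, picked because it turns 4 slots into 12 as in the picture, and simulate_pushes is a made-up name.

```perl
use strict;
use warnings;

# Toy model: counts how many pushes force a reallocation and how many
# elements get copied in total. Nothing here touches perl internals.
sub simulate_pushes {
    my ($pushes, $capacity) = @_;
    my ($size, $reallocs, $copied) = (0, 0, 0);
    for (1 .. $pushes) {
        if ($size == $capacity) {
            # Buffer is full: copy every element into a bigger block.
            $copied  += $size;
            $capacity = 2 * $capacity + 4;    # assumed growth rule
            $reallocs++;
        }
        $size++;
    }
    return ($reallocs, $copied);
}

my ($reallocs, $copied) = simulate_pushes(1000, 4);
printf "%d pushes: %d reallocations, %d elements copied\n",
    1000, $reallocs, $copied;
# prints "1000 pushes: 7 reallocations, 988 elements copied"
```

Under this assumed rule only 7 of the 1000 pushes pay the copying cost; a different growth rule would give different counts.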

      It only preallocates so much. As soon as the preallocated memory is used up, a new memory block is allocated and the whole array must be copied. The shift-push solution is therefore O(N * N*{chance of reallocation needed}), which probably resembles worst/average case O(N log N).

      So that makes the scalability as follows:

      • The grep solution you provided uses O(N) time and O(N) memory.
      • The splice solution you provided uses O(N log N) time and O(1) memory.
      • The shift-push solution you provided uses O(N log N) time and O(N) memory.
      • The shift-push solution I provided uses O(N) time and O(N) memory.
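For anyone following along without the earlier nodes, here is a rough sketch of what the grep, splice and plain shift-push approaches could look like. These are reconstructions for illustration, not the code that was actually posted; the sub names are made up, and the '#' prefix test is taken from the snippet at the top of the thread.

```perl
use strict;
use warnings;

# grep: builds a new list containing only the elements to keep.
sub by_grep {
    return grep { !/^\#/ } @_;
}

# splice: deletes in place, walking backwards so that removing an
# element does not disturb the indexes still to be visited.
sub by_splice {
    my @array = @_;
    for (my $i = $#array; $i >= 0; $i--) {
        splice @array, $i, 1 if $array[$i] =~ /^\#/;
    }
    return @array;
}

# shift-push without preallocation: rotates every element through the
# array once, dropping the unwanted ones on the way.
sub by_shift_push {
    my @array = @_;
    for (1 .. scalar @array) {
        my $value = shift @array;
        push @array, $value if $value !~ /^\#/;
    }
    return @array;
}

my @data = ('a', '#b', 'c', '#d', 'e');
print "grep:       @{[ by_grep(@data) ]}\n";          # prints "grep:       a c e"
print "splice:     @{[ by_splice(@data) ]}\n";        # prints "splice:     a c e"
print "shift-push: @{[ by_shift_push(@data) ]}\n";    # prints "shift-push: a c e"
```

All three keep the surviving elements in their original order; they differ only in how much copying and extra memory they need, which is what the list above is about.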

      The crux of my question though was supposed to be about the constant in front of the memory term, particularly as all scale equivalently in memory.

      I thought you were more interested in speed, sorry.

      • splice is done in-place. (Assuming you get rid of the reverse!!)
      • grep probably uses N SV* extra memory. It could possibly be done in place.
      • My shift-push uses N SV* extra memory.
      • Your shift-push uses between N and 5*N SV* (peak), and between N and 3*N SV* (final) extra memory.

      Pushing slightly more than doubles the allocated memory when a reallocation is forced. If N' is the number of elements kept, 3*N is 2*(N+N') when N'=N, minus the initial memory. The peak occurs when copying the pointers from the old memory block to the new memory block.