in reply to Testing if an array contains a value and then deleting it in most efficient way

Deleting from the middle of an array is inherently slow, since all the subsequent elements need to be shifted. Locating elements is slow too.

Is the data sorted by the search criteria?
If so, the item can be located using a binary search which is much faster than a linear search.

Can you use a hash indexed by the search criteria.
Locating and deleting the element would become really cheap.

If you're stuck with an array, you might want to consider undefing the element instead of deleting it. Just skip the undefs when it's time to save the structure.

  • Comment on Re: Testing if an array contains a value and then deleting it in most efficient way

Replies are listed 'Best First'.
Re^2: Testing if an array contains a value and then deleting it in most efficient way
by parv (Parson) on Feb 18, 2008 at 16:21 UTC

    In code (while using array) ...

    use List::MoreUtils qw/ firstidx /; for my $val ( generate() ) { my $i; # Deal w/ "uninitialized variable" warning as appropriate. while ( -1 < ( $i = firstidx { $_ == $val } @array ) ) { delete $array[ $i ]; } } # Later... # exists() here would go better with earlier delete(), but # then would need to generate list of indices. my @save = grep { defined $_ } @array;

      ow! You transformed a O(N) problem into an O(N2) solution.
      firstidx starts at the start of the array every loop pass.

      Not only did you half the speed, you doubled the memory requirements.
      "firstidx" copies the entire array before processing it.

        How can you avoid O(N^2) cost if using unsorted array? So far there is no indication whether it would be sorted. (Addendum: I wasn't thinking so just quoted the O(N^2) cost, which really should have been O(N). Time to sleep belatedly, I suppose.)

        I thought that List::MoreUtils::firstidx would use XS magic (to avoid copying). No? (I would have looked inside the C code myself but am not familiar with XS yet.)

        Later ... I see now that array might be already sorted (and first element would be the interesting one).

Re^2: Testing if an array contains a value and then deleting it in most efficient way
by karpatov (Beadle) on Feb 18, 2008 at 16:30 UTC
    In fact the values to be looked-up will come numerically sorted (probably if the record IDs in db are really ordered by their numeric ID), so sorting the values in the array would mean that only first element is deleted.
    So it seems that I can only ask whether the value to be looked up equals just the first item of the array! How only could miss it? Tx for pointing the possibility of ordering.
    Undefs are ignored during the search, so their presence doesnt slowdown the procedure?
      Fortunately for you, deleting the first (or last) element of an array (shift(@a)) is very fast (O(1)) in Perl.
        There is one problem though, the values to be check against the array come from a db which is splitted into several files, so their names something0001,something1000 should be ordered but @files = sort { $a <=> $b } @files causes the IDE to freeze. What can be wrong? tx.

      A side note ... The records may be sorted but will not be returned as such when not asked, if database server is Sybase 15.0.x.

        Even more of a side note - one should never rely on the inherent ordering of data in a SQL database. Any ordering is a side effect of the storage mechanism, and can change (for example due to parallel query engines).

        Always use an ORDER BY clause if the order of the returned data is significant in any way!

        Michael

        PS - for Sybase it's not only ASE 15.0.x that has this behavior - it's any table that has row-level locking (DOL), or if you have partitioned tables, or if you use parallel queries.