in reply to Re (tilly) 1: Efficiency Question
in thread Efficiency Question

That's a really interesting module; never seen it before.

The reason I'm doing this by hand is to reduce overhead, both in execution and in loading. All the prefabbed modules designed to do something like this are simply too complex for my needs, which means writing something that is very specific to the parent.

I'm just looking for the simplest, most efficient solution to my question.

This is more of an infatuation than an actual need.


Let me put it this way: without using any modules, what is the fastest and smallest caching system you can come up with that stores and recalls blocks of data indexed by a Unique ID. The data is passed as a scalar. The cache must have a maximum size of 30 elements and be able to order itself according to hit frequency.
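
To make it concrete, the rough shape of the interface I have in mind is below; untested, and the names are only placeholders:

    my %cache;   # id => scalar data
    my %hits;    # id => hit count, used to order and evict

    sub cache_set {
        my ($id, $data) = @_;
        # store $data under $id; once the cache holds 30 elements,
        # evict the least frequently hit entry to make room
    }

    sub cache_get {
        my $id = shift;
        # on a hit, bump the hit count and return the data;
        # on a miss, return undef
    }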

Re (tilly) 3: Efficiency Question
by tilly (Archbishop) on Feb 23, 2001 at 23:58 UTC
    Fastest and smallest are incompatible.

    Also, do you need at least 30, or exactly 30?

    If I wanted to be bare-bones, I would do something like this (untested code):

    use strict;
    use vars qw($max_size $reset_size);
    $max_size   = 60;   # high-water mark: prune once the cache passes this
    $reset_size = 30;   # low-water mark: prune back down to this many

    {
        my %count;      # id => hit count
        my %cache;      # id => cached data

        sub get_elem {
            my $id = shift;
            if (exists $count{$id}) {
                ++$count{$id};
                return $cache{$id};
            }
            else {
                return;
            }
        }

        my $size;
        sub set_elem {
            {
                my $id   = shift;
                my $elem = shift;
                if (exists $count{$id}) {
                    $cache{$id} = $elem;
                }
                else {
                    if ($max_size < ++$size) {
                        _clear_cache();
                    }
                    $count{$id} = 0;
                    $cache{$id} = $elem;
                }
                redo if @_;   # allow several id/elem pairs per call
            }
        }

        sub _clear_cache {
            # keep only the $reset_size most frequently hit entries
            my @ids = sort {$count{$b} <=> $count{$a}} keys %count;
            @ids = @ids[0..($reset_size-1)];
            my %new;
            %count = ();
            foreach (@ids) {
                $new{$_}   = $cache{$_};
                $count{$_} = 0;
            }
            %cache = %new;
            $size = $reset_size;
        }
    }
    Note that I let the cache grow past 30 here, but doing so makes the actual lookup much faster.
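
    A quick usage sketch (the IDs and data here are made up):

        set_elem("user:42", $row);                     # single insert
        set_elem("user:42", $row, "user:43", $other);  # several pairs, handled by the redo
        my $data = get_elem("user:42");    # hit: bumps the count, returns the data
        my $miss = get_elem("nonesuch");   # miss: returns undef, cache untouched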
      You can speed up _clear_cache, which would probably eat up the most cycles, being called once every 30 insertions, like so:
      sub clear_cache {
          my @ids = sort {$count{$b} <=> $count{$a}} keys %count;
          splice @ids, 0, $reset_size;    # pull the keepers off the kill list
          delete @cache{@ids};            # prune the rest in place
          %count = ();
          @count{keys %cache} = (0) x $reset_size;
          $size = $reset_size;
      }
      I've found it faster to delete elements from an existing hash than to rebuild the hash. This of course depends mostly on the ratio of deleted to total hash entries.
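
      If you want to measure that on your own key mix, a quick comparison with the core Benchmark module (the sizes here are made up to match the cache above) shows where the crossover is:

      use Benchmark qw(cmpthese);

      my %template = map { $_ => "data$_" } 1 .. 60;
      my @doomed   = grep { $_ > 30 } keys %template;

      cmpthese(-2, {
          delete_keys => sub {
              my %h = %template;       # fresh copy each pass
              delete @h{@doomed};      # prune in place
          },
          rebuild     => sub {
              my %h = %template;       # fresh copy each pass
              my %new;
              $new{$_} = $h{$_} for grep { $_ <= 30 } keys %h;
              %h = %new;               # rebuild from the survivors
          },
      });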

      Also, this isn't quite a frequency-expiration cache, since time is not a factor in the expiration routine, and you would stand a pretty good chance of getting stale cache entries that once had a bunch of hits but remain in the cache even though they are entirely unused. I'd probably switch the count over to something like a shift-based MRU, called once every 30 fetches:

      my $do_shift = 0;

      sub get_elem {
          my $id = shift;
          if (++$do_shift == 30) {   # every 30 fetches, age the counts
              shift_mru();
              $do_shift = 0;
          }
          if (exists $count{$id}) {
              $count{$id} |= 0x80000000;   # flag as recently used
              return $cache{$id};
          }
          else {
              return;
          }
      }

      sub shift_mru {
          # halve every count so entries that stop getting hits age out
          while (my ($id, $count) = each %count) {
              $count{$id} >>= 1;
          }
      }
      I also seem to recall that there are ways to optimize the sort when you only need the top few results rather than the whole list ordered, but that's getting rather hairy :-)
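
      For instance, a single pass that keeps a small sorted buffer returns the top $reset_size counts in O(n * reset_size) worst case instead of a full O(n log n) sort; untested sketch:

      sub top_ids {
          my @top;   # ids, kept sorted by descending $count{...}
          for my $id (keys %count) {
              next if @top == $reset_size
                   && $count{$id} <= $count{ $top[-1] };
              my $i = 0;
              $i++ while $i < @top && $count{ $top[$i] } > $count{$id};
              splice @top, $i, 0, $id;         # insert at its sorted slot
              pop @top if @top > $reset_size;  # trim back to size
          }
          return @top;
      }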
      MeowChow
      s aamecha.s a..a\u$&owag.print