http://qs1969.pair.com?node_id=66693
Category: Miscellaneous
Author/Contact Info mr.nick
Description: A very simple, small, O(n) cache that maintains a limited set of items. Accessing an item moves it to the front of the cache, which speeds up finding it again and refreshes its age in the cache.
package cache;

use strict;
##############################################################################
sub new {
  my $class=shift;
  my $size=shift;
  my $self=bless {};
  
  $size=30 unless $size;

  $self->{array}=[];
  $self->{size}=$size;

  $self;
}

sub ins {
  my $self=shift;
  my $id=shift;
  my $data=shift;

  my $array=$self->{array};

  unshift @$array,[$id,$data];

  ## trim the oldest entry once the cache exceeds its size limit
  if (@$array > $self->{size}) {
    pop @$array;
  }
}

sub get {
  my $self=shift;
  my $id=shift;

  my $array=$self->{array};
  my $data;

  for my $p (0..$#{$array}) {
    if ($array->[$p]->[0] eq $id) {
      $data=$array->[$p]->[1];
      splice(@$array,$p,1);
      unshift @$array,[$id,$data];
      last;
    }
  }

  return unless defined $data;
  $data;
}

sub del {
  my $self=shift;
  my $id=shift;

  my $array=$self->{array};

  for my $p (0..$#{$array}) {
    if ($array->[$p]->[0] eq $id) {
      splice(@$array,$p,1);
      last;
    }
  }
}

sub show {
  my $self=shift;
  my $array=$self->{array};
  local $_;

  for  (@$array) {
    print "id $_->[0] data $_->[1]\n";
  }
}
  

1;
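A quick usage sketch (assuming the package above is saved as cache.pm; method names follow the code):

use strict;
use cache;                    # assumes the package above lives in cache.pm

my $c = cache->new(3);        # small cache, just for illustration

$c->ins(foo => 'first');
$c->ins(bar => 'second');
$c->ins(baz => 'third');

print $c->get('foo'), "\n";   # prints 'first' and moves foo to the front
$c->del('bar');               # drop an entry explicitly

$c->show;                     # dumps the cache, most recently used first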
Re (tilly) 1: A (kinda) Circular Cache
by tilly (Archbishop) on Mar 24, 2001 at 00:08 UTC
    Sorry, this is not as efficient as you advertise.

    Every time you scan or manipulate an array, that is an O(n) operation. Arranging to constantly scan arrays quickly turns into O(n*n), which is algorithmically slow.

    I hinted at this to you in Re (tilly) 3: Efficiency Question. MeowChow and I talked more in private than you see at that node. The key concept here is that you want to use hash lookups (an O(1) operation) as often as possible and then rebuild your cache (an O(n) operation, but with a poor constant) only when you have to. This is algorithmically much, much better than doing O(n) searches repeatedly.

    The general rule is that whenever you see "scan array" you should look for a hash.
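    A tiny standalone illustration of the difference (made-up data, not code from the node above):

    use strict;

    my @array = map { [ "id$_", "value $_" ] } 1 .. 1000;
    my %by_id = map { $_->[0] => $_ } @array;    # one-time O(n) index

    my $id = 'id500';

    # O(n): scan the array for a matching id (what get() above does)
    my ($slow) = grep { $_->[0] eq $id } @array;

    # O(1): go straight to the item through a hash keyed by id
    my $fast = $by_id{$id};

    print $slow->[1], " and ", $fast->[1], "\n";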

    There are other stylistic notes. For instance you could write your new method as:

    sub new {
      my $self = bless { array => [] }, shift;
      $self->{size} = shift || 30;
      $self;
    }
    which by using the 2-argument form of bless will allow someone to inherit from your cache implementation if they want...
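    For instance, a hypothetical subclass (the LoggingCache name is made up here) only dispatches correctly if new() blesses into the caller's class:

    package LoggingCache;
    use strict;
    our @ISA = ('cache');   # inherit from the cache package above
                            # (assumes it has already been loaded)

    sub ins {
        my ($self, $id, $data) = @_;
        print "caching $id\n";         # extra behaviour layered on top
        $self->SUPER::ins($id, $data);
    }

    package main;
    my $lc = LoggingCache->new(10);    # needs the 2-argument bless to work
    $lc->ins(foo => 'bar');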
      I remember the previous discussion about this well; I'm afraid I simply couldn't work out how to do it efficiently.

      I tried maintaining a separate hash for accessing the data in addition to the array that maintained the order, but ran into difficulties keeping them in sync. I tried having the hash reference positions in the array, but that would require updating the hash whenever the array changed order (after a get or del or ins). I tried other methods, but found myself still walking the array to maintain it (dels & ins).

      Since I was walking the array anyhoot, I just left it where you see it now.

      What I don't understand is your comment "Arranging to constantly scan arrays quickly turns into O(n*n) which is slow algorithmically." In what way is my O(n) turned into O(n*n)? Any given single operation of the cache only walks the array once (excluding the splice); isn't that the definition of O(n)? (I'm straining to remember my "big O" notation crap from a decade ago.)


      I should note that I decided to use an array to maintain the order instead of something akin to $hash->{hits}++ and sorting/dropping by that. I presumed a
      for my $x (sort { $a->{hits} <=> $b->{hits} } @array) { ... }
      would be much more costly than a flat array.
        The classic algorithm mistake is to wind up walking through one array, scanning another for each item. If the two arrays are both of similar size n, this is a O(n*n) operation. There are many, many variations on this, but they all boil down to paying attention to whether you are scanning a list, then asking why.

        So you are right, your operations are O(n). But these are operations that you expect to do very often, and the result in real code will be an overall O(n*n) type operation. As I said in Re (tilly) 3: Sort this data, you should be leery of doing a loop within a loop if you can avoid it.
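        A hedged sketch of that effect, driving the array-based cache above (assumes it is saved as cache.pm):

        use strict;
        use cache;

        my $cache = cache->new(1000);
        my @requests = map { "id$_" } 1 .. 1000;
        $cache->ins($_, "data for $_") for @requests;

        # The request loop is O(n), and each get() walks the cache array,
        # which is another O(n) -- so the whole run does O(n*n) comparisons.
        $cache->get($_) for @requests;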

        You are right that it is hard to do a perfect LRU with a hash. If it matters that your cache always has exactly the same size, then you are going to face constant resynchronization work. You can do it efficiently, though: it requires a linked list, keeping track of both head and tail, and a hash lookup into the list to avoid scanning it. With all of that you can do an algorithmically efficient true LRU. But note that while it is algorithmically scalable, in practice the operations are complex enough that it will be slow...

        However, in the tradition of "Close enough for government work!", what is far less work - and still algorithmically good - is to say that maintaining a perfect LRU cache of fixed size is not that important. Instead let it grow, and then rebuild it upon need. (Perl uses this type of periodic on-demand allocation of resources in several places. It works well.) Once you accept that idea you can go for the simpler strategy of using a hash as a cache and rebuilding the cache whenever it gets too big.

        This is what I did before. The resulting algorithm is O(1) the vast majority of times, but 1 time in n will be O(n). Amortize the cost of the one very expensive hit where you rebuild the hash over all of the times that you didn't do that, and the result averages out to O(1). (I sped up both map and unshift with patches like this.)
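        One common realization of that grow-and-rebuild idea (a minimal sketch, not necessarily tilly's exact code) keeps two generations of a plain hash and demotes the current one wholesale when it fills up, so the expensive step happens only once every size insertions:

        package gencache;    # hypothetical package, for illustration only
        use strict;

        sub new {
          my $self = bless { new => {}, old => {} }, shift;
          $self->{size} = shift || 30;
          $self;
        }

        sub ins {
          my ($self, $id, $data) = @_;
          $self->{new}{$id} = $data;
          ## the O(n) "rebuild": happens once per size insertions, so O(1) amortized
          if (keys %{ $self->{new} } > $self->{size}) {
            $self->{old} = $self->{new};
            $self->{new} = {};
          }
        }

        sub get {
          my ($self, $id) = @_;
          return $self->{new}{$id} if exists $self->{new}{$id};
          if (exists $self->{old}{$id}) {
            ## promote a recently used item back into the current generation
            $self->ins($id, $self->{old}{$id});
            return $self->{new}{$id};
          }
          return;
        }

        1;

        The price is that the cache can briefly hold up to twice the target size, which is exactly the "close enough" trade-off described above.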

Re: A (kinda) Circular Cache
by arturo (Vicar) on Mar 23, 2001 at 22:45 UTC

    I like it!

    Meaningless 'optimization'/alternate style: I like to do my defaults right when I declare the variables; not that it matters much here, but just as a 'grace note':

    my $size = shift || 30;

    Also,

    return unless defined $data;
    $data;

    in your get method will return undef if $data is not defined (at least, if the sub's been called in a scalar context, which it will be, right?), so you could probably just get rid of that first line.

    well, I guess it saves a few keystrokes too ... heh.

    Philosophy can be made out of anything. Or less -- Jerry A. Fodor

      in your get method will return undef if $data is not defined (at least, if the sub's been called in a scalar context, which it will be, right?), so you could probably just get rid of that first line.

      Hehe, I'm terrified of that situation now. I spent probably 10 hours a couple of weeks ago tracking down a problem that was caused by scalar/array contexts and "return" vs. "return undef". I'm playing it safe from now on.

      Should have done the $size = shift || 30 thing myself, though!
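      A tiny standalone demo of the list-context difference behind that kind of bug:

      use strict;

      sub bare_return  { return; }       # empty list in list context
      sub undef_return { return undef; } # a one-element list: (undef)

      my @a = bare_return();
      my @b = undef_return();

      print scalar(@a), " vs ", scalar(@b), "\n";   # prints "0 vs 1"

      # So "if (my @r = some_sub()) { ... }" is true for undef_return()
      # even on failure, which is the pitfall a bare "return" avoids.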

Re: A (kinda) Circular Cache
by merlyn (Sage) on Mar 23, 2001 at 23:06 UTC
ReTake!
by mr.nick (Chaplain) on Mar 26, 2001 at 20:32 UTC
    As I said to Tilly when asked why I didn't use the tried and true method of linked lists, "Oh yeah, linked lists. Perl's auto-hash made me forget them."

    So, in that vein, I present the Reinvent the Wheel Cache #2 (I believe it works under all cases). I plan to do some benchmarking later on ... (even though nobody will probably see this node)

    package cache2;

    use strict;

    ## okay, a second shot at this

    sub new {
      my $self=bless {},shift;

      ## we keep track of lots of things
      $self->{size}=shift || 30;
      $self->{count}=0;
      $self->{head}=undef;
      $self->{tail}=undef;
      $self->{hash}={};

      $self;
    }

    sub ins {
      my $self=shift;
      my $key=shift;
      my $val=shift;

      ## put it at the head (we walk down -- next is further down the chain)
      ## create the hash ref
      my $new={
        prev  => undef,
        next  => $self->{head},
        key   => $key,
        value => $val,
      };

      ## link the old head back to the new node
      $self->{head}->{prev}=$new if defined $self->{head};

      ## tack it on
      $self->{head}=$new;

      ## the first item is also the tail
      $self->{tail}=$new unless defined $self->{tail};

      ## and the hash for quick lookup
      $self->{hash}->{$key}=$new;

      $self->{count}++;

      ## do our size checking
      if ($self->{count}==$self->{size}+1) {
        $self->del($self->{tail}->{key});
      }
    }

    sub del {
      my $self=shift;
      my $key=shift;

      my $item=$self->{hash}->{$key};
      return unless defined $item;

      ## reattach its prev/next
      $item->{prev}->{next}=$item->{next} if defined $item->{prev};
      $item->{next}->{prev}=$item->{prev} if defined $item->{next};

      ## head/tail handling
      $self->{head}=$item->{next} if $self->{head}==$item;
      $self->{tail}=$item->{prev} if $self->{tail}==$item;

      ## and the hash
      delete $self->{hash}->{$key};

      ## and the count
      $self->{count}--;
    }

    sub get {
      my $self=shift;
      my $key=shift;

      my $item=$self->{hash}->{$key};
      return unless defined $item;

      ## move it to the front by
      ## deleting it
      $self->del($key);
      ## and adding it back
      $self->ins($key,$item->{value});

      $item->{value};
    }

    sub show {
      my $self=shift;
      my $item=$self->{head};

      while (defined $item) {
        print "$item: ";
        print "key $item->{key} " if defined $item->{key};
        print "value $item->{value} " if defined $item->{value};
        print "\n";
        $item=$item->{next};
      }
    }

    1;
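    For the benchmarking mentioned above, a rough sketch with the standard Benchmark module (assuming both packages are saved as cache.pm and cache2.pm; exact numbers will of course vary):

    use strict;
    use Benchmark qw(cmpthese);
    use cache;
    use cache2;

    my $n = 500;
    my $slow = cache->new($n);
    my $fast = cache2->new($n);

    for my $i (1 .. $n) {
      $slow->ins("id$i", "data $i");
      $fast->ins("id$i", "data $i");
    }

    ## random lookups: the array version scans, the linked-list/hash
    ## version goes straight to the item
    cmpthese(-2, {
      array_scan  => sub { $slow->get('id' . (1 + int rand $n)) },
      linked_list => sub { $fast->get('id' . (1 + int rand $n)) },
    });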