O(1) time for the first problem - insertion and retrieval - is no sweat in perl. Pushing (at the tail) and shifting (at the head) of a normal array is likely to be as fast as it can be if you let perl do it. However, marking something as "recently used" and moving it to the back of the line - that may be a bit more problematic with perl's arrays, since splicing an element out of the middle is O(N).
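
For the insert-and-retrieve part alone, a hash for lookups plus an array used as a queue already gets you there. A minimal sketch (the names and the size limit are mine, just for illustration):

    # A minimal FIFO-flavoured cache: O(1) insert, O(1) lookup, O(1)
    # eviction from the head.  No "recently used" reordering yet.
    my %cache;                  # key => value
    my @order;                  # keys, oldest first
    my $max_entries = 1_000;    # hypothetical size limit

    sub cache_put {
        my ($key, $value) = @_;
        push @order, $key unless exists $cache{$key};
        $cache{$key} = $value;
        if (@order > $max_entries) {
            my $old = shift @order;    # oldest key drops off the front
            delete $cache{$old};
        }
    }

    sub cache_get {
        my ($key) = @_;
        return $cache{$key};           # plain hash lookup
    }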

I think your consideration of a linked list instead of an array for the LRU tracking is likely to be as good as it gets. In this scenario, each entry in the main cache would hold a reference to its node in the LRU list, so you can find it in constant time. Each node in the LRU list would have a link to the previous and next items in the list so that you can unlink the current object (by setting $obj->{prev}->{next} = $obj->{next}; $obj->{next}->{prev} = $obj->{prev}) and then tack it on at the end (you'll need to keep references to the first and last nodes in the list to do this in constant time).
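
A rough sketch of that wiring, assuming a hash of nodes where each node carries prev/next links plus the cached value (all names here are illustrative, not anything from your code):

    my %cache;            # key => node hashref (the node doubles as the value holder)
    my ($head, $tail);    # $head = least recently used, $tail = most recently used

    sub lru_touch {
        my ($node) = @_;
        return if $tail && $node == $tail;      # already the most recent

        # Unlink the node from wherever it currently sits.
        $node->{prev}{next} = $node->{next} if $node->{prev};
        $node->{next}{prev} = $node->{prev} if $node->{next};
        $head = $node->{next} if $head && $node == $head;

        # Tack it onto the tail (most recently used) end.
        $node->{prev} = $tail;
        $node->{next} = undef;
        $tail->{next} = $node if $tail;
        $tail = $node;
        $head //= $node;
    }

    sub lru_put {
        my ($key, $value) = @_;
        my $node = $cache{$key} //= { key => $key };   # the hash value *is* the node
        $node->{value} = $value;
        lru_touch($node);
    }

    sub lru_get {
        my ($key) = @_;
        my $node = $cache{$key} or return;
        lru_touch($node);                       # mark as most recently used
        return $node->{value};
    }

    sub lru_evict {
        my $node = $head or return;             # LRU node is always at the head
        $head = $node->{next};
        if ($head) { $head->{prev} = undef } else { $tail = undef }
        delete $cache{ $node->{key} };
    }

Every operation there is a constant number of pointer updates plus one hash lookup, which is what buys the O(1) behaviour.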

Extending this just a bit, the value you store in the main cache would itself be the LRU node (which saves one level of indirection) - which is what the sketch above already does.

Alternatively, you could take a cue from modern garbage-collection algorithms, discard all the per-entry overhead in memory, and live with a slower LRU algorithm. You get rid of a lot of bookkeeping data this way, but you also get rid of a lot of per-access overhead. Each access to the data may be O(1), but if you access entries Y times, the list maintenance alone adds up to O(Y). By living with an O(N) scan (or an O(N log N) sort) to find the LRU objects only when memory runs low, if N is less than Y we still come out ahead overall. That is, if you do more accessing than creation/insertion, you may actually be better off just having the cache store the last access time for each entry, and then when memory is low, sort on the last access time, find the oldest entries, and discard until you have enough memory again.
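
A sketch of that simpler scheme, where each entry only records when it was last touched (MAX_ENTRIES and the "drop the oldest half" policy are my own placeholder choices):

    use constant MAX_ENTRIES => 10_000;   # hypothetical memory limit

    my %cache;    # key => { value => ..., atime => ... }

    sub ts_get {
        my ($key) = @_;
        my $entry = $cache{$key} or return;
        $entry->{atime} = time;            # O(1) bookkeeping per access
        return $entry->{value};
    }

    sub ts_put {
        my ($key, $value) = @_;
        $cache{$key} = { value => $value, atime => time };
        ts_prune() if keys %cache > MAX_ENTRIES;
    }

    sub ts_prune {
        # O(N log N): sort keys by last access and drop the oldest half.
        my @oldest_first = sort { $cache{$a}{atime} <=> $cache{$b}{atime} }
                           keys %cache;
        delete @cache{ @oldest_first[ 0 .. int(@oldest_first / 2) - 1 ] };
    }

The per-access cost drops to a single timestamp assignment; all the work is deferred to the occasional prune.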

Update: Where exactly the balance between O(N) and O(Y) in the previous paragraph lies is very difficult to figure out in the borderline cases. In the extreme cases (almost all insert/create and no retrieval, vs. almost all retrieval and almost no insert/create), the direction to go is obvious. But I'm not going to pretend to be able to give advice on any case in between. I lean towards the slower LRU algorithm, partly because simpler code usually means fewer defects, so it's probably better to have slower, correct code than faster, incorrect code. :-)


