in reply to Re^3: Randomization as a cache clearing mechanism
in thread Randomization as a cache clearing mechanism

First, thanks for the clarifications, both of them. The P.S. one is a bit embarrassing to be honest; it seems like log has a silly prototype, and I don't know why, but I was expecting "log" to return base 2 and ln() to return base e. I wonder where I got that meme? Hmm.

D:\Dev>perl -e" print prototype 'CORE::log'"
;$

Shouldn't that really be a prototype of $? /grr, it's the "default-to-$_" behaviour that makes the parens mandatory.

OK, re: caching infrastructure. First you have to realize that caching occurs at a per-httpd level, and that we have two physical web servers and a dedicated DB server running the site. The consequence of this, of course, is that there is no way to do an "update-spoils-the-cache" process: the httpd doing the updating has no way to talk to the other httpds except through the DB table.

To make things more interesting, you have to realize that a good chunk of what you see here on PM is built up out of code that is stored in nodes, much as the text of this reply is stored in a node. This code is eval'd for every use, and before it is eval'd the version is fetched. So, to give you an example, "parselinksinchatter" might be used three or four times in a given page display, and before each use the version table gets queried.
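
To make that concrete, here is a minimal sketch of the shape of that pattern. The table layout, column names and the run_code_node() helper are made up for illustration; this is not the real pmdev code.

use DBI;

sub run_code_node {
    my ($dbh, $node_id, @args) = @_;

    # one query against the version table per use ...
    # (in the real code the version gets compared against a cached copy)
    my ($version) = $dbh->selectrow_array(
        "SELECT version FROM version WHERE node_id = ?", undef, $node_id);

    # ... then the code text is fetched and eval'd afresh
    my ($code) = $dbh->selectrow_array(
        "SELECT code FROM node WHERE node_id = ?", undef, $node_id);

    my $sub = eval "sub { $code }" or die $@;
    return $sub->(@args);
}

So "parselinksinchatter" used four times on one page means four of those round trips.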

Now, I have been working on a means of caching compiled forms of the code nodes and ensuring that version checks occur only once per page fetch for code nodes, but I thought pruning the version table might also be a quick win. Overall, I can see from the general points raised in the replies that it's not going to make much, if any, difference.
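
For what it's worth, the rough shape of what I'm tinkering with looks something like this. Again just a sketch; %COMPILED, the per-page %$checked_this_page hash and the get_version()/get_code() helpers are hypothetical.

our %COMPILED;   # node_id => { version => ..., sub => ... }, per httpd process

sub compiled_code_node {
    my ($dbh, $node_id, $checked_this_page) = @_;

    # hit the version table at most once per node per page fetch
    unless ($checked_this_page->{$node_id}++) {
        my $version = get_version($dbh, $node_id);
        delete $COMPILED{$node_id}
            if $COMPILED{$node_id} && $COMPILED{$node_id}{version} != $version;
        $COMPILED{$node_id}{version} = $version;
    }

    # compile only when we don't already hold a current compiled sub
    $COMPILED{$node_id}{sub} ||= do {
        my $code = get_code($dbh, $node_id);
        eval "sub { $code }" or die $@;
    };
}

The per-page hash gets created fresh at the top of each request, so repeated uses of the same code node within one page display cost nothing extra.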

Thanks to all who replied.

---
demerphq

Replies are listed 'Best First'.
Re^5: Randomization as a cache clearing mechanism
by kappa (Chaplain) on Nov 20, 2004 at 11:20 UTC

    I see. We've got a large webmail installation here, with millions of users. All our backend web servers cache complex objects to a dedicated box running memcached, so we "spoil" the cache on update, as you say. And even if there's no dedicated cache box, I think it's worthwhile to at least try this strategy. In the case of the current PM setup it would involve either propagating the "spoiling" to all neighbours via TCP/IP or storing the cache inside the MySQL database instead of in web server RAM. I realize that these are big infrastructural changes and are hard to implement, though.
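
    Just to illustrate the shape of it, a minimal spoil-on-update sketch with the Cache::Memcached client; the key names, cache box address and the load/save helpers are made up.

    use Cache::Memcached;

    my $memd = Cache::Memcached->new({
        servers => [ '10.0.0.5:11211' ],   # the dedicated cache box
    });

    sub fetch_node {
        my ($node_id) = @_;
        my $node = $memd->get("node:$node_id");
        return $node if defined $node;
        $node = load_node_from_db($node_id);       # hypothetical DB helper
        $memd->set("node:$node_id", $node, 300);   # short expiry as a safety net
        return $node;
    }

    sub update_node {
        my ($node_id, $new_data) = @_;
        save_node_to_db($node_id, $new_data);      # hypothetical DB helper
        $memd->delete("node:$node_id");            # spoil: every httpd now refetches
    }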

    Also, I'd try to invest more time in increasing the complexity of the objects in the cache. Nodes seem to be "atoms" on PM -- too primitive and numerous to cache separately. There could be ways to cache whole threads, with all their nodes glued together to form a higher-level object. Nodelets are good candidates too. Do I make sense? I've implemented this for whole MIME structures and mailbox listings in our webmail system here, and it paid off.
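
    One possible way to key such coarse-grained entries (sketch only, reusing the $memd handle from above; the helpers are invented): derive the key from the root node plus the newest update time anywhere in the thread, so an edit or a new reply simply produces a new key and the stale entry ages out.

    sub cached_thread_html {
        my ($root_id) = @_;
        my $stamp = latest_update_in_thread($root_id);   # one cheap query
        my $key   = "thread:$root_id:$stamp";

        my $html = $memd->get($key);
        unless (defined $html) {
            $html = render_thread($root_id);             # the expensive part
            $memd->set($key, $html, 600);
        }
        return $html;
    }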

    By "trying" I always mean "benchmarking a simple prototype" :)

    P.S. The buttons on my calculator (an old Soviet Electronica) are ln, log and lg -- for natural, arbitrary and decimal logarithms -- so Perl's (and C's) log confuses me too.

      I looked at memcached quite a while ago (seems like 2 years but probably hasn't been that long). My immediate impression was that they needed to get rid of the biggest race condition in their design.

      I wrote the authors and they noted that they had gotten the same request (support a revision field in the data and prevent old data from overwriting new data). It appears that they still haven't managed to implement this simple and, IMO, important idea.1

      Without that, I wouldn't use memcached for the more difficult PM caching problems, that is, things that get updated frequently such as the CB.

      Of course, adding this to the open source project would probably not be that difficult of a task. And memcached could still be useful for PM without it.

      - tye        

      1 Preventing old data from overwriting new data still allows for races, but it is much better, and it is the best PM will ever do.

      memcached should also be fixed to support optimistic locking, which can be race-free, but then any update might fail and have to be presented to a supervisory function (usually the user) to figure out how to handle the failure (and for PM that would suck more than the limited race conditions possible without going to optimistic locking).

      To support optimistic locking, memcached would need to allow updates that say "here is the new data and it replaces revision X" and cause the update to fail if revision X isn't the current revision (and do updates atomically, which shouldn't be hard and I think is already the case based on my understanding of their design).

      But in any case, memcached is in serious need of a way to track revisions (such as a version number like the one PM's node cache uses, or a timestamp -- I'd just support any byte string where a simple 'cmp' determines which is newer, and possibly also variable-length digit strings).
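
      To spell out the semantics I mean (purely hypothetical, not something memcached offers): a write carries a revision and only wins if it compares newer, and the optimistic-locking variant additionally names the revision it expects to replace.

      my %store;   # key => { rev => ..., value => ... }, a stand-in for the server's store

      # "prevent old data overwriting new": the write loses unless its
      # revision is strictly newer than what is already stored
      sub set_if_newer {
          my ($key, $rev, $value) = @_;
          my $cur = $store{$key};
          return 0 if $cur && ($rev cmp $cur->{rev}) <= 0;
          $store{$key} = { rev => $rev, value => $value };
          return 1;
      }

      # optimistic locking: "here is the new data and it replaces revision X";
      # fail if revision X is no longer current
      sub replace_revision {
          my ($key, $expected_rev, $new_rev, $value) = @_;
          my $cur = $store{$key};
          return 0 unless $cur && $cur->{rev} eq $expected_rev;
          $store{$key} = { rev => $new_rev, value => $value };
          return 1;
      }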

        Hm. The memcached API has five principal commands: get, add, set, replace & delete. A distinct add, which fails if the key is already present in the cache, along with a deletion delay, helps prevent races (though not completely). I think the developers' intention is to avoid introducing locks or versions at all costs.
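
        For example, because add only succeeds when the key is absent, it can double as a cheap lock around an expensive rebuild. A sketch with the Cache::Memcached client; the key names, expiry times and rebuild_value() helper are made up.

        sub rebuild_with_add_lock {
            my ($memd, $node_id) = @_;
            if ($memd->add("lock:node:$node_id", 1, 10)) {   # 10s expiry in case we die mid-rebuild
                my $value = rebuild_value($node_id);          # hypothetical expensive work
                $memd->set("node:$node_id", $value, 300);
                $memd->delete("lock:node:$node_id");
            }
            # else: somebody else is already rebuilding; serve the stale copy or hit the DB
        }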

        And yes, I wouldn't use memcached in a banking environment; it seems to be a MySQL-type product -- speed ahead of reliability.

        Thanks for bringing it up. Doing things that will not do much harm to humanity in case of failure tends to shift priorities :)

      I have to say this is really interesting. Unfortunately, I'm flying off on holiday in a few hours, so I won't be able to look into it further until I return.

      And yes, part of the compiled-code caching stuff I've been tinkering with does things like that. Anyway, till later, thanks. :-)

      ---
      demerphq