in reply to Re^3: PerlMonks Caching (data)
in thread PerlMonks Caching

I didn't get that race condition anyway.

s/get/notice/

Different? Yes. Lacks the race? No.

Y: request arrives when Alice updates her node, N (having noticed a typo that makes her node appear extremely rude)
Y: delete N from memcached

X: request arrives when Bob downvotes Alice's node, N
X: N found missing from memcached
X: read version 1 of N from the DB, N1
X: decrement 'reputation' field in N1, producing N2
X: flush N2 changes to DB (update ... rep=rep-1 ...)
X: re-read N from DB, yielding N2 again
X: the slowness of this web server matters at this point

Y: read version 1 of N from the DB, N1
Y: apply update to node text, producing N3
Y: flush N3 changes to DB (update ... doctext='...' ...)
Y: redirect to display node as response to update
Y: re-read N from DB, yielding N4 (includes both the text update and the reputation decrease)
Y: flush N to memcached, storing N4

X: flush N to memcached, storing N2 (Oops!)

In this scenario, Alice sees her update applied while nobody else does. If Alice refreshes, her update mysteriously vanishes. With the 3-minute expiry, it mysteriously reappears when the cache entry times out, rather than when the next update is made.
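The interleaving above can be sketched in a few lines. This is only an illustration: plain dicts stand in for memcached and the DB, and the field names (rep, doctext) are made up, not the actual PerlMonks schema.

```python
db = {"N": {"rep": 10, "doctext": "v1 (rude typo)"}}
cache = {"N": dict(db["N"])}

# Y: Alice's update request deletes N from memcached.
del cache["N"]

# X: Bob's downvote misses the cache and reads N1 from the DB.
n1 = dict(db["N"])
n2 = dict(n1, rep=n1["rep"] - 1)    # X: decrement reputation -> N2
db["N"]["rep"] = n2["rep"]          # X: flush (update ... rep=rep-1 ...)
n2 = dict(db["N"])                  # X: re-read, yielding N2 again
# X stalls here: the slowness of the web server matters at this point.

# Y: apply the text fix to N1 -> N3, flush only doctext to the DB.
n3 = dict(n1, doctext="v2 (typo fixed)")
db["N"]["doctext"] = n3["doctext"]  # update ... doctext='...' ...
n4 = dict(db["N"])                  # Y: re-read N4 (both changes present)
cache["N"] = n4                     # Y: flush N4 to memcached

# X: finally flushes its stale copy, clobbering N4 with N2. Oops!
cache["N"] = n2

print(cache["N"])  # stale: reputation changed, Alice's text fix is gone
print(db["N"])     # the DB itself holds both changes
```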

Other than the race condition (which should just be fixed), I don't see the point in expiring the cache so frequently. Let the LRU do its job.

Your scheme seems more complicated (and less efficient) yet doesn't remove the race.

- tye        

Re^5: PerlMonks Caching (still racy)
by tinita (Parson) on Apr 21, 2010 at 18:54 UTC
    Other than the race condition (which should just be fixed), I don't see the point in expiring the cache so frequently. Let the LRU do its job.
    But in your version you're also updating the cache when updating a node, so it's the same work, actually.
    Less efficient? I don't think so. Reads happen much more often than writes.
    More complicated? I don't think so. If I update something, I simply delete the cache entry, then let the next read recreate the cache and don't worry about it. That's even less complicated, because in the read path I build my data structure the way I want it. If I had to create the cache entry while updating, I'd have to do the same work, even though at that point I'm not reading the whole thread.
    Like I said, I'm caching the whole thread. When a node is updated, the whole thread's cache entry is deleted. The next read fetches the nodes from the DB, puts them into my desired data structure, and then a) caches it and b) passes it to the template. If I had to build the whole thread data structure during the update, before the post completes, it would seem more complicated to me.
    Update: forgot the "pass to template"
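tinita's delete-then-lazily-rebuild strategy might look roughly like this. This is a sketch only: the dict cache and the fetch_thread_from_db placeholder are stand-ins, not the actual code.

```python
cache = {}

def fetch_thread_from_db(thread_id):
    # placeholder for the DB queries that build the thread structure
    return {"id": thread_id, "nodes": ["..."]}

def get_thread(thread_id):
    key = f"thread:{thread_id}"
    if key in cache:                        # in cache? display it
        return cache[key]
    data = fetch_thread_from_db(thread_id)  # else read from the DB,
    cache[key] = data                       # a) cache it,
    return data                             # b) pass it to the template

def update_node(thread_id):
    # ...commit the DB transaction first, then:
    cache.pop(f"thread:{thread_id}", None)  # just delete the cache entry;
    # the next get_thread() recreates it lazily
```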
Re^5: PerlMonks Caching (still racy)
by tinita (Parson) on Apr 21, 2010 at 19:43 UTC
    So we're talking about a race condition that might display outdated content in certain cases, but doesn't corrupt any data in the database.

    Btw, I think I like my version of using memcached more, probably because it acts like a cache: request the thread from the cache; if it's in the cache, display it, otherwise get it from the DB. Actively writing everything you update into the cache is more like pushing.

    For example, I have a cache of the overview page (which displays the newest node of each sub-board), of the recent 24h threads (a list of threads updated in the last 24 hours), and of the threads themselves. When updating a thread, I delete just these three cache entries and I'm done. Following your strategy of recreating all three entries on every update seems like too much work to me, especially since you don't know whether any of them will actually be requested in the next few minutes. Creating them only when needed seems more natural to me.
    And as for what to cache in general, my first thing to cache here on PerlMonks would be the Newest Nodes page, and then the RAT page. For those pages the race condition matters even less, especially if the cache only lasts for three minutes.
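The three-entry invalidation described above can be sketched as follows; the key names are illustrative and a dict stands in for memcached.

```python
cache = {
    "overview": "...newest node per sub-board...",
    "recent24h": "...threads updated in the last 24 hours...",
    "thread:7": "...the rendered thread...",
}

def invalidate_for_update(thread_id):
    # Delete every cached view the update could have changed;
    # each one is lazily rebuilt by the next read that needs it.
    for key in ("overview", "recent24h", f"thread:{thread_id}"):
        cache.pop(key, None)

invalidate_for_update(7)
print(sorted(cache))  # -> []
```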

      Of course it is about things being displayed badly. You don't use memcached for storing the real data, just for caching it (it doesn't try to be reliable enough to serve as a primary source).

      I don't like updates to a shared cache being done primarily on reads. When a cache entry expires, every read attempt does extra work constructing a new entry until one of them finally gets the update pushed to the cache. Since the cache is fast, it is easy for a bunch of requests to notice that the entry is missing and then all start the slower work of rebuilding it before the first one can finish. That can certainly make your approach less efficient.
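The pile-up tye describes is commonly called a cache stampede. One common mitigation lets only the first reader that misses do the rebuild. The sketch below uses thread-level stand-ins (a dict and a threading.Lock); a real multi-process memcached deployment would instead use something like memcached's add command as a distributed lock.

```python
import threading

cache = {}
rebuild_lock = threading.Lock()
rebuild_count = 0

def build_entry(key):
    global rebuild_count
    rebuild_count += 1          # stands in for the slow DB work
    return f"data for {key}"

def read(key):
    val = cache.get(key)
    if val is not None:
        return val
    # Only one reader rebuilds; the others block briefly and then
    # find the fresh entry instead of hammering the DB in parallel.
    with rebuild_lock:
        val = cache.get(key)    # re-check: someone may have just filled it
        if val is None:
            val = build_entry(key)
            cache[key] = val
    return val

threads = [threading.Thread(target=read, args=("t1",)) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(rebuild_count)  # -> 1: ten concurrent readers, one rebuild
```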

      I much prefer readers to get the slightly old version of things until the update has completed and been pushed to the cache. It is silly to cause extra read activity to a subset of data exactly at the time you are trying to make updates to that subset of data. That just exacerbates concurrency problems in the DB.

      Under your scheme, it would actually make more sense to not delete from the cache until after the updates to the DB have been completed.

      - tye        

        Under your scheme, it would actually make more sense to not delete from the cache until after the updates to the DB have been completed.
        Uhm, that's what I'm doing. After the transaction is committed, I do the cache delete and then the redirect.
        Anyways, let's say it's a matter of taste. I personally think caching at all will be a great help; how precisely it is done is probably not important. Good luck =)