PerlMonks  

Re^2: Randomization as a cache clearing mechanism

by waswas-fng (Curate)
on Nov 19, 2004 at 21:07 UTC ( [id://409159]=note )


in reply to Re: Randomization as a cache clearing mechanism
in thread Randomization as a cache clearing mechanism

MySQL's query cache works differently than you expect, or you did not think that through. It keys the cache off the exact SELECT string. On queries that do not repeat often, there is up to 13% overhead for the cache code (housekeeping). Because there are hundreds of hits per page load, the cache would fill with low-hit-rate data and just be a liability. MySQL's query cache starts to make sense when you have a complex query that thrashes and only has a few thousand (up to roughly forty thousand) SELECT variants that are executed often. Note also that MySQL's own documentation states that the theoretical speedup (given ~100% cache hits) is only around 220%. You can tweak the total cache size with query_cache_size (query_cache_limit only caps the size of an individual cached result), but mind you, as you increase the size the overhead goes up: the worst case gets worse, and so does the best case.
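To make the "keys the cache off the select string" point concrete, here is a toy Python model. It is a hypothetical sketch, not MySQL's implementation: an LRU cache keyed on the exact query text, replayed against two made-up workloads (the table names node and nodelet and the id ranges are assumptions for illustration only). Scattered one-off lookups should give a hit rate near zero, while a small set of repeated variants should give a hit rate near one.

```python
import random
from collections import OrderedDict


class ToyQueryCache:
    """Hypothetical LRU cache keyed on the exact query text (not MySQL's code)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # sql text -> result, oldest first
        self.hits = 0
        self.misses = 0

    def get(self, sql, run_query):
        # The key is the raw string, so "SELECT 1" and "select 1" differ.
        if sql in self.entries:
            self.entries.move_to_end(sql)  # mark as most recently used
            self.hits += 1
            return self.entries[sql]
        self.misses += 1
        result = run_query(sql)
        self.entries[sql] = result
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        return result

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def simulate(queries, capacity=1000):
    """Replay a list of query strings and return the resulting hit rate."""
    cache = ToyQueryCache(capacity)
    for sql in queries:
        cache.get(sql, lambda s: len(s))  # stand-in for real execution
    return cache.hit_rate()


random.seed(1)
# Scattered lookups: ids drawn from a large space rarely repeat.
scattered = [f"SELECT * FROM node WHERE node_id = {random.randint(1, 1_000_000)}"
             for _ in range(10_000)]
# A few hot variants executed over and over.
hot = [f"SELECT * FROM nodelet WHERE nodelet_id = {random.randint(1, 50)}"
       for _ in range(10_000)]
print(f"scattered: {simulate(scattered):.3f}")  # near 0
print(f"hot:       {simulate(hot):.3f}")        # near 1
```

Because the key is the literal string, even a change in letter case produces a separate entry; MySQL documents the same byte-for-byte matching rule for its query cache.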


-Waswas

Re^3: Randomization as a cache clearing mechanism
by dragonchild (Archbishop) on Nov 20, 2004 at 04:07 UTC
    I'm very well aware of how MySQL's query caching works - I built a reporting system around it. With a good 10-50M cache, you can do a whole heck of a lot of good. The important things about it are:
    1. You can just turn it on - there are no code or schema changes. This makes it easy to benchmark.
    2. It handles all the aging for you in very optimized C. You say there's a 13% overhead. I'm guessing that demerphq's plans may end up with a 50-150% overhead in the average case.

    In addition, I'm going to guess that there's going to be a much higher gain than you might think. Many of the hits, I'm guessing, have to do with invariants - monk data, nodelet data, and the like. I know I do at least 400 pageviews/day on this site, and I'm a low-hit regular. If half the queries for just the regulars get to be cached, then that's at least a 13% savings right there.
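The disagreement comes down to how the 13% miss overhead trades off against the hit rate. Here is a back-of-the-envelope Python sketch under loudly simplified assumptions: a miss costs one normal execution plus 13% housekeeping, and a hit costs 1/2.2 of a normal execution (one reading of the ~220% figure quoted earlier in the thread). Neither parameter is measured here; both are assumptions you would replace with real numbers.

```python
def relative_cost(hit_rate, miss_overhead=0.13, hit_cost=1 / 2.2):
    """Average cost per SELECT relative to running without a cache (1.0).

    Simplified model (an assumption, not a MySQL formula):
      * a miss costs one normal execution plus `miss_overhead` housekeeping;
      * a hit costs `hit_cost` of one normal execution.
    """
    return hit_rate * hit_cost + (1 - hit_rate) * (1 + miss_overhead)


def break_even_hit_rate(miss_overhead=0.13, hit_cost=1 / 2.2):
    """Hit rate at which the cache neither helps nor hurts."""
    # Solve h * hit_cost + (1 - h) * (1 + miss_overhead) = 1 for h.
    return miss_overhead / (1 + miss_overhead - hit_cost)


print(round(break_even_hit_rate(), 3))  # 0.192
print(round(relative_cost(0.0), 3))     # 1.13: every query misses
print(round(relative_cost(0.5), 3))     # 0.792: half the queries hit
```

Under these assumptions the cache breaks even at roughly a 19% hit rate; reading "220%" as a 3.2x speedup would lower that threshold.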

    So, yes, it does work as I think and I did think it through. It's not the ideal solution, but it's definitely a quick-hit easy one, as well as easy to verify - just turn it on for a week and see how performance plays. If it doesn't work, then turn it off. No harm, no foul.
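Verifying such a trial comes down to reading MySQL's status counters before and after the week. A minimal Python sketch, assuming the counters have already been fetched from SHOW GLOBAL STATUS into a dict of strings, and relying on MySQL's documented behaviour that a query served from the cache increments Qcache_hits rather than Com_select. The function names and the snapshot numbers are hypothetical.

```python
def query_cache_hit_rate(status):
    """Hit rate from query cache counters.

    Assumes a cache hit bumps Qcache_hits but not Com_select, so each
    SELECT is counted exactly once in Qcache_hits + Com_select.
    """
    hits = int(status.get("Qcache_hits", 0))
    not_from_cache = int(status.get("Com_select", 0))
    total = hits + not_from_cache
    return hits / total if total else 0.0


def trial_hit_rate(before, after):
    """Hit rate over a trial window, from two status snapshots."""
    delta = {name: int(after[name]) - int(before[name])
             for name in ("Qcache_hits", "Com_select")}
    return query_cache_hit_rate(delta)


# Hypothetical snapshots taken at the start and end of a one-week trial.
start = {"Qcache_hits": "1000", "Com_select": "9000"}
end = {"Qcache_hits": "41000", "Com_select": "69000"}
print(trial_hit_rate(start, end))  # 40000 / (40000 + 60000) -> 0.4
```

Taking the difference of two snapshots, rather than the raw totals, keeps traffic from before the cache was switched on out of the measurement.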

    Being right, does not endow the right to be rude; politeness costs nothing.
    Being unknowing, is not the same as being stupid.
    Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence.
    Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.

    Re^4: Randomization as a cache clearing mechanism
    by waswas-fng (Curate)
      I would be interested to see if this makes a net gain on the PM site. I would think that the many essentially random hits against the indexed table are already performing at very close to minimal cost (O(log n), since it is indexed). So it would seem to only add extra housekeeping steps to the database (create the cache, hook updates/inserts to invalidate the cache, expire the cache, allocate memory, lose actual memory to cache indexes) for the population of a cache that by all common sense would have a low hit rate anyway: the items are atom-like in nature, too many and too random to cache, or to cache better than the initial O(log n) performance. I agree that it is an easy test to do; no actual data has to be changed. It would just be very counterintuitive to me if it did enhance performance in this case. I would think a redesign that treats the cache items in a different scope would have a more profound impact on performance: treat nodes of different types in different specialized ways, with a better data structure for caching.
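The invalidation housekeeping described in this reply can be modelled as follows. This is a hypothetical Python sketch, assuming MySQL's documented rule that any modification to a table drops every cached result that used that table; the class, table, and query names are made up. It shows why a frequently written table keeps emptying its share of the cache.

```python
class InvalidatingQueryCache:
    """Hypothetical model: results keyed on exact SQL text, and any write
    to a table drops every cached result that read from that table."""

    def __init__(self):
        self.results = {}        # sql text -> cached result
        self.readers = {}        # table name -> sql texts that read it
        self.invalidations = 0   # cached results thrown away by writes

    def select(self, sql, tables, run_query):
        """Return (result, was_hit)."""
        if sql in self.results:
            return self.results[sql], True
        result = run_query(sql)
        self.results[sql] = result
        for table in tables:
            self.readers.setdefault(table, set()).add(sql)
        return result, False

    def write(self, table):
        """An INSERT/UPDATE/DELETE on `table` invalidates its readers."""
        for sql in self.readers.pop(table, set()):
            if sql in self.results:  # may already be gone via another table
                del self.results[sql]
                self.invalidations += 1


cache = InvalidatingQueryCache()
run = lambda sql: f"rows for {sql!r}"  # stand-in for real execution
q = "SELECT title FROM node WHERE node_id = 42"
print(cache.select(q, ["node"], run)[1])  # False: first execution
print(cache.select(q, ["node"], run)[1])  # True: served from the cache
cache.write("node")                        # any write to the node table
print(cache.select(q, ["node"], run)[1])  # False: entry was invalidated
```

Every write pays for the bookkeeping in write() even when nothing is ever read back from the cache, which is the extra cost this reply is worried about.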


      -Waswas

      Re^5: Randomization as a cache clearing mechanism
      by dragonchild (Archbishop)
        We are in complete agreement here. I have always thought that the Everything Engine is too database-intensive, but to change that would require a complete redesign from the ground up. However, if simply turning something on can make a difference, it's certainly worth trying. That was the only reason I suggested the query_cache.
