I've been thinking about an aspect of PerlMonks internals and I'm wondering what people think. I'm not posting to PMD because I consider this more of an abstract programming question than just an internal PM issue.
We use a caching mechanism on PM. Instead of refetching a node every time, if we have it cached we just check the node's version number in a version table. Any time a node gets updated, the version entry for that node is bumped up by one. This means two things: we have to fetch the version number every time we use a node, and over time the size of the version table becomes pretty well 1:1 with the node table.
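To make the mechanism concrete, here is a toy sketch of a version-checked cache. It is not the actual PM code: the class name, the counters, and the use of plain dicts in place of the node and version tables are all assumptions made for illustration, and it is written in Python rather than Perl only to keep it short.

```python
class VersionedNodeCache:
    """Toy model; dicts stand in for the node and version tables (hypothetical)."""

    def __init__(self):
        self.node_table = {}      # node_id -> content (the expensive fetch)
        self.version_table = {}   # node_id -> version (the cheap fetch)
        self.cache = {}           # node_id -> (version, content)
        self.node_fetches = 0     # counts simulated node-table queries
        self.version_fetches = 0  # counts simulated version-table queries

    def update(self, node_id, content):
        # Every update bumps the node's version entry by one.
        self.node_table[node_id] = content
        self.version_table[node_id] = self.version_table.get(node_id, 0) + 1

    def get(self, node_id):
        # The version lookup happens on every use of a node.
        self.version_fetches += 1
        current = self.version_table.get(node_id, 0)
        cached = self.cache.get(node_id)
        if cached is not None and cached[0] == current:
            return cached[1]  # cached copy is still current
        # Cache miss or stale copy: refetch the node itself.
        self.node_fetches += 1
        content = self.node_table[node_id]
        self.cache[node_id] = (current, content)
        return content
```

Note how `version_fetches` grows with every `get`, while `node_fetches` only grows on a miss or after an update. That is why the version lookup ends up being the most common query.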
My point is this: version table fetches are the single most common query we do, so it would seem that optimising them as much as possible would be a well-rewarded task.
What I'm thinking of is a strategy to keep the table size small. My idea is that every time an update is done to the version table a random check would be made, say rand(10000)<1, and if it passed, some portion of the version records, maybe 1% or so, would be deleted. Ideally these would be the oldest records in the table, which would involve adding a datestamp column, and an index on it, to the table. Essentially my contention is that we should see fetch speeds go up quite a bit if we could keep the version table an order of magnitude or two smaller, especially if it only contained nodes that were regularly used and updated (which in fact are typically User nodes).
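A rough sketch of the pruning idea, under some assumptions: the version table is modelled as a dict of node_id to (version, updated_at), the function names `maybe_prune` and `bump_version` are made up, and the sort stands in for what a date index would do in the database. Python is used only for brevity; `rng.random() < chance` with `chance=1/10000` plays the role of rand(10000)<1.

```python
import random


def maybe_prune(version_table, rng=random, chance=1 / 10000, fraction=0.01):
    """Occasionally delete the oldest `fraction` of records.

    version_table maps node_id -> (version, updated_at).
    Returns the number of records deleted.
    """
    # The random check: most calls return here without touching the table.
    if rng.random() >= chance:
        return 0
    n = int(len(version_table) * fraction)
    # Oldest first; a real table would use the index on the datestamp.
    oldest = sorted(version_table, key=lambda nid: version_table[nid][1])[:n]
    for nid in oldest:
        del version_table[nid]
    return len(oldest)


def bump_version(version_table, node_id, now, **prune_opts):
    """Bump a node's version, then run the random prune check."""
    version, _ = version_table.get(node_id, (0, now))
    version_table[node_id] = (version + 1, now)
    return maybe_prune(version_table, **prune_opts)
```

One thing the sketch leaves open is what a reader does when a node's version record has been pruned. The simplest assumption is to treat it as a cache miss and refetch the node, but since a later update would restart that node's version at 1, some care would be needed so that a stale cached copy is never mistaken for a current one.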
Of course it's also possible that an idea like this would slow things down overall. Index maintenance on the added field and on the deletes may cost more than the time gained by keeping the indexes small. Cache misses would probably increase marginally, so possibly the price there would outweigh the gain. The problem is that without actually implementing the solution I have no way to know.
What I'm wondering is: do those of you with better database/MySQL experience and knowledge than me think that this is a worthy area of investigation, or do you think that overall this is a blind alley?
In reply to Randomization as a cache clearing mechanism by demerphq