in reply to Newest Nodes Page

Caching things is harder than it may appear. There are two web servers, and (most likely) several dozen httpd worker processes on each. Any update can go to any of them, and is then reflected on the DB server. It's difficult, then, to invalidate the cache intelligently, unless it's done on the DB server. If it's done on the DB server, which is very loaded, then it needs to be /very/ tight code, or it's no better than intelligent use of the DB, which we already do.

Also, we aren't set up very well for modifying the DB server, or running additional code on it, because such code doesn't live in nodes in Everything.


Warning: Unless otherwise stated, code is untested. Do not use without understanding. Code is posted in the hopes it is useful, but without warranty. All copyrights are relinquished into the public domain unless otherwise stated. I am not an angel. I am capable of error, and err on a fairly regular basis. If I made a mistake, please let me know (such as by replying to this node).

(z) Re^2: Newest Nodes Page
by zigdon (Deacon) on Feb 07, 2003 at 13:41 UTC

    What about just caching the newest nodes node on each webserver, automatically invalidating it after 60 seconds? You might still hit the DB once per webserver per minute, but if we have 200 people reloading the newest nodes node often, it'll still result in a lot less DB access.
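
    Something like this might do it (untested sketch, assuming a mod_perl-ish setup where lexicals persist between requests; build_newest_nodes_html() is a made-up name for whatever builds the page from the DB today):

        # Untested sketch. These lexicals persist between requests
        # within one worker process, so note: each worker keeps its
        # own copy, meaning worst case is one DB hit per worker per
        # minute, not per webserver.
        my $cached_html;
        my $cached_at = 0;

        sub newest_nodes_html {
            my $now = time;
            if ( !defined $cached_html or $now - $cached_at > 60 ) {
                $cached_html = build_newest_nodes_html();   # the only DB hit
                $cached_at   = $now;
            }
            return $cached_html;
        }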

    Does that make sense?

    -- zigdon

      *Which* Newest Nodes node? Your "newest nodes since" flag isn't the same as mine.

        ummm... huh! that was pretty stupid of me :)

        but the idea is still salvageable. we're aiming for the common case - extreme cases can fall through and still go hit the DB, after missing the cache.

        what I'm thinking of is to have an incremental cache, starting with the latest post and going back, up to a day back. Then, when a client asks to "show me the newest nodes in the last 10 minutes", the cache process just scans the flat file on the webserver, stopping when it gets farther back than 10 minutes, and shows the collected data. Of course, if it reaches the end of the cache and still doesn't reach far enough back, then it just defaults to the current behaviour. Performance hit to that user, but that's the rare case.

        sample data could look like this:

        1044638805|sopw|How do I download a file?|Anonymous Monk
        1044638800|sopw|How do I get my script to run?|Newbie
        1044638523|med|Pondering the meaning of perl|Phil
        ...
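
        the scanning side could look something like this (untested sketch; the file name and newest_nodes_from_db() are made-up names, and the format is the epoch|section|title|author lines above, newest first):

            use strict;
            use warnings;

            sub newest_nodes {
                my ($minutes) = @_;
                my $cutoff = time - $minutes * 60;

                # no cache file yet: fall back to the DB
                open my $fh, '<', '/var/cache/newest_nodes.txt'
                    or return newest_nodes_from_db($minutes);

                my ( @nodes, $covered );
                while ( my $line = <$fh> ) {
                    chomp $line;
                    my ( $epoch, $section, $title, $author )
                        = split /\|/, $line, 4;
                    if ( $epoch <= $cutoff ) {    # past the window: done
                        $covered = 1;
                        last;
                    }
                    push @nodes, {
                        time    => $epoch,
                        section => $section,
                        title   => $title,
                        author  => $author,
                    };
                }
                close $fh;

                # cache ended before reaching the cutoff:
                # rare case, go hit the DB
                return $covered ? \@nodes : newest_nodes_from_db($minutes);
            }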

        yes, the cache process would put some load on the web servers, but from what I understand we can afford that load, and it would definitely reduce the load on the database, because the cache would only have to be updated once in a while.
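
        the update job itself could be as simple as this (untested sketch, run from cron or similar; the table and column names are made up, adjust for the real schema):

            use strict;
            use warnings;
            use DBI;

            sub rebuild_cache {
                my ($dbh) = @_;
                my $sth = $dbh->prepare(q{
                    SELECT UNIX_TIMESTAMP(createtime), section, title, author
                      FROM node
                     WHERE createtime > NOW() - INTERVAL 1 DAY
                     ORDER BY createtime DESC
                });
                $sth->execute;

                # write a temp file, then rename, so readers never
                # see a half-written cache
                open my $fh, '>', '/var/cache/newest_nodes.txt.tmp'
                    or die $!;
                while ( my @row = $sth->fetchrow_array ) {
                    print $fh join( '|', @row ), "\n";
                }
                close $fh or die $!;
                rename '/var/cache/newest_nodes.txt.tmp',
                       '/var/cache/newest_nodes.txt' or die $!;
            }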

        does this make any more sense now?

        -- zigdon