in reply to PerlMonks Caching

It's really difficult to do something good for PerlMonks.

I've asked several times where the problems actually are. It would be useful to run something like munin and look at statistics on which pages are retrieved most. That's better than making a wild guess.

I'm using caching of data structures in my own forum software, and it really helps a lot. I have some experience in scaling a large website; I just need an instance running the current software and *some* statistics. Installing and configuring munin takes 15 minutes or so.

I've asked for munin statistics and log files, and I wrote a skeleton script that creates a KinoSearch index, for moving the current mysql "like" search over to KinoSearch. (I use KinoSearch in my own forum software; it is, of course, much faster than a mysql LIKE query.)

Obviously the problems are not big enough yet, or I'm talking to the wrong people.

So yes, memcached or anything would help, but if nobody lets us get to the code/machines to actually do something, it will stay like it is forever... =(

Re^2: PerlMonks Caching (data)
by tye (Sage) on Apr 21, 2010 at 16:14 UTC
    I've asked several times where the problems actually are. It would be useful to run something like munin and look at statistics on which pages are retrieved most. That's better than making a wild guess.

    Well, I wasn't guessing wildly. Page load numbers were even public information until they got broken (by a mysql update, I think -- I don't have those details swapped in at the moment).

    And I've posted public nodes about the problems that have been identified, how many of them were fixed, and how they get rebroken. It is too bad that you seem completely unaware of them. *shrug*

    Update: Sorry, I'm sure that sounds harsh. Let me echo Corion's best reply. Your help and support are sincerely appreciated! Very little in PerlMonks admin is instantaneous, far from it. I wouldn't rate memcached as the first priority but I'm also convinced it would be an improvement (except for the race condition which might hardly ever be noticed or might be quite an annoyance that other users of memcached don't run into).

    - tye        

      except for the race condition which might hardly ever be noticed or might be quite an annoyance that other users of memcached don't run into
      I didn't get that race condition anyway. Especially the "flush to memcached" part.
      What I'm doing in my forum at the moment is storing a whole thread in memcached as a data structure. Expiry is set to 3 minutes (it could be more, but since I also store user info and signatures, 3 minutes makes user-info changes show up faster).
      Now, whenever somebody posts a new node or updates their node, the cache entry is explicitly expired immediately: database transaction first, then delete the thread's cache entry, then redirect the user to the updated thread or node. The next request for the thread reads from the database and creates the cache entry again.
      Your use of memcached seems a bit different, if I understood that correctly? So the wrong flush to memcached wouldn't happen in my case.
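
      The delete-on-write scheme described above can be sketched like this. This is a minimal illustration only: a plain hash stands in for memcached (real code would use Cache::Memcached or similar), and the db_* helpers are invented stand-ins for the real queries.

```perl
# Sketch of the delete-on-write scheme: writers commit to the DB and
# delete the cache entry; readers repopulate the cache on a miss.
use strict;
use warnings;

my %cache;          # key => [ $value, $expires_at ]
my %db;             # thread_id => arrayref of nodes (stand-in for MySQL)
my $TTL = 3 * 60;   # the 3-minute expiry mentioned above

sub db_insert_node { my ($tid, $node) = @_; push @{ $db{$tid} }, $node }
sub db_load_thread { my ($tid) = @_; [ @{ $db{$tid} || [] } ] }

sub cache_get {
    my ($key) = @_;
    my $entry = $cache{$key} or return undef;
    return undef if time() >= $entry->[1];    # expired
    return $entry->[0];
}

sub cache_set { my ($key, $val) = @_; $cache{$key} = [ $val, time() + $TTL ] }

# Writer: commit to the database first, then expire the cache entry.
sub post_node {
    my ($thread_id, $node) = @_;
    db_insert_node($thread_id, $node);
    delete $cache{"thread:$thread_id"};       # explicit, immediate expiry
}

# Reader: repopulate the cache on a miss.
sub fetch_thread {
    my ($thread_id) = @_;
    my $thread = cache_get("thread:$thread_id");
    return $thread if defined $thread;
    $thread = db_load_thread($thread_id);
    cache_set("thread:$thread_id", $thread);
    return $thread;
}
```

      The key property is that the cache is only ever written on the read path, after the database write has completed.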

      edit: yes, I think that's the point: in the second block of server X you are flushing N2 to memcached, but the current version in the database is N4. If the cache entry is only created when fetching a thread, the RC shouldn't be there.

      edit 2:
      or maybe I can still create an RC:

      X: reads thread with node N1
      X: votes N1 -> N2, delete cache entry
      X: redirect after POST and reads N2 from DB...
      Y: author updates node from N2 to N3
      Y: delete cache entry, if exists
      Y: redirect after POST and reads N3 from DB
      Y: cache N3 to memcached
      X: ... cache N2 to memcached
      But in that version, a whole transaction, redirect, and memcached operation have to squeeze in while X blocks. Of course it's still possible, but extremely improbable. Could it maybe be prevented by putting the read and the creation of the cache entry into a transaction with a read lock?
        I didn't get that race condition anyway.

        s/get/notice/

        Different? Yes. Lacks the race? No.

        Y: request arrives when Alice updates her node, N (having noticed a typo that makes her node appear extremely rude)
        Y: delete N from memcached

        X: request arrives when Bob downvotes Alice's node, N
        X: N found missing from memcached
        X: read version 1 of N from DB, N1
        X: decrement 'reputation' field in N1, producing N2
        X: flush N2 changes to DB (update ... rep=rep-1 ...)
        X: re-read N from DB, yielding N2 again
        X: the slowness of this web server matters at this point

        Y: read version 1 of N from DB, N1
        Y: apply update to node text, producing N3
        Y: flush N3 changes to DB (update ... doctext='...' ...)
        Y: redirect to display node as response to update
        Y: re-read N from DB, yielding N4 (includes both the text update and the reputation decrease)
        Y: flush N to memcached, storing N4

        X: flush N to memcached, storing N2 (Oops!)

        In this scenario, Alice sees her update applied while nobody else does. If Alice refreshes, then her update mysteriously vanishes. With the 3-minute expiry, her update mysteriously reappears when the cache times out, rather than when the next update is done.

        Other than the race condition (which should just be fixed), I don't see the point in expiring the cache so frequently. Let the LRU do its job.

        Your scheme seems more complicated (and less efficient) yet doesn't remove the race.

        - tye        

        Ah, multiple within-node updates. So easy to spot...

        But in that version, a whole transaction, redirect, and memcached operation have to squeeze in while X blocks. Of course it's still possible, but extremely improbable.

        Yes, the lopsidedness of the number of operations that must race through vs the number of operations that constitute the "hole" can be a big help. And I expect that most systems would not manage to trigger the above race.

        But I've seen several different ways where parts of PerlMonks get extremely sluggish. So I'm more concerned than most about some very sluggish process throwing out a very delayed update.

        Could it maybe be prevented by putting the read and the creation of the cache entry into a transaction with a read lock?

        Did you really propose that? :) Either the race is never going to happen or there are cases when an update takes many times longer than it really should. You want to hold a read lock for an extended period? The solution I proposed boils down to a single strcmp() in the server.
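
        For what it's worth, one standard way to make a late, stale flush harmless (not necessarily the exact server-side check tye has in mind) is memcached's check-and-set pair: gets returns a version token alongside the value, and cas refuses the store if anything was written since. A minimal sketch of the idea, with a plain hash standing in for memcached:

```perl
# Check-and-set sketch: a store only succeeds if the entry is still the
# version we read earlier. A plain hash stands in for memcached; real
# clients (e.g. Cache::Memcached::Fast) expose this as gets/cas.
use strict;
use warnings;

my %cache;          # key => [ $cas_token, $value ]
my $next_token = 1;

sub cache_gets {    # like memcached's "gets": returns (token, value)
    my ($key) = @_;
    my $e = $cache{$key} or return;
    return @$e;
}

sub cache_set {     # unconditional store, bumps the version token
    my ($key, $value) = @_;
    $cache{$key} = [ $next_token++, $value ];
}

sub cache_cas {     # store only if the token still matches
    my ($key, $token, $value) = @_;
    my $e = $cache{$key};
    return 0 unless $e && $e->[0] == $token;
    $cache{$key} = [ $next_token++, $value ];
    return 1;
}
```

        If X had read the entry with gets before its update, its delayed flush of N2 would carry a stale token and be rejected. For the found-missing case in the scenario above, memcached's add command (store only if the key is absent) gives the same protection against stomping Y's fresher value.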

        That reminds me of seeing notes about fixing concurrency issues in memcached's use of threads when I was looking through some release notes. Part of the beauty of memcached was that it didn't have concurrency problems because it had such a simple and elegant design. Both sad and funny to see that get lost...

        I would've allowed memcached to make full use of multi-core CPUs without using any mutexes at all (for the core features). You just continue the existing design one layer further. In this server process you have N sets of completely independent data structures and N threads. Each thread only ever deals with one particular data structure. You pick which thread (and thus which data structure) to pass off a request to exactly like how you pick a memcached server from a cluster: by hashing the key involved.

        I'd actually push this abstraction out to the client so that a client having a cluster of 2 machines, each with 4 cores, behaves just like it has a cluster of 8 servers, except that 4 of those servers actually share a single IP address / port.

        When a request comes in to the service, the main thread hashes the key and hands the connection off to the appropriate thread. Even the hand-off doesn't require a mutex because you can use a round-robin queue with only a single thread enqueueing and a single thread dequeueing.
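
        In outline, the pick-a-thread-by-key step might look like this (a sketch only, not memcached's actual code; the hash function is an illustrative stand-in, where real clients use CRC32 or consistent hashing):

```perl
# Route each key to one of N independent shards (threads and their
# private data structures) by hashing the key, the same way a memcached
# client picks a server from a cluster. djb2-style hash for illustration.
use strict;
use warnings;

my $N_SHARDS = 4;   # e.g. one shard per core

sub shard_for {
    my ($key) = @_;
    my $h = 5381;
    $h = ( $h * 33 + $_ ) % 2**32 for unpack 'C*', $key;
    return $h % $N_SHARDS;
}
```

        Because every request for a given key lands on the same shard, each shard's data structure is touched by exactly one thread and needs no mutex; the single-producer/single-consumer hand-off queue is the only shared structure.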

        You completely avoid the usual cycle: forgetting to lock something (leading to races that cause corruption); then putting a lock in place, but one that is too global, greatly reducing concurrency; then splitting it into several more-localized locks; and eventually having so many locks that it becomes non-trivial, then nearly impossible, to totally grok the order in which locks might possibly be acquired, until you reach the ultimate stage of having deadlocks.

        I've seen that cycle way too many times. I wish more people would learn more from it. :)

        - tye        

Re^2: PerlMonks Caching (munin)
by tye (Sage) on Apr 22, 2010 at 23:56 UTC
    installing/configuring munin is done in 15 minutes or so.

    Despite the Munin site's list of prerequisites never mentioning anything about permissions (that I noticed), the "INSTALL" file says:

    Create the user "munin" and the group "munin"

    I no longer even have access to run /usr/bin/top. I certainly can't create users, even given the mythical 15 minutes (it took that long just to find out whether 'root' was a requirement).

    P.S. I believe this thread is the first I have ever heard of Munin (much less repeated requests to have it installed).

    - tye        

      Well, too bad that you've never heard of it. In most of my past jobs it was used as a monitoring tool (besides nagios). It's really great (and writing your own plugins in perl is fun), and on debian I just do "aptitude install munin munin-node munin-plugins-extra", activate the plugins for mysql and apache, and I'm done.
      Of course, if you're totally new to munin then it might take you longer than the 15 minutes. And yes, you have to be root (I don't know whether it can be run as a normal user; some of the values it queries might only be available to root). So, no munin, ok. I mean, you don't have to. You could use the tool you like best =)
      Still, if you need help with configuring munin, just ask. I'm also in irc.perl.org
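
      Since writing munin plugins in perl came up: a plugin is just a script that munin-node calls with "config" once to learn the graph layout, and then with no argument to collect values. A minimal sketch (the graph title and "hits" field here are invented for illustration):

```perl
#!/usr/bin/perl
# Minimal munin plugin sketch. munin-node runs this with "config" to
# learn the graph layout, and with no argument to fetch values.
use strict;
use warnings;

sub config_output {
    return "graph_title Example page hits\n"
         . "graph_vlabel hits per \${graph_period}\n"
         . "hits.label hits\n"
         . "hits.type DERIVE\n"
         . "hits.min 0\n";
}

sub fetch_output {
    my ($count) = @_;   # a real plugin would parse a log or status page
    return "hits.value $count\n";
}

if ( @ARGV && $ARGV[0] eq 'config' ) {
    print config_output();
}
else {
    print fetch_output(42);   # placeholder value for the sketch
}
```

      Drop it in /etc/munin/plugins/, make it executable, and munin-node picks it up on restart.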
        Munin looks a lot like rrdtool or MRTG to me. If you like monitoring tools and like writing your own plugins in Perl, may I suggest you look at Argus as well.

        With Argus, NetSNMP, and a little Perl I set up monitoring of disk, network, processor, and memory usage on servers; HTTP, outbound SMTP, POP3, IMAP, DNS, RADIUS, L2TP, webmail (including login), main web page (checking for proper page served), sshd (including login), and MX host functionality/response time/uptime; circuit, interface, subinterface, firewall, and tunnel status/uptime/usage; log sizes, backup progress, server room temperature, ticket system accepting logins; and access concentrator stats like ports filled, longest user session, uptime, average connect time per port, average connect time across the concentrator, and even power cycle times and watts drawn through the outlets from the remote-controlled power strips.

        It's not that Nagios is bad or anything, or that Argus will necessarily replace either Nagios or Munin for your uses. But it's certainly worth a look, I think. I used it to replace a commercial proprietary product my boss at an ISP had been using -- one for which we really needed more system licenses when I got there, because he was only monitoring the most essential services to save on software upgrade costs. The proprietary, closed-source product also didn't monitor every type of device and service we needed to monitor, and required a Windows NT or 2000 license for the server to run it.

        I was able to throw Argus and an iptables-based (with some Perl wrapping it) redirector for unpaid subscription accounts on a machine with our L2TP system (with the IP addresses managed by a Perl program). The redirector replaced a $6000 Cisco-branded Windows 2000 machine sold as an appliance, BTW, which had been compromised but for which Cisco wanted a year-long support contract before giving us administrator access. I ran it all on a Pentium 233 pulled from storage, and we never hit a load level over 10, with typical load of 0.5 to 2. My boss called me the Windows killer. I just told him it was Linux, Perl, and a few other neat open-source toys. If only he'd asked more for half-day implementations on Linux before spending money on "solutions"...