in reply to Re^4: Perlmonks site has become far too slow
in thread Perlmonks site has become far too slow

How about:
RewriteEngine On

# Match requests like /?node_id=12345
RewriteCond %{REQUEST_URI} ^/$
RewriteCond %{QUERY_STRING} ^node_id=([0-9]+)$

# Skip if the cookie header contains userpass=
RewriteCond %{HTTP_COOKIE} !(^|;\s*)userpass=

# Serve the cached file if it exists
RewriteCond %{DOCUMENT_ROOT}/cache/%1.html -f
RewriteRule ^$ /cache/%1.html [L]

Then any time Anonymous Monk requests a page, save a copy of what you serve to /cache/$node_id.html, and every time someone posts or edits content under a node, call unlink("cache/$node_id.html");
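As a minimal sketch of that save/invalidate side (Perl; the cache directory and the hook points are assumptions, not PerlMonks internals):

```perl
use strict;
use warnings;
use File::Spec;

our $cache_dir = '/var/www/cache';   # assumed location under DOCUMENT_ROOT

# After rendering a page for Anonymous Monk, save the HTML.
sub cache_page {
    my ($node_id, $html) = @_;
    my $path = File::Spec->catfile($cache_dir, "$node_id.html");
    open my $fh, '>', "$path.tmp" or return;
    print {$fh} $html;
    close $fh;
    rename "$path.tmp", $path;   # atomic swap, Apache never sees a partial file
}

# After any post or edit under a node, drop the stale copy.
sub invalidate_page {
    my ($node_id) = @_;
    unlink File::Spec->catfile($cache_dir, "$node_id.html");
}
```

Writing to a temp file and renaming avoids serving a half-written page if a request races the cache update.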

Apache should be able to crank these out way faster than CGI could.

For bonus points, store the cache in ZFS with compression enabled. Maybe also minify the HTML before saving it.

Replies are listed 'Best First'.
Re^6: Perlmonks site has become far too slow
by LanX (Saint) on Aug 30, 2025 at 17:38 UTC
    If you log out, you'll notice that someone, I suppose Corion, is experimenting with caching this thread (only?).

    Try https://www.perlmonks.net/?node_id=11166139 (or whatever domain logs you out)

    One problem is immediately apparent: it's not enough to disable/delete a single node's cache on update; the whole parent chain must be invalidated too, because the sub-thread view shows the replies. (Your reply and Bliako's are missing.)
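    A sketch of that chain invalidation (Perl; the parent map and the purge callback are assumptions, in Everything the parent would come from a DB lookup):

```perl
use strict;
use warnings;

# Invalidate a node and every ancestor up the thread, since each
# ancestor's sub-thread view embeds its replies.
sub invalidate_chain {
    my ($node_id, $parent_of, $invalidate) = @_;
    my %seen;                               # guard against cycles in bad data
    while (defined $node_id && !$seen{$node_id}++) {
        $invalidate->($node_id);
        $node_id = $parent_of->{$node_id};  # parent map is an assumption
    }
}
```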

    Since we are working with multiple servers I'm pessimistic about using the filesystem for caching, I think it's far easier to implement in the database server.

    FWIW I took a look into the Everything code yesterday, and while it's hard to tell how it really works without a testing environment (my usual grievance), I found that there is a central method getNodeById, which occasionally does caching if called with the appropriate flags.

    There is a whole Cache class blessed to $DB to handle it.

    There is also an updateNode method.
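    In outline that could look like the following purely hypothetical sketch, with plain hashes standing in for the Cache class and the node table; it only illustrates the evict-before-write ordering, not the real Everything API:

```perl
use strict;
use warnings;

# Hypothetical glue: evict a node from the in-memory cache before
# writing, so the next getNodeById re-reads fresh data.
sub update_node_evicting {
    my ($cache, $store, $node) = @_;
    delete $cache->{ $node->{node_id} };   # evict the stale cached copy
    $store->{ $node->{node_id} } = $node;  # then persist the update
}
```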

    So in theory this should be easy to do; alas, only Gods can develop and test, especially where the core Everything:: modules are concerned, which can't be patched by pmdevs.

    Pmdevs like me are reduced to smartass comments here. (I can't even tell if the online versions of Everything:: really show the code currently in production.)

    Update

    In hindsight, caching getNodeById won't be sufficient; I don't think it already returns HTML.

    If you are interested in the gory details, please check out the original documentation of Everything (i.e. before it was heavily patched into Perlmonks).

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    see Wikisyntax for the Monastery

      > Since we are working with multiple servers I'm pessimistic about using the filesystem for caching, I think it's far easier to implement in the database server.

      Unless it's possible to direct all AnoMonk requests to one server only. This would have the huge advantage that the other web-servers wouldn't be impacted by increased bot activity.

      OTOH update events could happen on other servers and would need to be logged via the DB, so that the "Ano-Server" could discard the cached page.

      update

      Or store the cache in a shared file system; unfortunately, I have no experience to say how efficient that is.

      But in this case the Ano-Server wouldn't need to run Everything, just a web-server (apache/Nginx/...) distributing static files.

      The updates of the static files are comparatively rare and could be easily handled by the dynamic servers.

      Alas, some pages can't be static: log-in, the chatterbox, Other Users. So the load-balancer or a redirection rule has to bring AnoMonks to the real servers.
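      A redirection rule of that kind might look like this (nginx syntax; the upstream address, cookie name, and cache layout are assumptions mirroring the Apache rules upthread):

```nginx
# Route logged-in monks (userpass cookie) to the dynamic servers;
# serve anonymous node requests from the static cache when possible.
upstream dynamic_backend {
    server 10.0.0.2:8080;            # placeholder address
}

server {
    listen 80;
    root /var/www;                   # assumed tree containing /cache

    location = / {
        # Logged-in users always hit the engine.
        if ($http_cookie ~* "userpass=") {
            proxy_pass http://dynamic_backend;
        }
        # Anonymous /?node_id=NNN: try the cached copy first.
        try_files /cache/$arg_node_id.html @dynamic;
    }

    location /        { proxy_pass http://dynamic_backend; }
    location @dynamic { proxy_pass http://dynamic_backend; }
}
```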

      Anyway ... no matter if it's a static or dynamic server, setting up a dedicated server for AnoMonks would alleviate many issues instantly.


        Regarding the "macro" caching of anonymously viewed pages.

        After some searching I think the most robust approach is to set up one or more reverse proxies to handle the bots and SEO.

        They can cache all static requests locally, and rules can check the cookies and relay logged-in users to the upstream servers for dynamic content.

        Cache management is also included; nginx, for instance, can receive "purge" requests for cached URLs remotely via web requests, which means those proxies don't even need to be on the same box.
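        A sketch of such a caching proxy (nginx; note that the purge endpoint needs the third-party ngx_cache_purge module, and the upstream name and zone sizes here are assumptions):

```nginx
# Reverse proxy with a local cache for anonymous traffic.
proxy_cache_path /var/cache/nginx keys_zone=monastery:10m inactive=1h;

server {
    listen 80;

    location / {
        proxy_cache     monastery;
        proxy_cache_key $scheme$host$request_uri;
        # Bypass and never store responses for logged-in users.
        proxy_cache_bypass $cookie_userpass;
        proxy_no_cache     $cookie_userpass;
        proxy_pass http://upstream_engine;   # assumed upstream name
    }

    # Remote purge endpoint, e.g. GET /purge/?node_id=12345
    location ~ ^/purge(/.*) {
        allow 127.0.0.1;                     # restrict who may purge
        deny  all;
        proxy_cache_purge monastery $scheme$host$1;
    }
}
```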

        And this doesn't even need to be triggered directly from inside the Everything engine.

        Running an external cronjob checking for new nodes via XML ticker (like every x minutes) could be implemented immediately and update all proxies. ¹
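        The cron side could be as small as this sketch (Perl; the ticker URL, its exact XML shape, and the proxies' purge URLs are assumptions, only the id extraction below is tested):

```perl
use strict;
use warnings;

# Extract node_ids from a newest-nodes XML ticker document; we only
# assume that node entries carry a node_id attribute.
sub ticker_node_ids {
    my ($xml) = @_;
    return $xml =~ /\bnode_id="(\d+)"/g;
}

# Cron job (untested outline): fetch the ticker, then ask every proxy
# to purge each changed node.
# my $ua  = LWP::UserAgent->new;
# my $xml = $ua->get($ticker_url)->decoded_content;
# for my $id (ticker_node_ids($xml)) {
#     $ua->get("$proxy/purge/?node_id=$id") for @proxies;
# }
```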

        So no need to patch anything inside the monastery except server settings.

        I think "micro caching" to speed up the engine's most frequent evals and queries is also possible, but that would certainly require patching the Everything codebase, plus good benchmarking, and it is less relevant for our "bot war".


        ¹) of course querying the DB directly for new and updated nodes and votes would be even better.

        My point is to show how flexible and available this strategy would be, while immediately shielding our engines from DoS-like bot attacks.

      > Since we are working with multiple servers

      I'm not sure this is still true. Can Corion clarify?

      map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]

        We are using only one webserver, behind a single IP address.