
If there were an easy fix, it would have been applied.

The thing that puzzles me is that, of all the websites I regularly peruse, perlmonks stands out (and I mean really stands out) as clearly the slowest and flakiest.
Why is that? What is the "feature" of perlmonks that makes it so extremely susceptible to these attacks?

Cheers,
Rob

Re^4: Perlmonks site has become far too slow
by Corion (Patriarch) on Aug 29, 2025 at 14:06 UTC

    Every page involves several database accesses and several Perl eval calls. See DBIx::VersionedSubs for something like it, though it is not used on Perlmonks itself.

    I think converting to static files for (say) SoPW nodes for Anonymous Monk might reduce the load so that the site remains accessible to the human users. I'm thinking of measuring the impact of a -f call for every page load. Of course, using static files means that the nodelets are either stale or need to be removed from Anonymous Monk's view of the site.
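    Something like this could time that check (untested; the cache path is made up):

        use Benchmark qw(timethis);
        my $file = '/var/www/cache/11166139.html';    # hypothetical cache file
        # how many -f checks per second can this disk/VFS sustain?
        timethis( 1_000_000, sub { my $e = -f $file } );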

    Caching is of no use, since the bots are hitting all URLs essentially uniformly at random, so there is no set of "hot" nodes.

      I think converting to static files for (say) SoPW nodes for Anonymous Monk might reduce the load so that the site remains accessible to the human users.

      or it will encourage even harder attacks. They definitely have the resources and will occupy the extra bandwidth and space anyway, once they discover it has been created.

      I don't understand why we don't block an IP as soon as an anonymous client hits lots of pages in quick succession, especially old ones which have not been visited for a while. We had similar discussions before (Re^3: Unable to connect); interesting clues came out of jdporter's investigations. Now a different angle is being discussed. Are we going in circles?
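      As a sketch of what I mean (untested; window and limit are picked out of thin air):

          use strict;
          use warnings;

          my %hits;            # ip address => list of recent hit times
          my $WINDOW = 60;     # seconds
          my $LIMIT  = 30;     # max anonymous hits per window

          # call once per anonymous request; true means "block this ip"
          sub too_fast {
              my ($ip) = @_;
              my $now  = time;
              my $list = $hits{$ip} //= [];
              @$list = grep { $_ > $now - $WINDOW } @$list;   # forget old hits
              push @$list, $now;
              return @$list > $LIMIT;
          }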

      The site is so unusable it discourages me from posting spontaneously. I don't have the luxury of saving a failed post, provided I can find it, for future posting at a time when the wicked bots take a minute of rest (bots have a cup of tea).

      An idea would be to contact the AI bots' operators and sell them the data directly. Can't have anything more static than what's inside a USB stick.

      edit: last time they tried to pacify the beast, it swallowed Czechoslovakia.

      24h edit: we could have a live site where all the interactive users go to vote/post/admin, and a static site updated daily from the live site for anon, search and bots. The live site for logged-in users would be just as it is now, but at a secret www._perlmonks.org address; then every night the static site is updated with the daily deltas from the live site. That would keep the monks sheltered from the evil bots, which will ravish everyone else outside the monastery, "that's a shame!". Relevant soundtrack while our sitedev clan fights the evil bots: Yoshimi Battles the Pink Robots by The Flaming Lips (youpuke warning, ot: I use the newpipe app from f-droid to view yt content on mobile). I saved this edit 12h ago when the site was unusable and am posting it now that the site seems to be very responsive.

      bw, bliako

      How about:
      RewriteEngine On

      # Match requests like /?node_id=12345
      RewriteCond %{REQUEST_URI} ^/$
      RewriteCond %{QUERY_STRING} ^node_id=([0-9]+)$

      # Skip if the cookie header contains userpass=
      RewriteCond %{HTTP_COOKIE} !(^|;\s*)userpass=

      # Serve cached file if it exists
      RewriteCond %{DOCUMENT_ROOT}/cache/%1.html -f
      RewriteRule ^$ /cache/%1.html [L]

      Then any time anonymous requests a page, save a copy of what you serve to /cache/$node_id.html, and every time someone posts/edits content under a node, call unlink("cache/$node_id.html");
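      In Perl, roughly (untested; the cache directory is hypothetical):

          my $CACHE_DIR = '/var/www/cache';   # must match DOCUMENT_ROOT/cache above

          # after rendering a page for Anonymous Monk, stash the HTML
          sub save_cache {
              my ($node_id, $html) = @_;
              return unless $node_id =~ /^[0-9]+$/;   # paranoia
              my $file = "$CACHE_DIR/$node_id.html";
              open my $fh, '>', "$file.tmp" or return;
              print {$fh} $html;
              close $fh;
              rename "$file.tmp", $file;              # near-atomic swap
          }

          # whenever a node is posted or edited
          sub invalidate_cache {
              my ($node_id) = @_;
              unlink "$CACHE_DIR/$node_id.html";
          }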

      Apache should be able to crank these out way faster than CGI could.

      For bonus points, store the cache in ZFS with compression enabled. Maybe also minify the HTML before saving it.
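      The minification could start as simple as (untested; careful, this would mangle <pre> blocks, so a real minifier is safer):

          $html =~ s/^[ \t]+//mg;    # strip leading indentation
          $html =~ s/\n{2,}/\n/g;    # collapse runs of blank lines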

        If you log out, you'll notice that someone, I suppose Corion, is experimenting with caching this thread (only?).

        Try https://www.perlmonks.net/?node_id=11166139 (or whatever domain logs you out)

        One problem is inherent: it's not enough to disable/delete a single node's cache on update; the parent chain must be invalidated too, because the sub-thread view shows replies. (Your reply and Bliako's are missing.)
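        E.g. extending the unlink approach above (untested; parent_node is how I read the Everything schema, so treat it as an assumption):

            sub invalidate_chain {
                my ($node_id) = @_;
                while ($node_id and $node_id =~ /^[0-9]+$/) {
                    unlink "/var/www/cache/$node_id.html";    # hypothetical path
                    my $node = $DB->getNodeById($node_id) or last;
                    $node_id = $node->{parent_node};          # walk up to the parent
                }
            }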

        Since we are working with multiple servers, I'm pessimistic about using the filesystem for caching; I think it's far easier to implement in the database server.

        FWIW, I took a look into the Everything code yesterday, and while it's hard to tell how it really works without a testing environment (my usual grievance), I found that there is a central method getNodeById which occasionally does caching if called with the corresponding flags.

        There is a whole Cache class blessed to $DB to handle it.

        There is also an updateNode method.

        So in theory this should be easy to do; alas, only gods can develop and test, especially where the core Everything:: modules are concerned, which can't be patched by pmdevs.

        Pmdevs like me are reduced to smartass comments here. (I can't even tell if the online versions of Everything:: really show the code currently in production.)

        Update

        In hindsight, caching getNodeById won't be sufficient; I don't think it returns HTML already.

        If you are interested in the gory details, please check out the original documentation of Everything (i.e. before it was heavily patched into Perlmonks).

        Cheers Rolf
        (addicted to the Perl Programming Language :)
        see Wikisyntax for the Monastery

      Some remarks/ideas for the logs:

      > Every page is several database accesses

      getNodeById in Everything/NodeBase.pm uses a cache when accessing the DB.

      472: sub getNodeById
      473: {
           ...
      488:     # See if we have this node cached already
      489:     $cachedNode = $this->{cache}->getCachedNodeById($N);
      490:     return $cachedNode unless ($selectop eq 'force' or not $cachedNode);

      see also Everything/NodeCache.pm

      I suppose the gods have increased the cache-size from the initial 300 in Everything/NodeBase.pm?

      095: $db->{cache} = new Everything::NodeCache($this, 300);
      Does the caching prioritize based on access count? I have to admit this is not easy to grasp.

      > so there is no set of "hot" nodes.

      There is no set of hot posts (which are internally nodes), but specific code and HTML nodes certainly are heavily used internally (AFAICS 99.9% of the monastery is held in DB nodes).

      > and several Perl eval calls

      using memoization in Everything/HTML.pm might help here to avoid unnecessary recompilations

      968: sub evalCode {
      969:     my( $code )= shift @_;
      970:     my( $CURRENTNODE )= shift @_;
      971:     # Note! @_ is left set to remaining arguments!
           ...
      985:     my $str = eval $code;
           ...

      (tho there might be a side effect of pre-compiling into an extra sub layer)

      something like (untested)

      my $sub = $evalcode_cache{$code} //= eval "sub { $code }";
      my $str = $sub->(@_);
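      Spelled out with error handling (still untested; %evalcode_cache is per-process, so each Apache child pays the compile cost only once):

          my %evalcode_cache;

          sub evalCodeMemoized {
              my $code = shift;
              my $sub  = $evalcode_cache{$code} //= do {
                  my $compiled = eval "sub { $code }";
                  warn "compile failed: $@" if $@;
                  $compiled || 0;      # cache failures too, as a false value
              };
              return ref $sub ? $sub->(@_) : undef;
          }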

      html caching

      internally caching the result of std_node_display into the DB might help too, but here plenty of side parameters need to be taken into consideration.
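      A sketch of what that could look like (untested; table and column names are invented, $dbh is a plain DBI handle, and I'm assuming std_node_display($node_id) returns the HTML):

          #   CREATE TABLE html_cache (
          #       node_id  INT PRIMARY KEY,
          #       html     MEDIUMTEXT,
          #       built_at DATETIME
          #   );
          my $get = $dbh->prepare('SELECT html FROM html_cache WHERE node_id = ?');
          my $put = $dbh->prepare('REPLACE INTO html_cache (node_id, html, built_at)
                                   VALUES (?, ?, NOW())');

          sub cached_node_display {
              my ($node_id) = @_;
              $get->execute($node_id);
              my ($html) = $get->fetchrow_array;
              return $html if defined $html;
              $html = std_node_display($node_id);   # the real renderer
              $put->execute($node_id, $html);
              return $html;
          }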

      Caching for AnoMonk alone must take into consideration (at least):

      • whether the sub-tree of replies has changed
      • whether the content of the post or any reply has changed by an update
      • whether the down-votes for a reply have reached the so-called "crap level" at which it gets hidden
      A pragmatic solution would be not to list the content of all replies for AnoMonk, just the links to the direct replies.

      The "print view w/o replies" is already close, but doesn't include links to children replies yet.

      compare https://perlmonks.org/?node_id=11164875;displaytype=print

      Cheers Rolf
      (addicted to the Perl Programming Language :)
      see Wikisyntax for the Monastery

      > I'm thinking of measuring the impact of a -f call for every page load.

      I'm not sure what a -f call means ... (?)

      > I think converting to static files for (say) SoPW nodes for Anonymous Monk might reduce the load so that the site remains accessible for the human users.

      This is a brainstorm:

      From what I see, Anonymous Monk's only dynamic content for > 99% of the nodes is in the nodelets (Chatterbox, Other Users, and what else?), and those should be of low priority for AnoMonk.

      A frontend could check whether the user is logged in and whether the node id is lower than that of the last caching run, and/or deliver a static file if present.
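      A minimal sketch of such a check at the top of the page handler (untested; the userpass cookie name is taken from the rewrite rules above, everything else is assumption):

          use CGI;
          my $LAST_CACHED_ID = 11170000;   # hypothetical high-water mark

          sub maybe_serve_static {
              my ($cgi) = @_;
              return 0 if $cgi->cookie('userpass');   # logged in => dynamic page
              my $id = $cgi->param('node_id') // '';
              return 0 unless $id =~ /^[0-9]+$/ and $id < $LAST_CACHED_ID;
              my $file = "/var/www/cache/$id.html";   # hypothetical path
              return 0 unless -f $file;
              open my $fh, '<', $file or return 0;
              print $cgi->header('text/html');
              local $/;                               # slurp mode
              print <$fh>;
              return 1;                               # request handled
          }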

      If the caching to a static file is done ...

      • once per day as a bulk operation
      • or if file is missing
      • or on every write (like updates)
      ... is a matter of debate.

      However, not sure how best to deal with named nodes.

      I can't tell if the caching should best be done in the file system or in a DB.

      And whether the "frontend" could be realized in a web-server rule or a patch in Everything.pm.

      (I'm sure I've not covered all edge cases, but wanted at least to have written them down for future reference :)

      Cheers Rolf
      (addicted to the Perl Programming Language :)
      see Wikisyntax for the Monastery