For at least six months, the site has suffered greatly from scrapers operating through residential proxies, most likely gathering data for AI training. This makes the site intermittently unavailable for humans.
Moving the anonymous parts of the site behind a CDN protects the machine(s) hosting the site from random invalid requests and also offers the bots feeding the AI a larger cache than we can provide ourselves. This is not ideal, as I dislike adding intermediate services, but an accessible site behind a CDN is better than no accessible site.
The CDN should be mildly smart in the sense that we want to prevent whole classes of (invalid) requests from hitting the backend at all:
Moving the site behind a CDN would require some changes to the settings of Anonymous Monk. In particular, the short-lived parts of a page, such as the chatterbox, the CPAN nodelet and some other nodelets, would no longer be shown. Depending on the CDN setup, the nodelets could potentially be included dynamically.
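To make the anonymous/logged-in split concrete, here is a minimal sketch of how a response could tell a CDN what it may cache. This is not how the site works today; the function name, the `logged_in` flag and the five-minute lifetime are assumptions for illustration only.

```python
# Hypothetical lifetime for anonymous pages in the shared (CDN) cache.
ANON_TTL_SECONDS = 300


def cache_control(method: str, logged_in: bool) -> str:
    """Pick a Cache-Control header value for one response.

    Only safe requests from anonymous visitors are marked as cacheable
    by a shared cache; everything else must bypass the CDN cache.
    """
    if method.upper() not in ("GET", "HEAD") or logged_in:
        return "private, no-store"
    return f"public, s-maxage={ANON_TTL_SECONDS}"


if __name__ == "__main__":
    print(cache_control("GET", logged_in=False))
    print(cache_control("GET", logged_in=True))
```

With a policy like this, a page rendered for Anonymous Monk cannot contain anything that changes faster than the chosen lifetime, which is why the chatterbox and similar nodelets would have to go or be loaded separately.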
Having logged-in users access the site via (say) user.perlmonks.org on a different machine+IP address will ideally prevent bots from clogging the access lane of logged-in users, provided that the URL does not leak to the vibe-coded scrapers too much. That different machine could also be far more aggressive with its CDN/firewall/whatever rules and outright reject all requests that do not contain a valid session cookie.
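The "reject everything without a valid session cookie" rule could look roughly like the sketch below. The cookie name `session` and the `is_valid` callback are placeholders, not the names the site actually uses; a real check would have to ask whatever holds the session data.

```python
from http.cookies import CookieError, SimpleCookie

# Hypothetical cookie name; the real one would come from the site config.
SESSION_COOKIE = "session"


def has_session_cookie(cookie_header, is_valid):
    """Return True if the Cookie header carries a session value that
    the caller-supplied is_valid() accepts."""
    if not cookie_header:
        return False
    jar = SimpleCookie()
    try:
        jar.load(cookie_header)
    except CookieError:
        # Treat unparseable cookie headers as "no session".
        return False
    morsel = jar.get(SESSION_COOKIE)
    return morsel is not None and is_valid(morsel.value)


def gate(headers, is_valid):
    """Return the HTTP status for a request: 200 means pass it on to
    the backend, 403 means reject it before it gets there."""
    if has_session_cookie(headers.get("Cookie"), is_valid):
        return 200
    return 403


if __name__ == "__main__":
    known = {"abc123"}  # stand-in for a real session lookup
    print(gate({"Cookie": "session=abc123; theme=dark"}, known.__contains__))
    print(gate({"Cookie": "theme=dark"}, known.__contains__))
```

The point of such a gate is that it is cheap: a request without the cookie never reaches mod_perl or the database.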
This second machine might or might not be necessary, depending on whether the CDN already takes the brunt of the bots, in which case we can keep serving logged-in users from the existing webserver machine. A separate machine would allow us to configure it more aggressively with respect to the kind of requests we expect.
The site is currently hosted by pair.com on managed Apache using mod_perl on one machine and a second machine hosting a managed MySQL database. If your contribution does not keep these parts, think really hard about whether your approach is tenable before posting it.
Please refrain from offering solutions unless you have proven experience with the Everything engine and with integrating your solution into it. Also refrain from posting your thoughts / brainstorming in the expectation that anybody will read and respond to them, unless you have applicable and actionable points to contribute.
What helps us:
What does not help is:
Re: On improving Perlmonks site availability
by dissident (Beadle) on Mar 30, 2026 at 21:44 UTC
by LanX (Saint) on Mar 30, 2026 at 23:00 UTC