in reply to Can your site handle this?

With absolutely no intent to make any judgment whatsoever with regard to Logicus, the original author of this post, I would nonetheless like to point out a couple of “common misconceptions and flaws” that often do crop up in deployed designs ... things which are, in fact, clearly demonstrated in and by this post.   Again, using it only as a conveniently-presented example, let me briefly step up onto my soap box.

The first:

The above code spawns 128 workers, which then download pages back to back as fast as they can ... [...] ... system would run into troubles with 8 workers, and pretty much grind to a halt under the load of 16 workers.

Many designers assume that “the more threads, the merrier.”   Surely 128 threads can do twice as much work in the same amount of time as 64 threads could, and so on.   But in reality the actual capacity of any system is limited both by space and by time.   Each thread obviously consumes some amount of resources, some of which can be shared among all instances but many of which cannot.   If the most-scarce resource (memory) becomes strained, thrashing occurs, and this results in an exponential loss of performance not-so-affectionately known as “hitting the wall.”   Examination of this particular system might show that, say, a worker-count setting of 12 is optimal, and so this machine would be configured to launch no more than that number of workers, which would process one request after another (without dying in-between...) for weeks or months continuously.   The worst-case throughput of this system could easily be calculated, and it would remain fairly constant.
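The fixed-pool idea above can be sketched in a few lines.   (A sketch in Python rather than Perl, for brevity; the worker count of 12 and the fetch_page routine are stand-ins, not anything from the original post.)

```python
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 12  # tuned for *this* machine, not "as many as possible"

def fetch_page(url):
    # stand-in for the real download-and-process step
    return len(url)

def run(urls):
    # a fixed pool: the pool never grows past the configured limit,
    # and each worker simply takes the next request when it finishes one
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        return list(pool.map(fetch_page, urls))
```

Throughput stays predictable because the pool size, not the arrival rate, bounds the concurrency.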

The second:   (Same example.)

Many web servers, either through “simple CGI” or some other means, fire a new flaming arrow into the air with each request.   Such servers can easily be forced into a denial-of-service state simply by dumping an excessive number of requests on them.   The resulting denial of service can be quite devastating if it forces the system into unconstrained thrashing, such that console operators find it difficult or impossible to issue commands to stop the flood.   (Moral:   “if you fire flaming arrows into the air, you’re just going to burn down the entire town, and this without even one single tasty marshmallow being successfully toasted.”)

The actual expected performance curve of any system, for any application mix, can be plotted on a graph.   That curve will always be an “elbow”: a gently sloping linear section followed quite abruptly by an exponentially-ascending “the wall™,” which in this case emphatically is not an inspired rock-n-roll album.   Like any good self-powered machine, a request-processing system must be designed with queues and governors.   It must have the means to regulate its own performance and to adjust its own control parameters in real-time in response to changing conditions.   It must of course endeavor to process every request in a predictable manner, but in truly extenuating circumstances it must be capable of selective load-shedding, as well.
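The queue-plus-governor shape is easy to sketch: a bounded admission queue that accepts work up to a configured depth and sheds the rest.   (The Governor class and its depth are hypothetical, purely to illustrate the shape.)

```python
import queue

class Governor:
    """A bounded admission queue: admit work up to a limit, shed the rest."""
    def __init__(self, depth):
        self.q = queue.Queue(maxsize=depth)
        self.shed = 0  # count of requests turned away

    def submit(self, item):
        try:
            self.q.put_nowait(item)  # admitted: a worker will get to it
            return True
        except queue.Full:
            self.shed += 1           # selective load-shedding, by design
            return False

gov = Governor(depth=2)
print([gov.submit(i) for i in range(4)])  # → [True, True, False, False]
```

The depth parameter is the governor’s control knob; a self-regulating system would adjust it (and the worker count) in response to the measured load, rather than leaving both unbounded.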

It is a very common objective of a web site that it can be used to make work-requests.   But too many web sites take the approach that they are not only “the user interface,” but “the worker” as well.   To see the folly in this approach, one merely has to look at a real-world example of a request-processing system that works well:   a fast-food restaurant (as originally designed by Ray Kroc of McDonald’s fame).   The person who says, “may I take your order, please?” does nothing else.   “The fry guy,” “the burger-meister,” and the fellow who mops the floors don’t take orders from anyone (other than “the big cheese”).   There is a strict separation of roles and responsibilities, a well-designed work flow, and a workflow management system (now computerized, but at one time based on paper tickets and that little rotating carousel onto which the tickets could be stuck).   While I am not advocating that you should ever actually eat at one of those establishments, you can get a lot of good examples from them.
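In software terms, the ticket carousel is just a queue between two strictly separated roles: one that only takes orders, and one that only works them.   A toy sketch (role names are mine, borrowed from the analogy):

```python
import queue
import threading

tickets = queue.Queue()  # the rotating carousel of paper tickets

def order_taker(orders):
    # this role does exactly one thing: accept orders and post tickets
    for o in orders:
        tickets.put(o)
    tickets.put(None)    # sentinel: close of business

done = []
def fry_guy():
    # this role does exactly one thing: pull tickets and work them, in order
    while True:
        t = tickets.get()
        if t is None:
            break
        done.append("cooked " + t)

worker = threading.Thread(target=fry_guy)
worker.start()
order_taker(["fries", "burger"])
worker.join()
print(done)  # → ['cooked fries', 'cooked burger']
```

Neither role knows anything about the other’s workload; the queue is the only point of contact, which is exactly what lets each side be sized and governed independently.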

Last but not least:   “remember CPAN.”   There are plenty of workload-management engines out there, in various stages of sophistication and completion.   As is always the case with Perl, you don’t have to start from scratch on your new project.