in reply to PSGI/Plack unsatisfactory performance

If you get *any* dropped connections, something has gone wrong. You have 100 concurrent requests, so any server with a listen() backlog of at least 100 should serve every request without dropping any.
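To illustrate what I mean by backlog (a hypothetical sketch, not your actual setup): it's the queue length passed to listen(), which IO::Socket::INET exposes as the Listen parameter. Connections that arrive while all workers are busy sit in that queue rather than being refused.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Hypothetical sketch: the listen() backlog caps how many not-yet-accept()ed
# connections the kernel will queue. If it is >= your concurrency (100 here),
# no connection should be refused while the workers catch up.
my $srv = IO::Socket::INET->new(
    LocalAddr => '127.0.0.1',
    LocalPort => 0,      # pick any free port
    Proto     => 'tcp',
    Listen    => 100,    # the backlog
) or die "listen failed: $!";

printf "listening on port %d with a backlog of 100\n", $srv->sockport;
```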

I suspect the "something wrong" is that you ran with the default "max requests", which for Starman is 1000. That means after serving 1000 pages, it kills the worker and starts a new one. While the worker is restarting, perhaps a connection gets lost?

JMeter looks like an unpleasant pile of Java and GUI with a very long manual, so I'll give some examples with 'ab' instead.

My laptop is an Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz, 4 cores / 8 threads.
Here's my app:

$ perl -v
This is perl 5, version 34, subversion 0 (v5.34.0) built for x86_64-linux
$ echo "sub { [ 200, [], ['OK'] ] }" > app.psgi
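For context, a PSGI app is just a coderef that takes the environment hash and returns a [status, headers, body] triple; the one-liner above is the minimal case. A slightly fuller (hypothetical) version would look like this:

```perl
use strict;
use warnings;

# app.psgi: a PSGI application is a coderef taking the $env hash and
# returning [ $status, \@headers, \@body ].
my $app = sub {
    my ($env) = @_;
    return [
        200,
        [ 'Content-Type' => 'text/plain' ],
        [ 'OK from ' . ( $env->{REQUEST_URI} // '/' ) ],
    ];
};

$app;    # plackup expects the .psgi file to evaluate to the coderef
```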

I'll try Gazelle first:

$ man Gazelle | head
Gazelle(3)         User Contributed Perl Documentation         Gazelle(3)

NAME
    Gazelle - a Preforked Plack Handler for performance freaks

SYNOPSIS
    $ plackup -s Gazelle --port 5003 --max-reqs-per-child 50000 \
        -E production -a app.psgi

$ plackup -s Gazelle --port 5003 --max-reqs-per-child 50000 -E production -a app.psgi &
$ ab -n 10000 -c 100 http://localhost:5003/
(snip)
Document Path:          /
Document Length:        2 bytes
Concurrency Level:      100
Time taken for tests:   0.430 seconds
Complete requests:      10000
Failed requests:        0
Total transferred:      940000 bytes
HTML transferred:       20000 bytes
Requests per second:    23277.36 [#/sec] (mean)
Time per request:       4.296 [ms] (mean)
Time per request:       0.043 [ms] (mean, across all concurrent requests)
Transfer rate:          2136.79 [Kbytes/sec] received

So mine runs about 4x faster than yours, with no dropped requests. On a laptop.

Now Starman:

$ starman --workers 8 --max-requests 1000000 &
$ ab -n 10000 -c 100 http://localhost:5000/
(snip)
Concurrency Level:      100
Time taken for tests:   0.585 seconds
Complete requests:      10000
Failed requests:        0
Total transferred:      770000 bytes
HTML transferred:       20000 bytes
Requests per second:    17102.06 [#/sec] (mean)
Time per request:       5.847 [ms] (mean)
Time per request:       0.058 [ms] (mean, across all concurrent requests)
Transfer rate:          1285.99 [Kbytes/sec] received

Now Feersum:

$ plackup -s Feersum --pre-fork=8 --access-log=/dev/null app.psgi &
$ ab -n 10000 -c 100 http://localhost:5000/
(snip)
Concurrency Level:      100
Time taken for tests:   0.542 seconds
Complete requests:      10000
Failed requests:        0
Total transferred:      400000 bytes
HTML transferred:       20000 bytes
Requests per second:    18437.56 [#/sec] (mean)
Time per request:       5.424 [ms] (mean)
Time per request:       0.054 [ms] (mean, across all concurrent requests)
Transfer rate:          720.22 [Kbytes/sec] received

I think all of these are performing respectably, considering that the per-request overhead is tiny compared with the time for a database query. All of these servers are intended to be combined with a front end like Apache or nginx, which is what you would use to serve static content. In fact, most of them warn that they *need* a front end to get safe HTTP sanity checking. If Perl is only serving the dynamic content, the app server's overhead matters even less, because the database will dominate.
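To make that concrete, here is a minimal (hypothetical) nginx front-end block along those lines: static files served directly by nginx, everything else proxied to the PSGI server on port 5000. The server name and paths are placeholders.

```nginx
server {
    listen 80;
    server_name example.local;      # hypothetical name

    # serve static assets directly from disk
    location /static/ {
        root /var/www/myapp;        # hypothetical path
    }

    # everything else goes to the PSGI backend (Starman/Gazelle/Feersum)
    location / {
        proxy_pass http://127.0.0.1:5000;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```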

Update:
As a followup, I ran it on a small-but-fast server and got 31810/sec for Gazelle, on 6 cores. But on a different, somewhat slower server with 24 cores, I got only 8000-9000/sec no matter how many cores I asked it to use or which server module I plugged into plackup. The second server is running the official Docker image for Perl 5.34. I really don't know what to make of these results.

Re^2: PSGI/Plack unsatisfactory performance
by locked_user beautyfulman (Sexton) on Dec 09, 2021 at 02:54 UTC
      Except PHP is the short bus of programming languages, so having it run fast doesn't really interest me. PHP gets to make optimizations that Perl doesn't, like storing pointers directly to functions instead of looking them up in a symbol table, but that also ties your hands when you want to do things like override a function or wrap it with a method modifier, which are easy in Perl and impossible in PHP. If there were any one language I could eliminate from my life, it would be PHP, but the popularity of WordPress/Drupal makes that hard. It's even faster these days because Facebook pours money into it; it's a shame the R&D doesn't go toward a more deserving language.

      I feel like something is going wrong with the accept() loops in Gazelle. I checked the implementation, and it looks very much like it forks, each worker calls accept() on the same listening socket, and then they *should* be able to receive new connections in parallel. Yet on the slower server, the pool of workers was unable to beat the performance of a single worker. I'm aware of the "thundering herd" effect, where a listen socket becoming readable wakes all the workers instead of just one, but with Gazelle's loop implemented in C, that overhead should still be low enough for the workers to run in parallel even on a tiny request. I'd be interested to see it if you or anyone else decides to chase this down to some microsecond-level traces. It's a lot of effort, though, to shave off a mere 3-5ms per request; it wouldn't make any difference to any of the apps I maintain.
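      The shared-accept pattern I'm describing can be sketched in a few lines (a hypothetical toy, not Gazelle's actual code): the parent opens one listening socket, forks workers, and each worker blocks in accept() on that same socket; the kernel hands each incoming connection to exactly one worker.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Hypothetical sketch of the prefork/shared-accept pattern.
my $listen = IO::Socket::INET->new(
    LocalAddr => '127.0.0.1',
    LocalPort => 0,
    Proto     => 'tcp',
    Listen    => 100,
) or die "listen: $!";
my $port = $listen->sockport;

my @workers;
for ( 1 .. 2 ) {
    my $pid = fork() // die "fork: $!";
    if ( $pid == 0 ) {                      # child: a worker's accept loop
        while ( my $conn = $listen->accept ) {
            print {$conn} "pid $$\n";       # trivial one-line "response"
            close $conn;
        }
        exit 0;
    }
    push @workers, $pid;
}

# The parent plays client to show that one worker picks up the connection.
my $client = IO::Socket::INET->new(
    PeerAddr => '127.0.0.1',
    PeerPort => $port,
    Proto    => 'tcp',
) or die "connect: $!";
my $line = <$client>;
print "served by worker: $line";

kill 'TERM', @workers;
waitpid $_, 0 for @workers;
```

      On the slower server, the question is whether connections actually fan out across workers like this, or end up serialized behind one of them.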