arpx has asked for the wisdom of the Perl Monks concerning the following question:

Can't figure out what the issue is here.

Got a _huge_ mod_perl application, running the Apache worker MPM with a single process and roughly 40 threads.
The threads share memory between themselves via threads::shared (300 MB to 500 MB of shared data is not unusual).
Locks are used to guard reads and writes, but they account for only about 5% of the execution time.
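
For a rough picture, the sharing pattern looks something like this (a minimal sketch only; %cache and the helper names are made up for illustration, the real structure is far larger):

    use threads;
    use threads::shared;

    my %cache :shared;        # one structure shared between all threads

    sub read_entry {
        my ($key) = @_;
        lock(%cache);         # readers and writers both take the same lock
        return $cache{$key};
    }

    sub write_entry {
        my ($key, $value) = @_;
        lock(%cache);
        $cache{$key} = shared_clone($value);   # nested data must be shared too
    }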

1) Setting Apache to 1 process and 1 thread results in what is marked
"seriell" in the image below. One call to my test page takes 0.8s; 10 calls take a
total of 8.5s before the last one completes.

2) The multithreaded version with 1 process and 40 threads results in chaos
(marked "parallell" in the image).
Calling the test page once takes 0.8s; with 10 concurrent calls the first reply
arrives after 15s, and the last 5 arrive at around 25s.

Any clue what can cause the high %system load when running concurrently?
Is there a good way to decode where the %system load comes from?
As far as I can tell it's neither I/O, swapping nor disk writes, since iostat came out clean.

CPU graph over time: https://frost.denacode.se/pub/tmp/serial_vs_parallel.png

tx!

Re: heavy %system load at concurrent request (mod_perl + worker mpm)
by rjt (Curate) on Jul 18, 2013 at 03:05 UTC

    0.8 seconds for a page request seems like a long time. I'm not clear on whether you think 0.8 sec for a single request is too long, or if you feel 0.8 seconds is reasonable for your application and you are quoting that as a baseline. I'm going to assume you meant the latter, because you're specifically asking how to improve concurrent performance.

    Locks are used to guard reads and writes, but they account for only about 5% of the execution time.

    Is that 5% of the total CPU time used by the request, or the wall time? While there was constant CPU usage in your "concurrent" graph, it was actually lower (peak) than the serial requests, which makes me wonder if there is indeed some resource contention at play. I/O is complex. iostat can show low kB_(read|write) values even in I/O-bound situations, especially if you're just looking at the hard disk devices.

    Also, how exactly are you doing your benchmarking, and from where? (localhost? LAN?) Often when #hits/second goes way down under concurrent load, the network is the most common culprit, with Apache tuning a close second. Is your system designed well enough to allow you to easily simulate requests outside of Apache? If so, see if you can replicate your results there. If the slowdown doesn't show up outside Apache, you know where to look. If the results are the same, then it's a problem with the Perl code itself.

    Once you have your non-Apache test, you can also profile it much more easily, which will help you very rapidly zero in on CPU-hungry routines (or routines where wall time far exceeds CPU time).
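
    For example (just a sketch: Devel::NYTProf is what I would reach for, and My::App / process_request() are placeholders for whatever entry point your handlers ultimately call), something like this gives you a per-subroutine breakdown without Apache in the way:

        #!/usr/bin/perl
        # Run as:   perl -d:NYTProf profile_handler.pl
        # Then:     nytprofhtml    (writes an HTML report into ./nytprof/)
        use strict;
        use warnings;

        # Placeholder module and function -- substitute your real entry point.
        use My::App;

        for (1 .. 10) {
            My::App::process_request('/testpage');
        }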

      I took an extra long road through the application, so 0.8s is more like the upper baseline for this setup,
      and I use it mostly for benchmarking; it's probably 10x as fast in production. The interesting part is not that, but why the worker MPM seems to torture the system beyond what's healthy when requests should be running side by side and finishing faster.

      The total CPU was lower, yes, and the %user part as well
      (I assume that would be Apache). So on paper it looks very CPU-light and efficient, if I could just get my head around that system load.

      Locking happens about 1100 times per request. Locking + waiting time for one serial run is about 0.1s; with 10 concurrent requests I got a total (locking + waiting time) of 1.2s out of 28s per request, due to the long execution.
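
      (Measured roughly like this; %shared_data and the variable names are stand-ins for the real ones, the point is just timing the lock() acquisition itself.)

          use strict;
          use warnings;
          use threads;
          use threads::shared;
          use Time::HiRes qw(gettimeofday tv_interval);

          my %shared_data :shared;
          my $lock_wait   :shared = 0;   # accumulated time spent waiting for locks

          sub guarded_update {
              my ($key, $value) = @_;
              my $t0 = [gettimeofday];
              lock(%shared_data);                     # may block on other threads
              my $waited = tv_interval($t0);          # time it took to get the lock
              { lock($lock_wait); $lock_wait += $waited; }
              $shared_data{$key} = $value;            # the guarded write itself
          }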

      The benchmark is a simple Perl script that spawns n threads, each making an LWP::UserAgent request to the external server and clocking the request from fire to done. The target server is on the same local network, so there should not be any delay there.
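
      Roughly like this (simplified; the URL and thread count are placeholders):

          #!/usr/bin/perl
          use strict;
          use warnings;
          use threads;
          use LWP::UserAgent;
          use Time::HiRes qw(gettimeofday tv_interval);

          my $url = 'http://testserver.local/testpage';   # placeholder URL
          my $n   = shift // 10;                          # number of concurrent requests

          my @workers;
          for (1 .. $n) {
              push @workers, threads->create(sub {
                  my $ua = LWP::UserAgent->new;
                  my $t0 = [gettimeofday];
                  my $res = $ua->get($url);               # fire the request
                  return sprintf "%.2fs  (HTTP %s)", tv_interval($t0), $res->code;
              });
          }

          print $_->join, "\n" for @workers;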

      Thanks for the tip about trying to isolate it from Apache. I think it could be hard to do, since it's quite tied up with mod_perl and Apache handlers, but I'll look into it.

      Will also try to loosen up the locks and see if that is a contributing factor here.
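
      Something along these lines, i.e. holding the lock only long enough to copy the value out instead of around the whole computation (a sketch with stand-in names; assumes the heavy work doesn't need the shared structure to stay locked meanwhile):

          use strict;
          use warnings;
          use threads;
          use threads::shared;

          my %cache :shared;

          sub expensive_work { my ($v) = @_; return defined $v ? length $v : 0 }  # stand-in

          # Current pattern: the lock is held for the whole computation,
          # so every other thread queues up behind it.
          sub slow_path {
              my ($key) = @_;
              lock(%cache);
              return expensive_work($cache{$key});
          }

          # "Loosened" pattern: copy the value out under the lock, then do
          # the heavy work on the copy with no lock held.
          sub fast_path {
              my ($key) = @_;
              my $copy;
              {
                  lock(%cache);               # released at the end of this block
                  $copy = $cache{$key};
              }
              return expensive_work($copy);
          }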