in reply to heavy %system load at concurrent request (mod_perl + worker mpm)

0.8 seconds for a page request seems like a long time. I'm not clear whether you think 0.8s for a single request is too long, or whether you consider 0.8s reasonable for your application and are quoting it as a baseline. I'm going to assume you meant the latter, because you're specifically asking how to improve concurrent performance.

"locks are used to block read and write, but those are just 5% of the execution time."

Is that 5% of the total CPU time used by the request, or of the wall time? While there was constant CPU usage in your "concurrent" graph, it was actually lower (at peak) than for the serial requests, which makes me wonder whether there is indeed some resource contention at play. I/O is complex: iostat can show low kB_(read|write) values even in I/O-bound situations, especially if you're only looking at the hard disk devices.
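If you want to see whether the disks are actually a bottleneck, iostat's extended view is more telling than the raw kB/s columns. A rough sketch (exact column names vary between iostat versions):

```shell
# -x: extended per-device statistics, refreshed every second
iostat -x 1
# Watch await (average ms per I/O) and %util rather than rkB/s / wkB/s:
# a device can be saturated (high await, %util near 100) while moving
# very few kilobytes, e.g. under many small synchronous writes.
```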

Also, how exactly are you doing your benchmarking, and from where? (localhost? LAN?) When hits/second drops sharply under concurrent load, the network is the most common culprit, and Apache tuning is #2. Is your system designed well enough to allow you to easily simulate requests outside of Apache? If so, see if you can replicate your results there. If you can't, you know where to look. If the results are the same, then it's a problem with the Perl code itself.

Once you have your non-Apache test, you can also profile it much more easily, which will help you very rapidly zero in on CPU-hungry routines (or routines where wall time far exceeds CPU time).
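As a sketch of what such a non-Apache test could look like: the `do_request` stub below is a placeholder, not your handler; you would call into your real application code there. Comparing wall time against CPU time in one run makes a blocking problem (I/O, locks) visible immediately:

```perl
#!/usr/bin/perl
# Hypothetical standalone driver -- replace do_request() with a call into
# whatever routine your mod_perl handler actually invokes.
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# Stand-in for the real application code (an assumption, not your handler).
sub do_request {
    my $x = 0;
    $x += $_ for 1 .. 100_000;    # burn a little CPU
    return $x;
}

# Run the request $n times; return (wall seconds, CPU seconds).
sub bench {
    my ($n) = @_;
    my @cpu0 = times();                    # (user, system, cuser, csys)
    my $t0   = [gettimeofday];
    do_request() for 1 .. $n;
    my $wall = tv_interval($t0);
    my @cpu1 = times();
    my $cpu  = ($cpu1[0] - $cpu0[0]) + ($cpu1[1] - $cpu0[1]);
    return ($wall, $cpu);
}

my ($wall, $cpu) = bench(50);
printf "wall %.3fs, cpu %.3fs\n", $wall, $cpu;
# If wall time greatly exceeds CPU time here too, the code itself is
# blocking (I/O, locks); if not, look at Apache or the network instead.
```

Once it runs standalone, the same script can be profiled under Devel::NYTProf to find the hungry routines.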


Re^2: heavy %system load at concurrent request (mod_perl + worker mpm)
by arpx (Acolyte) on Jul 18, 2013 at 22:50 UTC
    I took an extra long road through the application, so 0.8s is more likely the upper baseline for this setup.
    I see it mostly as a benchmarking figure; it's probably 10x as fast in production. The interesting part is not that, but why the worker MPM seems to torture the system beyond what's healthy when requests run side by side, where it should be faster.

    The total CPU was lower, yes, and the %user part as well
    (I assume that would be Apache). So on paper it looks very CPU-light and efficient, if I could just get my head around that %system load.

    Locks are taken about 1100 times per request. Locking plus waiting time for one serial run is about 0.1s; with 10 concurrent requests I got a total (locking + waiting time) of 1.2s out of 28s per request, due to the long execution.
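One way to see how much of that time is spent *waiting* for locks rather than holding them is to time the acquisition itself. A minimal sketch, assuming flock-style locking (adapt to whatever locking mechanism the application actually uses):

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use File::Temp qw(tempfile);
use Time::HiRes qw(gettimeofday tv_interval);

my ($fh, $file) = tempfile();   # throwaway lock file for the sketch
my $waited = 0;

# Run $code under an exclusive lock, accumulating time spent acquiring
# the lock (not the time spent holding it).
sub with_lock {
    my ($code) = @_;
    my $t0 = [gettimeofday];
    flock($fh, LOCK_EX) or die "flock: $!";
    $waited += tv_interval($t0);
    $code->();
    flock($fh, LOCK_UN) or die "unlock: $!";
}

with_lock(sub { }) for 1 .. 1100;   # roughly the 1100 locks/request
printf "total lock-wait: %.4fs\n", $waited;
```

Under concurrency, a per-process counter like this would show directly whether the jump from 0.1s to 1.2s is acquisition wait or something else.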

    The benchmark is a simple Perl script that spawns n threads, each making an LWP::UserAgent request to the external server and clocking the request from fire to done. The target server is on the same local network, so there should not be any delay there.
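For what it's worth, a forked variant of such a client removes any question of perl's thread support skewing the numbers. A rough sketch; the `select` sleep stands in for the real `LWP::UserAgent->get($url)` call:

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# Time one "request" (any coderef), returning elapsed wall seconds.
sub timed {
    my ($code) = @_;
    my $t0 = [gettimeofday];
    $code->();
    return tv_interval($t0);
}

# Fire $n concurrent clients, each in its own forked process.
sub run_clients {
    my ($n, $code) = @_;
    my @pids;
    for (1 .. $n) {
        my $pid = fork();
        die "fork: $!" unless defined $pid;
        if ($pid == 0) {                       # child: one client
            my $t = timed($code);
            printf "client %d: %.3fs\n", $$, $t;
            exit 0;
        }
        push @pids, $pid;
    }
    waitpid($_, 0) for @pids;                  # wait for all to finish
}

# Stub request: ~10ms sleep instead of a real LWP::UserAgent->get.
run_clients(3, sub { select(undef, undef, undef, 0.01) });
```

Separate processes also mean each client gets its own CPU accounting, which can help when comparing against the server-side %user/%system split.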

    Thanks for the tip about trying to isolate it from Apache. I think it may be hard to do, since the code is quite tied up with mod_perl and Apache handlers, but I'll look into it.

    I will also try to loosen up the locks and see if that is a contributing factor here.