StoneTable has asked for the wisdom of the Perl Monks concerning the following question:

Heya fellow Monks,

This is an issue I've been dealing with for some time now. I need to make multiple HTTP requests from within a mod_perl2 handler. Each request takes a variable amount of time (between 1 and 2 seconds), and all requests must be completed or aborted within a fixed deadline (call it 3 seconds).

I'm currently using LWP::Parallel::UserAgent to make the concurrent requests, wrapped in an eval block with SIGALRM/alarm to enforce the fixed deadline (the timeout setting of LWP::Parallel::UserAgent, like that of LWP::UserAgent, is an inactivity timeout, so a slow network can extend a request almost indefinitely). The SIGALRM/alarm approach works fairly well, but it doesn't feel like the ideal solution.
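
Here's roughly what I have now (a minimal sketch; the URL list and the 3-second deadline are stand-ins for my real values):

    use strict;
    use warnings;
    use LWP::Parallel::UserAgent;
    use HTTP::Request;

    my @urls = ('http://example.com/a', 'http://example.com/b');   # stand-in URLs
    my $pua  = LWP::Parallel::UserAgent->new;

    my $entries;
    eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm 3;                                 # hard deadline for the whole batch
        $pua->register(HTTP::Request->new(GET => $_)) for @urls;
        $entries = $pua->wait;                   # blocks until all requests finish
        alarm 0;
    };
    if ($@) {
        die $@ unless $@ eq "timeout\n";
        # deadline hit: give up on whatever is still outstanding
    }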

I'm stuck using Apache2's prefork MPM (the threaded model looks like it would perform better with my application) because some of the code somewhere isn't thread-safe. I can't say for certain that it's LWP::Parallel::UserAgent, but that's one suspect.

I don't think I'm doing anything new here, so I'm hoping someone else has been through this and found a good solution. I've done extensive googling and digging through CPAN with no luck. I've considered writing a new, thread-safe alternative to LWP::Parallel::UserAgent. I've also thought about writing a separate daemon to act as a proxy of sorts for the mod_perl2 handler, doing all the downloads multi-threaded. I can't say I'm excited about the time involved in either prospect, but I'll do what it takes to get this working smoothly.

Re: Parallel HTTP requests under mod_perl2
by perrin (Chancellor) on Mar 08, 2006 at 04:34 UTC

    The threaded model generally has significantly worse performance because of the way Perl threads work: each new thread gets a complete copy of the interpreter's data. On the mod_perl list, we recommend prefork for those who can use it (i.e. everyone but Win32).

    More likely, LWP::Parallel::UserAgent is just not very fast. I suggest you try a fast client like HTTP::MHTTP or HTTP::GHTTP and switch to a forking model, where you fork (yes, it's okay to fork from mod_perl) and write the responses back to a file or database. A pre-forked pool of workers is also possible, but all the IPC is harder to code.
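
    Something along these lines (a rough, untested sketch using HTTP::GHTTP and temp files as cheap IPC; the URLs are placeholders, and a real handler would also enforce your 3-second deadline around the waitpid loop):

        use strict;
        use warnings;
        use HTTP::GHTTP;
        use File::Temp qw(tempfile);
        use POSIX ();

        my @urls = ('http://example.com/a', 'http://example.com/b');   # placeholders
        my %kids;                                                      # pid => temp file

        for my $url (@urls) {
            my ($fh, $file) = tempfile(UNLINK => 0);
            defined(my $pid = fork) or die "fork failed: $!";
            if ($pid == 0) {
                # child: fetch the URL and write the body to the temp file
                my $r = HTTP::GHTTP->new($url);
                $r->process_request;
                print $fh $r->get_body;
                close $fh;
                POSIX::_exit(0);   # skip Apache/mod_perl cleanup in the child
            }
            close $fh;
            $kids{$pid} = $file;
        }

        # parent: reap the children and collect their output
        while (%kids) {
            my $pid = waitpid(-1, 0);
            last if $pid <= 0;
            my $file = delete $kids{$pid};
            # read the response body from $file here, then unlink $file
        }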

      I'll look into handling the forking myself along with HTTP::MHTTP or HTTP::GHTTP. Writing the responses to a file or database really isn't useful for me, though. The data I'm getting cannot be cached, so I have to make the request every time.

      The reason I was considering the threaded model is that I'm running into memory constraints that seem to be inherent to prefork (one Perl interpreter per child). This application is heavily bound by network I/O, and during my limited testing with the threaded model I was able to accept a much larger number of concurrent requests.

        I just meant you could use the file as cheap IPC to get the responses back into the parent process. There are other ways to do this, though.

        Be careful what you measure in terms of memory. On a Linux system, much of the size of an httpd process is actually shared by copy-on-write. See the mod_perl docs for more info.