Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

We have a large app built in Catalyst that has a very high startup overhead, thanks to a large number of database connections that have to be set up at the start of every run. It takes about 30 seconds or so before it starts to respond. Only the startup is affected by this; once it's running, there's no memory pressure or CPU stress.

Normally this isn't a big deal; we roll out new versions every week or three and deploy to different boxes in turn, so it's always up. However, we also run a frequent cron script based on the app, and that's where it becomes a real problem. We run it with different parameters to process different tasks, sometimes every five minutes, and there a 30-second startup cost gets to be significant. (Once again, after the app loads, the processing itself usually takes only a second or two.) In some cases we need to run multiple instances in parallel, because some tasks have network delays from dealing with external APIs, so we'll do things like run one script to process odd-numbered IDs and another to process even-numbered IDs.

We've looked at some discussions here and on Stack Overflow about daemonizing the script, but it's not really clear how we'd control it the way we do now. Currently we might run a dozen instances from cron: "process task A", "process task B", "process even-numbered task Cs", "process odd-numbered task Cs", etc. This is easy to manage; if we add a "task D", we add another cron job to "process task D"; tasks that only need to run once a day get a cron job at 3 a.m.; and so on.

But communicating with daemons isn't something we have experience with--none of us are systems guys--and trying to write the logic into the script itself seems impossible. Are there guidelines for how to deal with this? Assume there's no way to reduce the startup costs.


Re: Daemonizing (or otherwise speeding up) a high-overhead script? (Mojolicious References)
by eyepopslikeamosquito (Archbishop) on Aug 24, 2023 at 00:47 UTC
Re: Daemonizing (or otherwise speeding up) a high-overhead script?
by Corion (Patriarch) on Aug 24, 2023 at 07:38 UTC

    There is PPerl, which basically implements a prefork server for arbitrary scripts. The idea is that it launches, does the costly initialization, and then forks into the background. Then, if you ever launch the script again, it will connect to the background server and skip the costly initialization.

    The problem is that fork and database handles (like all external resources) don't play well together, so you have to reinitialize them after each fork.
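
    If you do go down the forking route, one common workaround with DBI is to let each child open its own connection and leave the parent's handle alone. A minimal sketch, with the connection details made up:

        use strict;
        use warnings;
        use DBI;

        # Hypothetical connection; AutoInactiveDestroy keeps a forked child's
        # exit from closing the parent's socket.
        my $dbh = DBI->connect('dbi:Pg:dbname=app', 'appuser', 'secret',
            { RaiseError => 1, AutoInactiveDestroy => 1 });

        my $pid = fork;
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {
            # Child: do not reuse the inherited handle; reconnect instead.
            my $child_dbh = DBI->connect('dbi:Pg:dbname=app', 'appuser', 'secret',
                { RaiseError => 1 });
            # ... child's work goes here ...
            $child_dbh->disconnect;
            exit 0;
        }
        waitpid $pid, 0;    # parent carries on using $dbh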

    The other replies have already recommended frameworks like Mojolicious to implement a small server, and I think this is a sound approach. Personally, I would look at using some kind of job queue, be it directory/file-based or database-based. Minion, for example, is such a job queue, and it also has a monitoring UI etc.

    This means splitting your code into a script/frontend that submits jobs and a "worker" component that does the costly initialization once and then works on the submitted jobs. The workers pick jobs from the queue, and depending on machine usage etc. you can launch more workers or kill some off.

    Update: I have to retract my recommendation of Minion for this situation, because it forks a new instance for every job, and forking a new instance for each job means connecting to the database for every single job. In a quick scan I didn't see a way to have one worker process multiple jobs before it exits.

      I agree, splitting background tasks into dedicated, small workers with proper job queueing is certainly the way to go.

      In my systems, I have various "tasks to do" tables that the workers work on. The workers run all the time, just waiting for new jobs to be scheduled. I also do this for time-based scheduling. It's often better to run the "do something every 5 minutes" stuff internally in the worker instead of calling it from a cron job. And in many (if not most) cases it's really "once per hour" rather than "at the start of every hour"; that way, you can spread out the server load a bit better.
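
      A bare-bones sketch of that pattern, assuming a hypothetical tasks table with id, payload and status columns (all names invented):

          use strict;
          use warnings;
          use DBI;

          # The costly initialization happens exactly once, at worker startup.
          my $dbh = DBI->connect('dbi:Pg:dbname=app', 'worker', 'secret',
              { RaiseError => 1, AutoCommit => 1 });

          # Stand-in for the real per-task logic.
          sub process_job { my ($job) = @_; print "working on job $job->{id}\n" }

          while (1) {
              my $job = $dbh->selectrow_hashref(
                  q{SELECT id, payload FROM tasks
                    WHERE status = 'pending' ORDER BY id LIMIT 1});
              if ($job) {
                  $dbh->do(q{UPDATE tasks SET status = 'running' WHERE id = ?},
                           undef, $job->{id});
                  process_job($job);
                  $dbh->do(q{UPDATE tasks SET status = 'done' WHERE id = ?},
                           undef, $job->{id});
              }
              else {
                  sleep 5;    # nothing to do, wait for new jobs
              }
          }

      With more than one worker on the same table you would also need some locking (e.g. SELECT ... FOR UPDATE SKIP LOCKED on PostgreSQL) so two workers don't grab the same job.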

      Whenever I need a worker to react in a somewhat realtime manner (for example, processing and printing an invoice after the user has finished input), I add an IPC (interprocess communication) "trigger" to start checking the database (or just doing whatever needs to be done) NOW.

      Shameless plug: in my projects I use Net::Clacks for IPC; see also this slightly outdated example: Interprocess messaging with Net::Clacks
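
      Net::Clacks aside, one minimal generic way to implement such a trigger is a plain Unix signal that interrupts the worker's sleep (everything here is invented for illustration):

          use strict;
          use warnings;

          my $wake = 0;
          $SIG{USR1} = sub { $wake = 1 };    # the "do it NOW" trigger

          # Stand-in for the database poll from the worker loop above.
          sub check_for_new_jobs { print "checking for new jobs\n" }

          while (1) {
              $wake = 0;
              check_for_new_jobs();
              sleep 60 unless $wake;    # sleep() returns early when USR1 arrives
          }

      The process that schedules the job (a web request, say) then only needs the worker's PID and a kill 'USR1', $worker_pid to wake it up immediately.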

      PerlMonks XP is useless? Not anymore: XPD - Do more with your PerlMonks XP
Re: Daemonizing (or otherwise speeding up) a high-overhead script?
by hippo (Archbishop) on Aug 24, 2023 at 10:16 UTC

    Any client/server architecture you fancy will handle this: long-running server processes avoid the on-request startup costs while listening for occasional client connections. Since it's already a Catalyst app, I don't see why you wouldn't initiate the jobs via web requests. Now, having said that ...

    "has a very high startup overhead, thanks to a large number of database connections that have to be set up at the start of every run. It takes about 30 seconds or so before it starts to respond ... Assume there's no way to reduce the startup costs."

    Mounting my high horse, I will assert that there is a way to reduce the startup costs, particularly if they are just a matter of setting up a large number of database connections. The connections can be made in parallel, and that will save stacks of time, because most of the latency will be down to the network and the server-side processing of the authnz and setting it all up on the far end. Everyone will have their own yardstick, but I expect most will agree that 30 seconds is far too long for an init phase. A little work to refactor that init phase should pay off handsomely.
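
    If nothing else, it's worth measuring where those 30 seconds actually go before refactoring. A minimal sketch that times each connection (the connection list is made up):

        use strict;
        use warnings;
        use DBI;
        use Time::HiRes qw(time);

        # Hypothetical list of the connections the app opens at startup.
        my @connections = (
            { name => 'orders',  dsn => 'dbi:Pg:dbname=orders;host=db1',  user => 'app', pass => 'x' },
            { name => 'billing', dsn => 'dbi:Pg:dbname=billing;host=db2', user => 'app', pass => 'x' },
        );

        for my $c (@connections) {
            my $t0  = time;
            my $dbh = DBI->connect($c->{dsn}, $c->{user}, $c->{pass}, { RaiseError => 1 });
            printf "%-10s connected in %.2fs\n", $c->{name}, time - $t0;
            $dbh->disconnect;
        }

    If a few remote databases dominate the total, those are the ones to connect to lazily, in parallel, or only in the runs that actually need them.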


    🦛

Re: Daemonizing (or otherwise speeding up) a high-overhead script?
by Anonymous Monk on Aug 24, 2023 at 16:16 UTC

    OP here. Many thanks for all the responses. I was going to give more detail about what's actually going on, but in fact the question that "hippo" asked led me straight to the obvious solution.

    As they pointed out, the Catalyst app is already running, and it really is both the daemon and all the workers we need. We already implement our own job queue, and the cron script simply gathers data from that queue and fires requests into the Catalyst app. Currently the cron script loads the entire app (because it uses some utility functions that are, um, useful), but it doesn't need to--with a small amount of rewriting, the cron script can get the info it needs without the zillion database connections and send everything into the main app.
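
    For anyone following along, a rough sketch of what such a slimmed-down cron script might look like, with a hypothetical job table and endpoint (all names invented, and HTTP::Tiny standing in for your preferred client):

        use strict;
        use warnings;
        use DBI;
        use HTTP::Tiny;

        # One lightweight connection to the job queue instead of loading the
        # whole Catalyst app and its many database connections.
        my $dbh  = DBI->connect('dbi:Pg:dbname=jobs', 'cron', 'secret',
                                { RaiseError => 1 });
        my $http = HTTP::Tiny->new;

        my $task = shift @ARGV or die "usage: $0 <task-name>\n";
        my $jobs = $dbh->selectall_arrayref(
            q{SELECT id, payload FROM job_queue WHERE task = ? AND status = 'pending'},
            { Slice => {} }, $task);

        for my $job (@$jobs) {
            # Hand the actual work to the already-running Catalyst app.
            my $res = $http->post_form('http://app.internal:3000/api/run_job',
                { id => $job->{id}, payload => $job->{payload},
                  api_key => $ENV{APP_API_KEY} });
            warn "job $job->{id} failed: $res->{status}\n" unless $res->{success};
        }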

    When I was thinking we needed a separate daemon, I didn't stop to consider why this script itself needed to be daemonized, and in fact there's nothing in that script that takes much overhead. I was misunderstanding my own app. It wasn't intentionally an XY question, but it seems to have ended up that way... Thank you!

      I'm not a huge fan of running periodic tasks through the public-facing app, but it works. One problem would be the public hitting the app with so many requests that no workers are left to serve the cron jobs. Another is the potential for attackers who find your source code to submit bogus cron tasks from outside.

      An easy solution is to run another copy of the Catalyst app (or even a second Catalyst app entirely) for handling cron tasks, and not expose it to the public. This also lets you restart them independently, knowing whether you're disrupting the users or crashing a long-running cron task.

        I appreciate your concerns, but it's not a public-facing app; it's purely internal, and it only responds to requests that come from the internal network and carry an appropriate API key for the relevant task. We monitor this pretty closely. It can, of course, still happen that we get more requests than we expect, but we have a fair amount of cushioning built into the system.
      And in fact, it took about 5 lines of changes, and I cut the startup time on our dev box from 19s to 0.9s. This is awesome!
Re: Daemonizing (or otherwise speeding up) a high-overhead script?
by gnosti (Chaplain) on Aug 23, 2023 at 20:18 UTC
    How do you communicate with your script now? Is there any reason you couldn't use a socket?
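
    For completeness, the kind of thing gnosti is hinting at: a daemon that does the expensive initialization once and then takes one-line commands over a Unix-domain socket. A bare sketch, with the socket path and commands invented:

        use strict;
        use warnings;
        use Socket qw(SOCK_STREAM);
        use IO::Socket::UNIX;

        # ... do the expensive initialization (database connections etc.) once, here ...

        my $path = '/tmp/taskd.sock';
        unlink $path;
        my $server = IO::Socket::UNIX->new(
            Type   => SOCK_STREAM,
            Local  => $path,
            Listen => 5,
        ) or die "cannot listen on $path: $!";

        while (my $client = $server->accept) {
            chomp(my $command = <$client> // '');
            # e.g. cron sends "process task A" or "process odd task C"
            print {$client} "starting: $command\n";
            # ... dispatch to the existing task-handling code here ...
            close $client;
        }

    The cron jobs then shrink to a few lines that connect to the same socket and print a single command.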