bfdi533 has asked for the wisdom of the Perl Monks concerning the following question:

I need an automation system that will process a queue of things, including files in a directory and tasks that need to run regularly. I have already started on a framework based on scripts run from cron, but I am not really satisfied with it since it feels like it is just strung together from bits and pieces.

I am posting here for several reasons:

  • I am intending to write my task in Perl
  • I know there are lots of opinions and experience here

    I have found a couple of modules/frameworks which I do not fully understand but which look like appropriate frameworks:

  • POE
  • Minion (Mojolicious)

    Any guidance here would be appreciated, as well as thoughts on the merits of cron versus a Perl task that just runs, sleeps, processes, sleeps, processes, etc.

    Thanks in advance.

    Re: Long-running automation tasks
    by Marshall (Canon) on Nov 29, 2017 at 05:04 UTC
      I would like to know more about what your Perl task is going to do.

      In general, for a task that runs, say, once per hour and finishes in a few seconds, cron is the way to go. You get a "new instance" every run, so memory "leaks" are not as big of a concern. Very long-lived processes require a lot more care.

      If you have a program that "watches" a directory for new files, there are ways to do this under Unix on an event-driven basis.
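
      For example, on Linux one way to get event-driven directory watching is the Linux::Inotify2 module from CPAN. A minimal sketch, assuming files land in a hypothetical /var/incoming directory:

        use strict;
        use warnings;
        use Linux::Inotify2;

        # Create the inotify object; fails if the kernel interface is unavailable.
        my $inotify = Linux::Inotify2->new
            or die "Cannot create inotify object: $!";

        # Fire the callback for files fully written to, or moved into, the directory.
        $inotify->watch('/var/incoming', IN_CLOSE_WRITE | IN_MOVED_TO, sub {
            my $event = shift;
            print "New file ready for processing: ", $event->fullname, "\n";
            # ... hand the file off to the processing script here ...
        });

        # Block and dispatch events forever.
        1 while $inotify->poll;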

      Can you provide more info? Your question is so general that I don't know how to proceed.

    Re: Long-running automation tasks
    by Your Mother (Archbishop) on Nov 29, 2017 at 17:26 UTC

      Other options include TheSchwartz, Redis::JobQueue, and Beam::Minion. There are a couple of others too, I think.
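
      Since Minion was already on your list, here is a minimal sketch of what a Minion task looks like, assuming the Minion::Backend::SQLite backend and a made-up task name:

        use Mojolicious::Lite -signatures;

        # SQLite keeps the queue self-contained; Postgres is the usual production backend.
        plugin Minion => { SQLite => 'sqlite:minion.db' };

        # Define a task; worker processes pick these up from the queue.
        app->minion->add_task(process_file => sub ($job, $path) {
            # ... process $path here ...
            $job->finish("processed $path");
        });

        # Enqueue work from anywhere that can reach the same backend:
        #   app->minion->enqueue(process_file => ['/var/incoming/foo.json']);

        app->start;    # start a worker with: ./script.pl minion worker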

      That stuff adds quite a lot of administrative load to the code, and it takes, I would argue, an experienced Perl hacker to keep it from creeping into spaghetti. It can be what people need: it certainly unifies reporting and error handling, and it would probably be the best backend for a web admin system of scheduled or user-initiated jobs. But if you already have functional cron scripts, you might instead just formalize them. Have them log to the same place. Get them into revision control. Lock down the environment, the paths, and the user that runs them. Rewrite anything too basic or shell-y into Perl. It's up to you, of course, but consider what you really want to achieve in the end and how much bandwidth/time you have to do it.
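
      As a rough illustration of "formalizing" a cron script, this sketch adds a single-instance lock, a pinned-down environment, and shared logging (the lock and log paths are made-up examples):

        #!/usr/bin/perl
        use strict;
        use warnings;
        use Fcntl qw(:flock);

        # Single-instance guard: skip this run if the previous one is still going.
        open my $lock, '>', '/var/run/myjob.lock' or die "lock: $!";
        exit 0 unless flock $lock, LOCK_EX | LOCK_NB;

        # Pin down the environment instead of trusting whatever cron provides.
        $ENV{PATH} = '/usr/bin:/bin';

        # Log to the same place as every other job.
        open my $log, '>>', '/var/log/automation/myjob.log' or die "log: $!";
        printf {$log} "[%s] starting\n", scalar localtime;

        # ... the actual work goes here ...

        printf {$log} "[%s] done\n", scalar localtime;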

    Re: Long-running automation tasks
    by karlgoethebier (Abbot) on Nov 29, 2017 at 11:22 UTC

      If you want/need to daemonize, see Daemon::Control. Regards, Karl
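
      A minimal sketch of how Daemon::Control is typically wired up; the worker script and file paths here are illustrative assumptions:

        use strict;
        use warnings;
        use Daemon::Control;

        Daemon::Control->new(
            name        => 'queue-watcher',
            program     => '/usr/local/bin/queue_watcher.pl',  # hypothetical worker
            pid_file    => '/var/run/queue-watcher.pid',
            stdout_file => '/var/log/queue-watcher.log',
            stderr_file => '/var/log/queue-watcher.err',
            fork        => 2,    # double-fork into the background
        )->run;    # dispatches on @ARGV: start, stop, restart, status, ...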

      «The Crux of the Biscuit is the Apostrophe»

      perl -MCrypt::CBC -E 'say Crypt::CBC->new(-key=>'kgb',-cipher=>"Blowfish")->decrypt_hex($ENV{KARL});'

    Re: Long-running automation tasks
    by Anonymous Monk on Nov 29, 2017 at 05:35 UTC

      A long-running process that has a specific purpose and scope is usually called a daemon.

      "Tasks that need to run regularly" sound a lot like a little bit of this and that— strung together as you've characterized it. That's what cron is meant for.

      Can you approach your problem as a state machine? Does it follow some protocol? If so, you need to outline that protocol and the state machine before you can actually write the implementation.
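
      A common Perl idiom for this is a dispatch table of state handlers, where each handler returns the next state. A minimal sketch with made-up states:

        use strict;
        use warnings;

        # Each handler does one step's work and returns the next state.
        my %handlers = (
            waiting    => sub { -e '/tmp/work.todo' ? 'processing' : 'waiting' },
            processing => sub { unlink '/tmp/work.todo'; 'done' },
        );

        my $state = 'waiting';
        while ($state ne 'done') {
            $state = $handlers{$state}->();
            sleep 1;
        }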

    Re: Long-running automation tasks
    by bfdi533 (Friar) on Nov 30, 2017 at 17:59 UTC

      Sorry for being vague, but that was on purpose. I wanted to see what opinions and recommendations there were for an event-driven process versus cron.

      It seems cron may be the way to go after all, for both maintainability and performance. I just was not sure that starting a process every 3-5 minutes would be the best use of resources.

      But, to the specifics, I need to do the following (a possible crontab mapping is sketched after the list):

    • check a folder for files and process each item in there with a script, period: 5 minutes
    • check a REST API for new items and process them with a script, period: 15 minutes
    • check a REST API for changes in existing items, period: 24 hours
    • check to see whether services are running and email alert if not, period: 5 minutes
    • check for new uploads for a folder and move them to the processing folder above, period: 1 hour
    • compress/archive uploaded JSON/log files, period: 24 hours
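
      Mapped onto cron schedules, that might look something like this; the script names are placeholders, not my actual scripts:

        */5  * * * *  /usr/local/bin/process_incoming.pl
        */15 * * * *  /usr/local/bin/fetch_new_items.pl
        0    2 * * *  /usr/local/bin/refresh_existing_items.pl
        */5  * * * *  /usr/local/bin/check_services.pl
        0    * * * *  /usr/local/bin/move_uploads.pl
        30   3 * * *  /usr/local/bin/archive_uploads.pl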

      The service-check item seems more like a job for a monitoring system like Nagios, Zabbix, etc., but it is only 4 nodes and one TCP service port, so it is easy enough to do on my own.

      Thanks for all of the feedback so far.

        Maybe you could clarify this odd-looking pair?:
        • check a folder for files and process each item in there with a script, period: 5 minutes
        • check for new uploads for a folder and move them to the processing folder above, period: 1 hour
        If you are only moving things into the processing folder once per hour, why the 5-minute rush to look in the folder and process the results? Why not just process things once per hour? I'm sure there is something I'm missing; it's just that 5 minutes vs. one hour looks odd to me.

        This item is perhaps worthy of some special consideration:

        • check to see whether services are running and email alert if not, period: 5 minutes

        Making this special "watch over the system" task a server daemon is worth considering. A daemon is basically a continually running process without any console interface. Typically a process like this needs to maintain some state information about the current system status. Think about how many emails might get sent if some node fails: bombarding an email account with a bazillion messages saying the same thing is usually not productive, so a "throttle" on repeated messages is often desired. If this process stays resident in memory, you can keep the state information in an in-memory table instead of some other method like a disk file. I would also write a simple client to talk to this thing, with one command: "status".
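
        A sketch of that throttling idea; the host names, port, check period, and one-hour alert window are all illustrative assumptions:

          use strict;
          use warnings;
          use IO::Socket::INET;

          my @nodes    = qw(node1 node2 node3 node4);  # illustrative host names
          my $port     = 8080;                         # illustrative service port
          my $throttle = 3600;                         # at most one alert per node per hour
          my %last_alert;                              # in-memory state, per node

          while (1) {
              for my $node (@nodes) {
                  my $sock = IO::Socket::INET->new(
                      PeerAddr => $node,
                      PeerPort => $port,
                      Timeout  => 5,
                  );
                  if ($sock) {
                      close $sock;
                      delete $last_alert{$node};       # recovered: re-arm the alert
                  }
                  elsif (time - ($last_alert{$node} // 0) > $throttle) {
                      $last_alert{$node} = time;
                      alert("$node:$port is not answering");
                  }
              }
              sleep 300;                               # 5-minute check period
          }

          sub alert {
              my ($msg) = @_;
              # Assumes a local sendmail; swap in Net::SMTP or similar as needed.
              open my $mail, '|-', '/usr/sbin/sendmail -t' or return;
              print {$mail} "To: admin\@example.com\nSubject: service alert\n\n$msg\n";
              close $mail;
          }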

        Whether I am the recipient of 1 or 500 emails from this "watcher process", my actions will be the same: fire up my client program, check the current status, and do what I can to get the system running again right now. Investigating why it happened can take hours, days, or even weeks.

        A simple Perl program can run for years without memory leaks, provided that you pay attention to this "simple" part.

        Unix is very efficient compared with Windows at starting new processes, so I wouldn't be overly concerned about that. Except for perhaps this "watcher" program, a cron job looks fine.

        Update:
        I see that you have 2 tasks that involve the REST API. Consider the "least common denominator": could you run the "once per day" query every 15 minutes as well? Maybe it doesn't matter in terms of performance, and if it doesn't, then so what? There is something to be said for simplifying the code at the expense of minuscule performance gains.

    Re: Long-running automation tasks
    by Anonymous Monk on Nov 29, 2017 at 15:14 UTC
      cron ...