in reply to Long-running automation tasks

Sorry for being vague, but that was on purpose: I wanted to see what opinions and recommendations there were for an event-driven process versus cron.

It seems cron may be the way to go after all, for both maintainability and performance. I just was not sure that starting a new process every 3-5 minutes would be the best use of resources.

But, to the specifics, I need to do the following (a rough crontab sketch is further below):

  • check a folder for files and process each item in there with a script, period: 5 minutes
  • check a REST API for new items and process them with a script, period: 15 minutes
  • check a REST API for changes in existing items, period: 24 hours
  • check to see whether services are running and email alert if not, period: 5 minutes
  • check for new uploads for a folder and move them to the processing folder above, period: 1 hour
  • compress/archive uploaded JSON/log files, period: 24 hours

The service-check item seems more like a job for a monitoring system such as Nagios or Zabbix, but it is only 4 nodes and one TCP service port, so it is easy enough to do on my own.
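
Roughly, the crontab I have in mind would look something like the sketch below; the script names and paths are just placeholders, not the real ones:

    # process incoming files every 5 minutes
    */5 * * * *   /usr/local/bin/process_incoming.pl /data/processing
    # pull new items from the REST API every 15 minutes
    */15 * * * *  /usr/local/bin/fetch_new_items.pl
    # check existing items for changes once a day
    30 2 * * *    /usr/local/bin/check_changed_items.pl
    # verify services are up, email an alert if not, every 5 minutes
    */5 * * * *   /usr/local/bin/check_services.pl
    # sweep new uploads into the processing folder hourly
    0 * * * *     /usr/local/bin/sweep_uploads.pl /data/uploads /data/processing
    # compress/archive uploaded JSON and log files nightly
    0 3 * * *     /usr/local/bin/archive_uploads.pl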

Thanks for all of the feedback so far.

    Re^2: Long-running automation tasks
    by Marshall (Canon) on Nov 30, 2017 at 20:48 UTC
      Maybe you could clarify this odd-looking pair of items:
      • check a folder for files and process each item in there with a script, period: 5 minutes
      • check for new uploads for a folder and move them to the processing folder above, period: 1 hour
      If you are only moving things into the processing folder once per hour, why the 5-minute rush to look in the folder to process the results? Why not just process things once per hour? I'm sure that there is something that I'm missing; it's just that 5 minutes vs. one hour looks odd to me.

      This item is perhaps worthy of some special consideration:

      • check to see whether services are running and email alert if not, period: 5 minutes

      Making this special "watch over the system" task a server daemon is worth considering. A daemon is basically a continually running process without any console interface. Typically a process like this needs to maintain some state information about the current system status. Think about how many emails might get sent if some node fails: bombarding an email account with a bazillion messages all saying the same thing is not productive, so a "throttle" on repeated messages is usually desired. If the process stays resident, you can keep that state information in an in-memory table instead of using some other method like a disk file. I would write a simple client to talk to this thing, with one command, "status".
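
      To make that concrete, here is a bare-bones sketch of the idea. Everything specific in it -- the hosts, port, socket path, and the notify() stub -- is made up, and a real version would want proper daemonizing and error handling:

        #!/usr/bin/perl
        use strict;
        use warnings;
        use IO::Socket::INET;
        use IO::Socket::UNIX;
        use IO::Select;

        # Hypothetical nodes and port to watch -- substitute the real ones.
        my @targets = (
            { host => 'node1.example.com', port => 8080 },
            { host => 'node2.example.com', port => 8080 },
            { host => 'node3.example.com', port => 8080 },
            { host => 'node4.example.com', port => 8080 },
        );

        my $check_every = 300;                 # sweep every 5 minutes
        my $throttle    = 3600;                # repeat a "still down" mail at most hourly
        my $sock_path   = '/tmp/watcher.sock'; # where the status client connects
        my %state;                             # in-memory status table, keyed by "host:port"

        unlink $sock_path;
        my $listener = IO::Socket::UNIX->new(
            Type   => SOCK_STREAM(),
            Local  => $sock_path,
            Listen => 5,
        ) or die "cannot listen on $sock_path: $!";
        my $select = IO::Select->new($listener);

        my $next_sweep = 0;
        while (1) {
            if ( time() >= $next_sweep ) {
                sweep();
                $next_sweep = time() + $check_every;
            }
            # Sleep until a status client connects or the next sweep is due.
            my $wait = $next_sweep - time();
            $wait = 0 if $wait < 0;
            for my $ready ( $select->can_read($wait) ) {
                my $client = $ready->accept or next;
                my $cmd = <$client>;
                if ( defined $cmd and $cmd =~ /^status/i ) {
                    print {$client} "$_ $state{$_}{status}\n" for sort keys %state;
                }
                close $client;
            }
        }

        sub sweep {
            for my $t (@targets) {
                my $key = "$t->{host}:$t->{port}";
                my $up  = IO::Socket::INET->new(
                    PeerAddr => $t->{host},
                    PeerPort => $t->{port},
                    Timeout  => 5,
                ) ? 1 : 0;
                my $s = $state{$key} ||= { status => 'unknown', last_mail => 0 };

                # Mail on a fresh failure, or when the throttle interval has passed.
                if ( !$up and ( $s->{status} ne 'down' or time() - $s->{last_mail} > $throttle ) ) {
                    notify("$key appears to be DOWN");
                    $s->{last_mail} = time();
                }
                $s->{status} = $up ? 'up' : 'down';
            }
        }

        # Stand-in for whatever actually sends the email (sendmail, Email::Sender, ...).
        sub notify { warn "ALERT: $_[0]\n" }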

      Whether I am the recipient of 1 or 500 emails from this "watcher" process, my actions will be the same: fire up my client program, check the current status, and do what I can to get the system running again right now. Investigating why this happened can take hours, days, or even weeks.
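
      And the matching client really can be tiny -- this assumes the same made-up socket path as the sketch above:

        #!/usr/bin/perl
        use strict;
        use warnings;
        use IO::Socket::UNIX;

        # Ask the watcher daemon (sketched above) for its current status table.
        my $sock = IO::Socket::UNIX->new(
            Type => SOCK_STREAM(),
            Peer => '/tmp/watcher.sock',
        ) or die "watcher not reachable: $!";
        print {$sock} "status\n";
        print while <$sock>;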

      A simple Perl program can run for years without memory leaks, provided that you pay attention to this "simple" part.

      Unix is very efficient compared with Windows at starting new processes, so I wouldn't worry too much about that. Except perhaps for this "watcher" program, a cron job looks fine.

      Update:
      I see that you have 2 tasks that involve the REST API. Consider the "least common denominator": it could be that you can also run the "need once per day" query every 15 minutes. Maybe it doesn't matter in terms of performance, and if it doesn't, then so what? There is something to be said for simplifying the code at the expense of minuscule performance gains.
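
      If you go that way, the two REST jobs can collapse into a single crontab entry; the script name and flags here are invented just to show the shape of it:

        */15 * * * *  /usr/local/bin/sync_rest_items.pl --new --changed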