bfdi533 has asked for the wisdom of the Perl Monks concerning the following question:

I need an automation system that will process a queue of things, including files in a directory and tasks that need to run regularly. I have already started on a framework based on scripts run from cron, but I am not really satisfied with it since it feels like it is just strung together from bits and pieces.

I am posting here for several reasons:

  • I am intending to write my task in Perl
  • I know there are lots of opinions and experience here

    I have found a couple of modules/frameworks which I do not fully understand but which look like appropriate frameworks:

  • POE
  • Minion (Mojolicious)

    Any guidance here would be appreciated, as well as thoughts on the merits of cron versus a Perl task that just runs, sleeps, processes, sleeps, processes, etc.

    Thanks in advance.

    Re: Long-running automation tasks
    by Marshall (Canon) on Nov 29, 2017 at 05:04 UTC
      I would like to know more about what your Perl task is going to do.

      In general, for a task that runs, say, once per hour and finishes in a few seconds, cron is the way to go. You get a "new instance" every run, so memory "leaks" are not as big of a concern. Very long-lived processes require a lot more care.

      If you have a program that "watches" a directory for new files, there are ways to do this under Unix on an event-driven basis.
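
      For example, on Linux one way to get event-driven directory watching is the Linux::Inotify2 module from CPAN. A minimal sketch, assuming files land in a hypothetical /var/incoming directory:

        use strict;
        use warnings;
        use Linux::Inotify2;

        # Create the inotify object; fails if the kernel interface is unavailable.
        my $inotify = Linux::Inotify2->new
            or die "Cannot create inotify object: $!";

        # Fire the callback for files fully written to, or moved into, the directory.
        $inotify->watch('/var/incoming', IN_CLOSE_WRITE | IN_MOVED_TO, sub {
            my $event = shift;
            print "New file ready for processing: ", $event->fullname, "\n";
            # ... hand the file off to the processing script here ...
        });

        # Block and dispatch events forever.
        1 while $inotify->poll;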

      Can you provide more info? Your question is so general that I don't know how to proceed.

    Re: Long-running automation tasks
    by Your Mother (Archbishop) on Nov 29, 2017 at 17:26 UTC

      Other options include TheSchwartz, Redis::JobQueue, and Beam::Minion. There are a couple of others too, I think.
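
      Since Minion was already on your list, here is a minimal sketch of what a Minion task looks like, assuming the Minion::Backend::SQLite backend and a made-up task name:

        use Mojolicious::Lite -signatures;

        # SQLite keeps the queue self-contained; Postgres is the usual production backend.
        plugin Minion => { SQLite => 'sqlite:minion.db' };

        # Define a task; worker processes pick these up from the queue.
        app->minion->add_task(process_file => sub ($job, $path) {
            # ... process $path here ...
            $job->finish("processed $path");
        });

        # Enqueue work from anywhere that can reach the same backend:
        #   app->minion->enqueue(process_file => ['/var/incoming/foo.json']);

        app->start;    # start a worker with: ./script.pl minion worker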

      That stuff adds quite a lot of administrative load to the code, and it takes, I would argue, an experienced Perl hacker to keep it from creeping into spaghetti. It can be what people need: it certainly unifies reporting and error handling, and it would probably be the best backend for a web admin system of scheduled or user-initiated jobs. But if you already have functional cron scripts, you might instead just formalize them. Have them log to the same place. Get them into revision control. Lock down the environment, the paths, and the user that runs them. Rewrite anything too basic or shell-y into Perl. It's up to you, of course, but consider what you really want to achieve in the end and how much bandwidth/time you have to do it.
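
      As a rough illustration of "formalizing" a cron script, this sketch adds a single-instance lock, a pinned-down environment, and shared logging (the lock and log paths are made-up examples):

        #!/usr/bin/perl
        use strict;
        use warnings;
        use Fcntl qw(:flock);

        # Single-instance guard: skip this run if the previous one is still going.
        open my $lock, '>', '/var/run/myjob.lock' or die "lock: $!";
        exit 0 unless flock $lock, LOCK_EX | LOCK_NB;

        # Pin down the environment instead of trusting whatever cron provides.
        $ENV{PATH} = '/usr/bin:/bin';

        # Log to the same place as every other job.
        open my $log, '>>', '/var/log/automation/myjob.log' or die "log: $!";
        printf {$log} "[%s] starting\n", scalar localtime;

        # ... the actual work goes here ...

        printf {$log} "[%s] done\n", scalar localtime;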

    Re: Long-running automation tasks
    by karlgoethebier (Abbot) on Nov 29, 2017 at 11:22 UTC

      If you want/need to daemonize, see Daemon::Control. Regards, Karl
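
      A minimal sketch of how Daemon::Control is typically wired up; the worker script and file paths here are illustrative assumptions:

        use strict;
        use warnings;
        use Daemon::Control;

        Daemon::Control->new(
            name        => 'queue-watcher',
            program     => '/usr/local/bin/queue_watcher.pl',  # hypothetical worker
            pid_file    => '/var/run/queue-watcher.pid',
            stdout_file => '/var/log/queue-watcher.log',
            stderr_file => '/var/log/queue-watcher.err',
            fork        => 2,    # double-fork into the background
        )->run;    # dispatches on @ARGV: start, stop, restart, status, ...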

      «The Crux of the Biscuit is the Apostrophe»

      perl -MCrypt::CBC -E 'say Crypt::CBC->new(-key=>'kgb',-cipher=>"Blowfish")->decrypt_hex($ENV{KARL});'

    Re: Long-running automation tasks
    by Anonymous Monk on Nov 29, 2017 at 05:35 UTC

      A long-running process that has a specific purpose and scope is usually called a daemon.

      "Tasks that need to run regularly" sound a lot like a little bit of this and that— strung together as you've characterized it. That's what cron is meant for.

      Can you approach your problem as a state machine? Does it follow some protocol? If so, you need to outline that protocol and the state machine before you can actually write the implementation.
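
      A common Perl idiom for this is a dispatch table of state handlers, where each handler returns the next state. A minimal sketch with made-up states:

        use strict;
        use warnings;

        # Each handler does one step's work and returns the next state.
        my %handlers = (
            waiting    => sub { -e '/tmp/work.todo' ? 'processing' : 'waiting' },
            processing => sub { unlink '/tmp/work.todo'; 'done' },
        );

        my $state = 'waiting';
        while ($state ne 'done') {
            $state = $handlers{$state}->();
            sleep 1;
        }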

    Re: Long-running automation tasks
    by bfdi533 (Friar) on Nov 30, 2017 at 17:59 UTC

      Sorry for being vague, but that was on purpose. I wanted to see what opinions and recommendations there were for an event-driven process versus cron.

      It seems cron may be the way to go after all, for both maintainability and performance. I just was not sure that starting a process every 3-5 minutes would be the best use of resources.

      But, to the specifics, I need to do the following (a possible crontab mapping is sketched after the list):

    • check a folder for files and process each item in there with a script, period: 5 minutes
    • check a REST API for new items and process them with a script, period: 15 minutes
    • check a REST API for changes in existing items, period: 24 hours
    • check to see whether services are running and email alert if not, period: 5 minutes
    • check for new uploads for a folder and move them to the processing folder above, period: 1 hour
    • compress/archive uploaded JSON/log files, period: 24 hours
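
      Mapped onto cron schedules, that might look something like this; the script names are placeholders, not my actual scripts:

        */5  * * * *  /usr/local/bin/process_incoming.pl
        */15 * * * *  /usr/local/bin/fetch_new_items.pl
        0    2 * * *  /usr/local/bin/refresh_existing_items.pl
        */5  * * * *  /usr/local/bin/check_services.pl
        0    * * * *  /usr/local/bin/move_uploads.pl
        30   3 * * *  /usr/local/bin/archive_uploads.pl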

      The service-check item seems more like a job for a monitoring system like Nagios, Zabbix, etc., but it is only 4 nodes and one TCP service port, so it is easy enough to do on my own.

      Thanks for all of the feedback so far.

        Maybe you could clarify this odd-looking pair?:
        • check a folder for files and process each item in there with a script, period: 5 minutes
        • check for new uploads for a folder and move them to the processing folder above, period: 1 hour
        If you are only moving things into the processing folder once per hour, why the 5-minute rush to look in the folder and process the results? Why not just process things once per hour? I'm sure there is something I'm missing; it's just that 5 minutes vs. one hour looks odd to me.

        This item is perhaps worthy of some special consideration:

        • check to see whether services are running and email alert if not, period: 5 minutes

        Making this special "watch over the system" task a server daemon is worth considering. A daemon is basically a continually running process without any console interface. Typically a process like this needs to maintain some state information about the current system status. Think about how many emails might get sent if some node fails: bombarding an email account with a bazillion messages saying the same thing is usually not productive, so a "throttle" on repeated messages is often desired. If this process stays resident in memory, you can keep the state information in an in-memory table instead of some other method like a disk file. I would also write a simple client to talk to this thing, with one command: "status".
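
        A sketch of that throttling idea; the host names, port, check period, and one-hour alert window are all illustrative assumptions:

          use strict;
          use warnings;
          use IO::Socket::INET;

          my @nodes    = qw(node1 node2 node3 node4);  # illustrative host names
          my $port     = 8080;                         # illustrative service port
          my $throttle = 3600;                         # at most one alert per node per hour
          my %last_alert;                              # in-memory state, per node

          while (1) {
              for my $node (@nodes) {
                  my $sock = IO::Socket::INET->new(
                      PeerAddr => $node,
                      PeerPort => $port,
                      Timeout  => 5,
                  );
                  if ($sock) {
                      close $sock;
                      delete $last_alert{$node};       # recovered: re-arm the alert
                  }
                  elsif (time - ($last_alert{$node} // 0) > $throttle) {
                      $last_alert{$node} = time;
                      alert("$node:$port is not answering");
                  }
              }
              sleep 300;                               # 5-minute check period
          }

          sub alert {
              my ($msg) = @_;
              # Assumes a local sendmail; swap in Net::SMTP or similar as needed.
              open my $mail, '|-', '/usr/sbin/sendmail -t' or return;
              print {$mail} "To: admin\@example.com\nSubject: service alert\n\n$msg\n";
              close $mail;
          }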

        Whether I am the recipient of 1 or 500 emails from this "watcher process", my actions will be the same: fire up my client program, check the current status, and do what I can to get the system running again right now. Investigating why it happened can take hours, days, or even weeks.

        A simple Perl program can run for years without memory leaks, provided that you pay attention to this "simple" part.

        Unix is very efficient compared with Windows at starting new processes, so I wouldn't be overly concerned about that. Except for perhaps this "watcher" program, a cron job looks fine.

        Update:
        I see that you have 2 tasks that involve the REST API. Consider the "least common denominator": could you run the "once per day" query every 15 minutes as well? Maybe it doesn't matter in terms of performance, and if it doesn't, then so what? There is something to be said for simplifying the code at the expense of minuscule performance gains.

    Re: Long-running automation tasks
    by Anonymous Monk on Nov 29, 2017 at 15:14 UTC
      cron ...