OfficeLinebacker has asked for the wisdom of the Perl Monks concerning the following question:

Hello, fellow monks.

A coworker has set up a novel way to track the execution of cron jobs (which we use extensively in RedHat Enterprise v4).

The desired command's output is directed to a file of the form basename-date:hh:mm.uid.log in a certain directory. Then, every 5 minutes, another cron job (in Perl) scans said directory for all files 5 days old or younger, converts them to HTML, and posts the logs as web pages.
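
Roughly, the sweep script does something like the following (this is only a sketch to show the idea, not the actual code; the directory and the HTML-conversion helper are made up):

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Rough sketch only -- the real paths and HTML generation differ.
    my $logdir = '/var/local/cronlogs';

    opendir my $dh, $logdir or die "Cannot open $logdir: $!";
    for my $file (readdir $dh) {
        next unless $file =~ /\.log$/;
        my $path = "$logdir/$file";
        next if -M $path > 5;        # skip logs older than 5 days
        convert_to_html($path);      # stand-in for the real HTML/posting step
    }
    closedir $dh;

    sub convert_to_html {
        my ($path) = @_;
        print "would convert $path to a web page\n";
    }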

The main page is split up among 5 users, and each user's page lists the history of all cron jobs run in the past 5 days.

The point is to not get lots of email every day, but still be able to see how your cron jobs went.

It also does some basic error checking.

We're planning some improvements, such as sending an email when the program detects an error, and even checking whether a job has "run too long," in which case an email would also be sent.

I'd like to ask The Monks if they know of any other similar implementations before we start improving a wheel that perhaps has already been invented.

TIA


Re: Accessing cron job output from a web page?
by traveler (Parson) on Apr 19, 2006 at 21:39 UTC
    I did something similar once with log files from a program. Instead of the every-5-minutes cron job, just use a CGI Perl script that dynamically scans the directory for files less than 5 days old and shows them. That way you only generate the HTML when it's needed, saving CPU time.
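
    A bare-bones version might look something like this (untested sketch; the log directory is just a placeholder):

        #!/usr/bin/perl
        use strict;
        use warnings;
        use CGI;

        my $q      = CGI->new;
        my $logdir = '/var/local/cronlogs';    # placeholder path

        print $q->header('text/html'),
              "<html><body><h1>Cron logs from the last 5 days</h1>\n";

        opendir my $dh, $logdir or die "Cannot open $logdir: $!";
        for my $file (sort readdir $dh) {
            next unless $file =~ /\.log$/;
            next if -M "$logdir/$file" > 5;    # older than 5 days
            print '<h2>', $q->escapeHTML($file), "</h2><pre>\n";
            open my $fh, '<', "$logdir/$file" or next;
            print $q->escapeHTML($_) while <$fh>;
            close $fh;
            print "</pre>\n";
        }
        closedir $dh;
        print "</body></html>\n";
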
      That's a great point, traveler, and we discussed that, but we worry about page load time if the logs are particularly large. I guess we'd just have to see whether it ever bogged down the server too badly. But I like thinking in the "JIT" mindset!
        I had that thought with one of my pages. I take the middle road. Have the script check the age of the HTML file, and if it's more than X minutes old, recreate it from the log files, otherwise just show it. It saves CPU and time when many people are viewing the page at the same time, but still gives you almost current data.
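
        In Perl terms the freshness check is just a file-age test, something like this (paths are made up):

            use strict;
            use warnings;

            # Serve the cached page if it is fresh; rebuild only when stale.
            my $html    = '/var/www/html/cronlogs/index.html';   # placeholder
            my $max_age = 5 / (24 * 60);     # 5 minutes, in days (the units of -M)

            if (!-e $html or -M $html > $max_age) {
                regenerate_page($html);      # stand-in for rebuilding from the logs
            }
            # ...then read $html and send it to the browser as usual.

            sub regenerate_page { warn "would rebuild $_[0] from the log files\n" }
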
Re: Accessing cron job output from a web page?
by ptum (Priest) on Apr 19, 2006 at 21:51 UTC

    A more general solution might be to write an event logging module and 'use' it in all your scripts. Then any time something 'happens' you can invoke an API in that event-logging module and log it to a file (or better yet, a relational database). Hey, presto, you've got a decent historical picture of what has been happening. You can write the CGI script that presents the 'current picture' once and make it configurable/filterable for various slices of the event log.

    Or you could stick with your cron log-sweeping idea, especially if you have a lot of shell scripts. The above is just an idea that popped into my head, and may not be well-thought-out. Take it or leave it. :)


    No good deed goes unpunished. -- (attributed to) Oscar Wilde
      I'll leave it, thanks. ;) (Hope that kind of humor is accepted here)

      Some of the scripts invoked by cron are shell scripts. I think about 50% are perl, and the rest csh (I know, I know--that's the default here).

      Also, I'm probably not experienced enough to write a module, and I don't really know what an API even is.

      We do, however, plan to use perl wrappers for all 'interesting' cron jobs, the main purpose of which is to mark the STDERR output to distinguish it visually from the STDOUT on the web page.

      The CGI thing seems more and more intriguing; unfortunately I have much less experience with CGI than regular perl, plus CGI is kind of a pain the way they have it configured here.

      I think I may propose to keep the "sweeping" algorithm for now, but think about upgrading in the future.

      Thanks for any and all comments! :) -Terrence

        Heh. It is certainly accepted by me ... PerlMonks is a dull place if nobody has a sense of humor. SOPW is sort of like an advice buffet -- take as much as you like of whatever suits you. :)

        API simply stands for 'Application Programming Interface' and has to do with the external 'face' or 'handle' that one application presents to another who wants to use its functionality. So (in my hypothetical example) the event logging subsystem would conceal its complexity from its clients and present an easy way to log events by simply instantiating an EventLog object and calling some simple method. The EventLog.pm module might have all kinds of cool and intricate private methods that stored away the events in files and databases, but all that would be 'under the covers' and the scripts that called that method wouldn't have to know about it.
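
        Just to make that concrete, a bare-bones skeleton might look like this (the module name, storage, and methods are all just for illustration):

            package EventLog;
            use strict;
            use warnings;

            # Minimal illustration -- a real module might write to a database
            # instead of a flat file, add severity filtering, and so on.
            sub new {
                my ($class, %args) = @_;
                my $self = { logfile => $args{logfile} || '/var/log/eventlog.txt' };
                return bless $self, $class;
            }

            sub log_event {
                my ($self, $severity, $message) = @_;
                open my $fh, '>>', $self->{logfile}
                    or die "Cannot append to $self->{logfile}: $!";
                printf $fh "%s [%s] %s\n", scalar localtime, $severity, $message;
                close $fh;
            }

            1;

        A cron script would then only need something like:

            use EventLog;
            my $log = EventLog->new( logfile => '/var/log/cron_events.log' );
            $log->log_event( ERROR => 'nightly feed download failed' );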

        I like the sound of aufflick's idea, in that it might combine the best of both worlds -- working with shell scripts and having minimal impact on your existing scripts, yet being extensible in terms of parsing and reporting on the event messages once they've been mailed.


        No good deed goes unpunished. -- (attributed to) Oscar Wilde
Re: Accessing cron job output from a web page?
by aufflick (Deacon) on Apr 20, 2006 at 06:20 UTC
    A simple solution would be to make a special email address that you direct all your cron output to (via a MAILTO= directive).

    At its simplest you could then just have a webmail interface to the box, but that's a bit crummy. You could write a simple script that fetched all the mail and made a web page out of that data (or via a database/CGI, whatever works for you).

    The advantage of doing it this way is that no change is required to the cron jobs other than adding the MAILTO directive at the top of the file. You could also direct the cron outputs from multiple machines to a central server.
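
    For instance, if the special address just delivers to a local mbox, the "fetch the mail and make a page" script can stay pretty small. A sketch (the mbox path is a guess, and a real version would HTML-escape the message bodies):

        #!/usr/bin/perl
        use strict;
        use warnings;

        my $mbox = '/var/mail/cronlogs';    # wherever the MAILTO address delivers

        open my $fh, '<', $mbox or die "Cannot read $mbox: $!";
        my (@messages, $current);
        while (my $line = <$fh>) {
            if ($line =~ /^From /) {        # mbox message separator
                push @messages, $current if defined $current;
                $current = '';
            }
            $current .= $line;
        }
        push @messages, $current if defined $current;
        close $fh;

        print "<html><body>\n";
        for my $msg (@messages) {
            my ($subject) = $msg =~ /^Subject: (.*)$/m;
            print '<h2>', $subject || '(no subject)', "</h2><pre>$msg</pre>\n";
        }
        print "</body></html>\n";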

Re: Accessing cron job output from a web page?
by jhourcle (Prior) on Apr 20, 2006 at 13:07 UTC

    I think it really depends on what sort of work the cron jobs are doing.

    For instance, I once had a system where people could request accounts to be created / modified / whatever ... and instead of giving the webserver direct access to do things, it wrote files to a queue to be processed. Once every 5 minutes, the files in the queue were checked and processed (well, after the cron job checked to make sure the process wasn't still running from the last time it was kicked off).
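
    The "make sure it isn't still running" check is essentially just a lock file; here is a generic sketch of the idiom (not the actual code from that system):

        use strict;
        use warnings;
        use Fcntl qw(:flock);

        # If the previous run still holds the lock, bail out quietly.
        open my $lock, '>', '/var/run/queue-processor.lock'
            or die "Cannot open lock file: $!";
        unless (flock $lock, LOCK_EX | LOCK_NB) {
            exit 0;    # try again when cron fires in another 5 minutes
        }

        # ... process the queued request files here ...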

    As it's annoying to get e-mails every 5 minutes, each of the sub-tasks got logged to a separate file (e.g., new user accounts, new organizational accounts, new user accounts that were forced in by the helpdesk, changes in associations between user & org accounts, modifications to org accounts, etc.). Every hour during the work day, and less frequently at night, another process looked through the log files and generated a report -- errors first, then other non-normal events (e.g., people forcing accounts through, items in the queue that were more than an hour old and hadn't been processed, etc.). If there weren't any changes in that hour, it didn't report anything, which was typical through most of the year ... it was a university, so most people came in at the change of semesters.

    The only time the cron job that ran every 5 minutes would report immediately was when something really bad happened (e.g., it couldn't connect to the LDAP server or the other databases, it couldn't FTP necessary files to the host where the account was being created, or the process that started 5 minutes before was still active).

      The cron jobs generally deal with downloading or reading financial/economic information, doing stuff with it, loading the modified or subsetted data into another db, and usually creating a chart or graph, distilling it, and posting it to a web site. Generally we're only worried about errors; the non-erroneous output is mainly useful for seeing what "normal" looks like when there IS an error. Also, several jobs have dependencies, so we do want to be notified ASAP if there is an error.

      I guess one thing we need to consider is how to distribute the work between the wrapper script and the web site posting script. If we do CGI, we might need three scripts: the wrapper, the error-checker, and the CGI. The more I think about it, the more the static, every-five-minutes approach seems better. I don't want to do much with the wrapper script, as I want to keep it as simple as possible so that it introduces a minimum of errors.

      Thanks to all.
Re: Accessing cron job output from a web page?
by OfficeLinebacker (Chaplain) on May 01, 2006 at 21:50 UTC
    Just wanted to let you guys know we went with the periodic script to create the report pages. One thing I'd like to improve is that each time through, the script re-creates everything that is already there plus whatever is new from the past five minutes; I'd like it to do incremental updates only. Other than that, the system is working great.

    So to summarize: we have a cron "wrapper" script that executes the desired program and inserts tags into the beginning of the log file indicating the start of the run and whom to email in case of error. It then runs the program, capturing output and tagging STDERR with HTML tags to visually differentiate it from the rest. Then it prints a closing tag.

    The begin and end tags are there so that the script that creates the HTML can detect whether the program is still running; it holds off on doing anything until it sees the special end-of-run sequence.
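
    In outline the wrapper does something like the following (heavily simplified; the tag format, class name, and path are illustrative, and this sketch collects stderr separately rather than interleaving it with stdout):

        #!/usr/bin/perl
        use strict;
        use warnings;
        use File::Temp qw(tempfile);

        # Usage: cronwrap.pl owner@example.com /path/to/job [args...]
        my ($email, @cmd) = @ARGV;

        print '<!-- RUN START ', scalar localtime, " notify=$email -->\n";

        my ($errfh, $errfile) = tempfile();
        my $out    = `@cmd 2>$errfile`;    # stdout captured here, stderr to the temp file
        my $status = $? >> 8;

        print $out;
        open my $err, '<', $errfile or die "Cannot read $errfile: $!";
        print qq(<span class="stderr">$_</span>) while <$err>;
        close $err;
        unlink $errfile;

        print '<!-- RUN END exit=', $status, ' ', scalar localtime, " -->\n";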

    One other feature we plan to build in is a tag at the beginning (along with the emails, perhaps right after) that tells the report program how long to wait before deciding that the program has run too long without a closing tag, at which point it sends an email alerting the owner.