SituationSoap has asked for the wisdom of the Perl Monks concerning the following question:

I have a script which is set to run 24/7. Every day, it will create a new logfile of actions performed that day. This is for reporting and tuning purposes, to make sure the script is running at maximum efficiency. The question I have is outlined in the code below:
    use strict;
    use warnings;
    use Class::Date qw(:errors date localdate gmdate now -DateParse -EnvC);

    my $start = date now;            # start date of the program running, in format YYYY-MM-DD HH:MM:SS
    my @split = split(/ /, $start);  # get just the Year/Month/Day

    open (FILE, ">>/my/directory/logfile-$split[0]"); # open our daily logfile
    print FILE "$start\n";

    sub dailyTask {
        my $now   = date now;        # get our current time
        my @split = split(/ /, $now);
        close FILE;                  # close the existing daily file; in production this will zip it into a running archive of daily files
        open (FILE, ">>/my/directory/logfile-$split[0]"); # open our new daily file
    }

    while (1) {
        my $test = date now;         # date when we start the loop
        print "Start: $start\n";     # print our values, so I can see
        print "Test: $test\n";
        my $soon = $start + '1D';    # adds 1 day onto the start date
        print "Soon: $soon\n";
        print FILE "$test\n";
        if ($soon < $test) {         # check if I've passed our start + 1 day
            print "It's soon, now!\n"; # if so, print our value
            $start = $test;          # update the new start time
            dailyTask();
        }
        sleep(86400);                # sleep until tomorrow
    }
Essentially, what I'm not sure of is whether (a) the FILE file handle will be visible inside the called subroutine, and (b) the file handle reopened there will persist outside the scope of the subroutine. What is marked above is the essence of what I'm trying to accomplish (with all the additional logic stripped out for readability). Does anyone have any suggestions, or should the code work as written? Thank you.

Replies are listed 'Best First'.
Re: Scope of Filehandle inside Subroutine?
by mr_mischief (Monsignor) on Jul 31, 2009 at 20:19 UTC
    It's preferable with modern Perl to use lexical file handles and the three-argument open. Once you're doing things that way, you won't need to remember different scoping rules for bareword typeglob file handles.
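    For example, a minimal sketch of that style against the OP's own logfile (reusing $start and @split from the question):

        open my $log, '>>', "/my/directory/logfile-$split[0]"
            or die "Cannot open logfile: $!";
        print {$log} "$start\n";

    Because $log is an ordinary lexical variable, the usual my scoping rules apply to it, and the handle is closed automatically when the last reference to it goes away.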
      Thank you for the response. I'd seen some hints at lexical file handles, but I hadn't seen anything written on the three-argument open. This looks like it will be the easiest way to integrate the logging functionality with the existing code.
Re: Scope of Filehandle inside Subroutine?
by jwkrahn (Abbot) on Jul 31, 2009 at 23:00 UTC

    The variable FILE (which is short for main::FILE in your example) is a package variable and is therefore in scope throughout the example you provided. As long as it is opened before it is used, it should work as expected.
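
    A tiny sketch of that behaviour (the file name is invented for illustration):

        sub open_log  { open FILE, '>>', 'test.log' or die $!; }  # opens main::FILE
        sub write_log { print FILE "still writable here\n"; }     # same main::FILE
        open_log();
        write_log();   # works: both subs see the one package-wide handle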

Re: Scope of Filehandle inside Subroutine?
by Gangabass (Vicar) on Aug 01, 2009 at 01:33 UTC

    Also, it's highly recommended to check the result of open, like this:

    open my $fh, ">>", "/my/directory/logfile-" . $split[0] or die $!; # open our new daily file
      Thanks for the reminder. The full service-level code does this, but the short mock-up I wrote for this post does not, because my goal was an example showing only the code relevant to the question.
Re: Scope of Filehandle inside Subroutine?
by ikegami (Patriarch) on Aug 04, 2009 at 03:59 UTC

    File handles aren't scoped. The variables with which they are associated are.

    If the file handle is in a lexical variable, the file handle is lexically scoped.

    If the file handle is in a package variable, the file handle is not scoped. The unqualified name of the variable is package-scoped, but the variable doesn't cease to exist when executing code from another package. It's even accessible from other packages.

    Using lexical variables is much preferred over using global variables.
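
    For example (a minimal sketch, file name invented for illustration):

        {
            open my $fh, '>>', 'scratch.log' or die $!;
            print {$fh} "inside the block\n";
        }   # $fh goes out of scope here; as the last reference, the handle is closed

        # A bareword handle such as FILE, by contrast, is the global *main::FILE
        # and stays open until it is closed explicitly (or reopened).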

Re: Scope of Filehandle inside Subroutine?
by Marshall (Canon) on Aug 02, 2009 at 01:55 UTC
    Update: I know this post is long. I interpreted the question as: "How do I set up a logging mechanism for (a) a single long-lived process, or (b) multiple long-lived processes that want to log to a common file, where that common log file changes periodically (like once per day)?" I know that this is a different question than the OP asked, but it is related and seems to me to drive more to the point of how to make this 24/7 application successful.

    I am curious about a few aspects of your application. There are some subtle things that can trip you up in this sort of logging situation.

    Looking at your code so far, you have a process that doesn't do much each day. It's not the thing that is writing to the log file, because it is sleeping! So who are the other writers? And what is the plan for them when the daily logfile change-over happens?

    Anyway, you have a process here that ties up a filehandle for not much reason (it's sleeping most of the time). Another way is to just use the process scheduler of your OS (Windows, *nix, or whatever) to run some cleanup task once per day (i.e., your log file archiver).

    The total number of filehandles is a fixed number for the whole system. There is no need for you to take one of these permanently out of circulation just for this process. Your process will also consume other system resources (like a PID), etc. Altogether this is small, but nevertheless non-zero. Doing this with a "scheduled" approach allows simplification of your code too: the fact that the code is running means it is time to make a new log file. Often there is a provision for multiple log files on the same day. Another approach, which I think is better for you, is described below...

    I did a quick search on CPAN for "log" and there are a bunch of modules. I haven't used any of them yet, so I don't have a recommendation, but it is likely that there is some module out there that will help you with the trickier aspects of this. I don't know that for sure; it's just a guess on my part.

    It is important to understand that a filehandle is not like a "short-cut" for a filename! When you open a file, the directory part of the file system produces what is, to the file system, essentially a pointer to a set of bits on the disk. Once a file is "open" and you have a filehandle, the original name in the directory is irrelevant! Filehandles are system resources, and if Process A opens file "x" and Process B opens file "x", they have distinct filehandles.

    Suppose Process A opens a file and is reading it, and Process B comes along and unlinks that file, which it can do if it has the appropriate authority. The name of the file disappears from the directory. Does this delete the bits on the disk? No. Does this affect Process A? No. When Process A closes the file handle, the OS will realize that this set of bits no longer has a name in the directory, and at that point the "bits on disk" are freed (deleted), because there is no way for this set of bits on the disk to be accessed again.
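
    That behaviour is easy to demonstrate on a Unix-ish system, even within one process (a sketch with an invented file name; Windows will typically refuse to unlink a file that is open):

        open my $fh, '>', 'demo.txt' or die $!;
        print {$fh} "still here\n";
        close $fh;

        open $fh, '<', 'demo.txt' or die $!;   # "Process A" holds a handle
        unlink 'demo.txt' or die $!;           # "Process B" removes the name...
        print scalar <$fh>;                    # ...but the data is still readable
        close $fh;                             # only now are the blocks freed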

    In terms of your logging application, one way to do this is to have a module that is the "logger": the other parts of the code just say "log this" and don't hold a file handle, or even know which file "this" will get logged to. The log module figures out where to send things. In lots of situations, a simple compare on the integer value of gmtime is enough to decide whether to start a new log file or not. There are some performance considerations here, but in general I/O is so expensive compared to a low-level OS function like this that this approach often works great. In this case your code would just ask before every I/O, "Is it time to change things yet?" A 32-bit int compare is really fast, as is the OS function that gets those 32 bits of "unix epoch time". I suspect that if you just have one very long-lived process, this is the way to go (a sketch of the idea follows below).
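
    A minimal sketch of that idea (the module name is invented for illustration; the directory matches the OP's code):

        package DailyLogger;
        use strict;
        use warnings;

        my ($fh, $current_day);

        sub log_line {
            my ($msg) = @_;
            my $day = int(time / 86400);    # cheap integer compare: epoch days, UTC
            if (!defined $current_day or $day != $current_day) {
                close $fh if $fh;           # archive/zip the old file here if desired
                my @t = gmtime;
                my $name = sprintf '/my/directory/logfile_%04d%02d%02d',
                                   $t[5] + 1900, $t[4] + 1, $t[3];
                open $fh, '>>', $name or die "Cannot open $name: $!";
                $current_day = $day;
            }
            print {$fh} scalar(gmtime) . " $msg\n";
        }

        1;

    Callers then just say DailyLogger::log_line("something happened") and never touch a filehandle themselves.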

    If you have a situation where there are multiple processes that want to log things to the same file, things get more complex. In that case, you either need a way for each process to be informed that it must get a new file handle (a re-open to the new CurrentLog), or you need a "log server". With a log server, the application writes to a socket connected to the log server instead of to the file, and the log server fiddles with things as in the simpler case above (i.e., each process has a constant socket handle to the log server rather than a filehandle).

    Just a few other recommendations:
    - I would use some fixed standard name for the current log file, like "CurrentLog" or whatever, instead of a name that changes every day, because then you can have utilities, perhaps even just "tail", that operate on the current log file without having to search for its current name. Change the name when you move it to the archive.
    - I usually use GMT (UTC) instead of local time for files like this, because it side-steps the DST stuff.
    - Oh, "-" isn't a valid file name char on most systems so I would change that to "_"(underscore) and also eliminate any spaces (space works on Windows, but not on Unix).

    My objective was not to confuse the issue for you, but rather to point out some of the pitfalls of very long-lived processes writing to an ever-changing log file name (the stale file handle problem). I suspect that having a separate process to change filenames (like you have in your post) is not going to work out as well as other strategies, even if it is just one other process that your code is dealing with.

    The comments about using a lexical my $var are right. The reason I took the time to write the above is that I suspect the scoping of FILE vs. my $file is not the main problem that is going to crop up.

      When it comes to logging, I tend more and more to delegate it, as I already do for data storage: I put all data into an RDBMS, use DBI, and no longer have to care about locking, concurrency, and all that stuff. The RDBMS does that for me.

      For logging, syslog is a common approach, but it is too "common". Each and every program writes into syslog, leaving a big heap of junk that has to be filtered manually for information.

      So each application is better off using its own log. Log rotation can be solved in different ways. There are several log-rotating utilities that send signals and rename files, hoping that no data is lost and the service stays available during the transition. Apache comes with a rotatelogs utility that is used in a pipe, copying its STDIN to a series of timestamped files in a given directory. Unfortunately, the exact filenames are hard to predict, because they depend on the exact start time of rotatelogs.
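
      From Perl, that kind of piped logger can be attached with a piped open (a sketch; the rotatelogs path and the target directory are assumptions that depend on your installation):

          use IO::Handle;

          # rotate into /var/log/myapp.YYYYMMDD, starting a new file every 86400 seconds
          open my $log, '|-', '/usr/sbin/rotatelogs /var/log/myapp.%Y%m%d 86400'
              or die "Cannot start rotatelogs: $!";
          $log->autoflush(1);    # flush each line so nothing sits in a buffer
          print {$log} "application started\n";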

      djb's multilog program can do even more: it can automatically add timestamps, it automatically rotates the log files while keeping a constant name for the current log file, it can limit the number of old log files, and it can filter the log data. Together with the other parts of daemontools, writing a background service is no harder than writing a simple command line application. supervise takes care of starting, stopping, and restarting the application when needed, and all you need for logging is a simple write operation to STDERR (or maybe STDOUT), like warn, die, or print STDERR. (See the djb way for details.)

      Alexander

      --
      Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
        Great post!

        The OP's main problem is not what the scope of this or that is (although that is a completely valid and relevant question). The main issue appears to be how to do log rotation in a way that doesn't lose data and runs reliably 24/7. I think your suggestions are on target.