in reply to Scope of Filehandle inside Subroutine?

Update: I know this post is long. I interpreted the question as: "how do I set up a logging mechanism for a) a single long-lived process, or b) multiple long-lived processes that want to log to a common file, where that common log file changes periodically (like once per day)?" I know that this is a different question than the OP asked, but it is related and appears to me to drive more to the point of how to make this 24/7 application successful.

I am curious about a few aspects of your application. There are some subtle things that can trip you up in this sort of logging situation.

Looking at your code so far, you have a process that doesn't do much each day. It's not the thing that is writing to the log file, because it is sleeping! So who are the other writers? And what is the plan for them when the daily logfile change-over happens?

Anyway, you have a process here that ties up a filehandle for not much reason (it is sleeping most of the time). Another way is to just use the process scheduler of your OS (Windows, *nix, or whatever) to run some cleanup task once per day (i.e. your log file archiver).

The total number of filehandles is a fixed number for the whole system. There is no need for you to take one of them permanently out of circulation just for this process. Your process will also consume other system resources (a PID, etc.). All together this is small, but nevertheless non-zero. Doing this with a "scheduled" approach allows simplification of your code too: the fact that the code is running means it is time to make a new log file. Often there is a provision for multiple log files on the same day. Another approach is described below, which I think is better for you...

I did a quick search on CPAN for "log" and there are a bunch of modules. I haven't used any of them yet, so I don't have a recommendation, but it is likely that some module out there will help you with the trickier aspects of this. I don't know that for sure; that's just a guess on my part.

It is important to understand that a filehandle is not like a "short-cut" for a filename! When you open a file, the directory part of the file system produces a set of bits that, to the file system, is essentially a pointer to a set of bits on the disk. Once a file is "open" and you have a filehandle, the original name in the directory is irrelevant! Filehandles are system resources, and if Process A opens file "x" and Process B opens file "x", they have distinct filehandles.

So suppose Process A opens a file and is reading it. Let's say Process B comes along and unlinks that file, which it can do if it has the appropriate authority. The name of the file disappears from the directory. Does this delete the bits on the disk? No. Does this affect Process A? No. When Process A closes the filehandle, the O/S will realize that this set of bits no longer has a name in the directory, and at that point the "bits on disk" are freed (deleted), because there is no way for this "set of bits" on the disk to be accessed again.
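This is easy to demonstrate. The sketch below (Unix semantics; 'demo.txt' is just a throwaway name I made up for the illustration) shows that unlinking a file that another handle still has open removes only the name, not the data:

```perl
use strict;
use warnings;

# Unlinking a file that is still open removes only the directory entry;
# the bits on disk stay readable until the last handle is closed.
open my $out, '>', 'demo.txt' or die "open for write: $!";
print {$out} "still here\n";
close $out;

open my $in, '<', 'demo.txt' or die "open for read: $!";
unlink 'demo.txt' or die "unlink: $!";   # the directory entry is gone...
my $line = <$in>;                        # ...but the open handle still reads
close $in;                               # only now does the O/S free the blocks
print $line;                             # prints "still here"
```

This is exactly the stale-handle trap: a long-lived writer holding a handle to yesterday's (renamed or unlinked) log keeps writing to the old bits, not to the new file of the same name.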

In terms of your logging application, one way to do this is to have a module that is the "logger"; the other parts of the code just say "log this" and don't have a filehandle or even know which file "this" will get logged to. The log module figures out where to send things. In lots of situations, a simple check on the integer epoch time and a compare is enough to decide whether to start a new log file or not. There are some performance considerations here, but in general I/O is so expensive compared to a low-level O/S function like this that this approach often works great. In this case your code would just ask before every I/O, "Is it time to change things yet?". A 32-bit int compare is really fast, as is the O/S function that gets those 32 bits of "Unix epoch time". I suspect that if you just have one very long-lived process, this is the way to go.
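A minimal sketch of that "logger module" idea follows. The name SimpleLogger, the fixed 'CurrentLog' filename, and the once-per-UTC-day policy are my assumptions for illustration, not the OP's code; callers just say "log this" and never see a filehandle:

```perl
use strict;
use warnings;
use IO::Handle;    # for autoflush

package SimpleLogger;

my $fh;                 # private to the module; callers never see it
my $current_day = -1;

sub log_line {
    my ($msg) = @_;
    my $day = int( time() / 86400 );   # cheap integer compare per write
    if ( $day != $current_day ) {      # "Is it time to change things yet?"
        close $fh if $fh;
        # Fixed name for the live log; archiving/renaming happens elsewhere.
        open $fh, '>>', 'CurrentLog' or die "open CurrentLog: $!";
        $fh->autoflush(1);
        $current_day = $day;
    }
    print {$fh} scalar(gmtime) . " $msg\n";
}

1;
```

Usage is just `SimpleLogger::log_line("worker finished");` from anywhere; the rotation decision is one integer compare, so the cost per log call is dominated by the I/O itself.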

If you have a situation where there are multiple processes that want to log things to the same file, things get more complex. If you have something like this, then you either need a way for each process to be informed that it needs to get a new filehandle (a re-open to the new CurrentLog), or you have a "log server". In that case, the application writes to a socket connected to the log server instead of to the file, and the log server fiddles with things as in the simpler case above (i.e. each process has a constant socket handle to the log server rather than a filehandle).
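Here is a toy sketch of the "log server" shape: the application holds one constant socket handle and never touches the log file, while a separate process owns the filehandle (and is where the rotation check from above would live). To keep the example self-contained, the "server" is just a forked child and the socket comes from socketpair; a real version would use a named Unix-domain or TCP socket so many processes could connect:

```perl
use strict;
use warnings;
use Socket;

socketpair( my $app, my $srv, AF_UNIX, SOCK_STREAM, PF_UNSPEC )
    or die "socketpair: $!";

my $pid = fork();
die "fork: $!" unless defined $pid;

if ($pid) {                                # the application side
    close $srv;
    print {$app} "something happened\n";   # "log this" -- no filehandle here
    close $app;                            # flushes, and EOFs the server
    waitpid $pid, 0;
} else {                                   # the log server side
    close $app;
    open my $fh, '>>', 'CurrentLog' or die "open CurrentLog: $!";
    while ( my $line = <$srv> ) {          # the rotation check would go here
        print {$fh} $line;
    }
    close $fh;
    exit 0;
}
```

The point of the shape: when the daily change-over happens, only the server re-opens anything; every application keeps its one constant socket handle and never knows the file changed.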

Just a few other recommendations:
- I would use some fixed standard name for the current log file, like "CurrentLog" or whatever, instead of a name that changes every day, because then you can have utilities (perhaps even just "tail") that operate on the current log file without having to search for its current name. Change the name when you move it to the archive.
- I usually use GMT (UTC) instead of local time for files like this, because this sidesteps the DST stuff.
- Oh, and watch the characters in the name: a leading "-" looks like an option to most command-line tools, and spaces have to be quoted in the shell (they work, but they're a pain), so I would stick to "_" (underscore) between the parts of the name.
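Putting those recommendations together, the archiving step might look like the sketch below: a fixed 'CurrentLog' name while live, renamed to a UTC-dated, underscore-only archive name. The log_YYYY_MM_DD.txt pattern is my invention, not something from the OP's post:

```perl
use strict;
use warnings;

# Build the archive name from UTC (gmtime) so DST never bites,
# using only underscores between the date parts.
my @t = gmtime();
my $archive = sprintf "log_%04d_%02d_%02d.txt",
    $t[5] + 1900, $t[4] + 1, $t[3];

# Rename the live log out of the way; the logger's next write
# re-creates a fresh CurrentLog.
if ( -e 'CurrentLog' ) {
    rename 'CurrentLog', $archive or die "rename: $!";
}
print "$archive\n";
```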

My objective was not to confuse the issue for you, but rather to point out some of the pitfalls with very long-lived processes writing to an ever-changing log file name (the stale filehandle problem). I suspect that having a separate process to change filenames (like you have in your post) is not going to work out as well as other strategies, even if it is just one other process that your code is dealing with.

The comments about using a lexical my $var are right. The reason I took the time to write the above is that I suspect that the scoping of FILE vs. my $file is not the main problem that is going to crop up.


Re^2: Scope of Filehandle inside Subroutine?
by afoken (Chancellor) on Aug 02, 2009 at 17:20 UTC

    When it comes to logging, I more and more tend to delegate it, like I do already for data storage: I put all data into an RDBMS, use DBI, and don't have to care any longer about locking, concurrency and all that stuff. The RDBMS does that for me.

    For logging, syslog is a common approach, but it is too "common". Each and every program writes into syslog, leaving a big heap of junk that has to be filtered manually for information.

    So each application had better use its own log. Log rotation can be solved in different ways. There are several log rotating utilities that send signals and rename files, hoping that no data is lost and the service stays available during the transition. Apache comes with a rotatelogs utility that is used in a pipe, copying its STDIN to several timestamped files in a given directory. Unfortunately, the exact filenames are hard to predict, because they depend on the exact start time of rotatelogs.

    djb's multilog program can do even more: it can automatically add timestamps, it automatically rotates the log files while keeping a constant name for the current log file, it can limit the number of old log files, and it can filter the log data. Together with the other parts of the daemontools, writing a background service is no harder than writing a simple command line application. supervise takes care of starting, stopping, and restarting the application when needed, and all you need for logging is a simple write operation to STDERR (or maybe STDOUT), like warn, die, or print STDERR. (See the djb way for details.)

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
      Great post!

The OP's main problem is not the scope of this or that (although that is a completely valid and relevant question). The main issue appears to be: how to do log rotation in a way that doesn't lose data and will run reliably 24/7. I think your suggestions are on target.