in reply to Scope of Filehandle inside Subroutine?
I am curious about a few aspects of your application. There are some subtle things that can trip you up in this sort of logging situation.
Looking at your code so far, you have a process that doesn't do much each day. It's not the thing that is writing to the log file, because it is sleeping! So who are the other writers? And what is the plan for them when the daily logfile change-over happens?
Anyway, you have a process here that ties up a filehandle for not much reason (it is sleeping most of the time). Another way is to just use the process scheduler of your OS (Windows, *nix or whatever) to run some cleanup thing once per day (ie your log file archiver).
The total number of filehandles is a fixed number for the whole system. There is no need for you to take one of these permanently out of circulation just for this process. Your process will also consume other system resources (like a PID), etc. Altogether this is small, but nevertheless non-zero. Doing this with a "scheduled" approach allows simplification of your code too: the fact that the code is running at all means it is time to make a new log file. Often there is a provision for multiple log files on the same day. Another approach is described below which I think is better for you...
I did a quick search on CPAN for "log" and there are a bunch of modules. I haven't used any of them yet so I don't have a recommendation, but it is likely that there is some module out there that will help you with the trickier aspects of this. I don't know that for sure, that's just a guess on my part.
It is important to understand that a filehandle is not like a "short-cut" for a filename! When you open a file, the directory part of the file system yields what is, to the file system, essentially a pointer to a set of bits on the disk. Once a file is "open" and you have a filehandle, the original name in the directory is irrelevant! Filehandles are system resources, and if Process A opens file "x" and Process B opens file "x", they have distinct filehandles.
So suppose Process A opens a file and is reading it, and Process B comes along and unlinks that file (which, with appropriate authority, it can do). The name of the file disappears from the directory. Does this delete the bits on the disk? No. Does this affect Process A? No. When Process A closes the filehandle, the O/S will realize that this set of bits no longer has a name in the directory, and at that point the "bits on disk" are freed (deleted) because there is no way for this "set of bits" on the disk to be accessed again.
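You can see this for yourself on a POSIX-ish system (Windows typically refuses to unlink a file that is open). A minimal sketch; the file name demo_unlink.txt is just an example I made up:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $path = "demo_unlink.txt";   # hypothetical scratch file

# Create the file with some content.
open my $wr, '>', $path or die "open for write: $!";
print $wr "still here\n";
close $wr;

# Open it for reading, then unlink it while the handle is open.
open my $rd, '<', $path or die "open for read: $!";
unlink $path or die "unlink: $!";       # name is gone from the directory...
die "name should be gone" if -e $path;  # ...and a new open by name would fail

my $line = <$rd>;   # ...but the existing filehandle still works fine
print $line;        # the data is still readable through the handle
close $rd;          # only now does the O/S free the "bits on disk"
```

The point: the filehandle keeps the data alive; the directory entry is just one way to reach it.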
In terms of your logging application, one way to do this is to have a module that is the "logger", and the other parts of the code just say "log this" and don't have a filehandle or even know which file "this" will get logged to. The log module figures out where to send things. In lots of situations, a simple check on the integer value of gmtime and a compare is enough to decide whether to start a new log file or not. There are some performance considerations here, but in general I/O is so expensive compared to a low-level O/S function like this that this approach often works great. In this case your code would just ask before every I/O, "Is it time to change things yet?". A 32-bit integer compare is really fast, as is the O/S call that returns those 32 bits of "unix epoch time". I suspect that if you just have one very long-lived process, this is the way to go.
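Here is a rough sketch of that "ask before every write" idea. The names (log_msg, $log_fh, the log_YYYYMMDD.txt pattern) are mine, not from any CPAN module; this is just to show the cheap integer compare deciding when to roll over:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $log_fh;          # handle to the currently open log file
my $log_day = -1;    # day number the current file belongs to

sub log_msg {
    my ($msg) = @_;

    # Cheap check: integer day number since the Unix epoch (UTC).
    my $today = int( time() / 86400 );

    if ( $today != $log_day ) {
        # Day changed (or first call): close old file, open a new one.
        close $log_fh if $log_fh;
        my ( $mday, $mon, $year ) = (gmtime)[ 3, 4, 5 ];
        my $name = sprintf "log_%04d%02d%02d.txt",
            $year + 1900, $mon + 1, $mday;
        open $log_fh, '>>', $name or die "open $name: $!";
        $log_day = $today;
    }
    print $log_fh scalar(gmtime) . " $msg\n";
}

log_msg("first message of the day");
```

Callers never touch a filehandle; they just call log_msg() and the logger decides which file is current.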
If you have a situation where there are multiple processes that want to log things to the same file, things get more complex. If you have something like this, then you either need a way for each process to be informed that it needs to get a new filehandle (a re-open of the new CurrentLog), or you have a "log server". In that case, the application writes/reads a socket to the log server instead of writing the file directly, and the log server fiddles with things as in the simpler case above (ie each process holds a constant socket handle to the log server rather than a filehandle).
Just a few other recommendations:
- I would use some fixed standard name for the current log
file, like "CurrentLog" or whatever, instead of a name that
changes every day, because then you can have utilities, perhaps even
just "tail", that operate on the current log file without having to
search for its current name.
Change the name when you move it to the archive.
- I usually use GMT (UTC) instead of local time for files like this
because this sidesteps the DST stuff.
- Oh, "-" isn't a valid file name char on most systems so I would
change that to "_"(underscore) and also eliminate any spaces (space works
on Windows, but not on Unix).
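Putting those recommendations together, generating the archive name might look something like this. The "CurrentLog" name and the log_... pattern are just the examples from above, not anything your code has to use:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(strftime);

# Build an archive name from UTC (gmtime), with underscores and no
# spaces, colons, or leading dashes.
my $archive = strftime( "log_%Y%m%d_%H%M%S.txt", gmtime );
print "$archive\n";   # e.g. log_20090805_001700.txt

# At roll-over time you would rename the fixed-name current log to the
# archive name and then reopen "CurrentLog" fresh, e.g.:
#   rename "CurrentLog", $archive or die "rename: $!";
```

Because "CurrentLog" never changes, tail and friends keep working across the roll-over; only the archived copies carry dates.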
My objective was not to confuse the issue for you, but rather to point out some of the pitfalls of very long-lived processes writing to an ever-changing log file name (the stale filehandle problem). I suspect that having a separate process to change filenames (like you have in your post) is not going to work out as well as the other strategies, even if this is just one other single process that your code is dealing with.
The comments about using a lexical my $var are right. The reason I took the time to write the above is that I suspect that the scoping of FILE vs my $file is not the main problem that is going to crop up.
Replies are listed 'Best First'.

Re^2: Scope of Filehandle inside Subroutine?
by afoken (Chancellor) on Aug 02, 2009 at 17:20 UTC
by Marshall (Canon) on Aug 05, 2009 at 00:17 UTC