hv has asked for the wisdom of the Perl Monks concerning the following question:

I'm writing code that includes an option to get a tail -f effect, and it works fine. However, I believe it is working rather inefficiently due to an unexpected detail of 4-arg select().

In the code, I open a number of files using:

sysopen $fh, $file, O_RDONLY | O_NONBLOCK
then sit in a loop with:

$ract = $eact = $rvec;   # vector of all filenos to read
($nfound, $timeleft) = select($ract, undef, $eact, $timer);
.. before using sysread() to get new data from the marked files.

This all works fine, except that each time round the loop select() returns saying that every one of the files is readable. On checking various manpages and Stevens, I find that I've misunderstood select(): it is telling me not that there is new data to read on the marked files, but that a read from those filehandles would not block. In particular (Stevens): "If we encounter the end of file on a descriptor, that descriptor is considered readable by select()."
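For concreteness, the loop boils down to something like the sketch below (@files, $timer, and the 8K read size are illustrative stand-ins); the comments mark where the EOF behaviour bites:

use Fcntl qw(O_RDONLY O_NONBLOCK);

my @files = @ARGV;    # files to tail
my $timer = 60;       # poll timeout in seconds (illustrative)

my @handles;
for my $file (@files) {
    sysopen(my $fh, $file, O_RDONLY | O_NONBLOCK) or die "$file: $!";
    push @handles, $fh;
}

# one bit per fileno we want to watch
my $rvec = '';
vec($rvec, fileno($_), 1) = 1 for @handles;

while (1) {
    my ($ract, $eact) = ($rvec, $rvec);
    my ($nfound, $timeleft) = select($ract, undef, $eact, $timer);
    for my $fh (@handles) {
        next unless vec($ract, fileno($fh), 1);
        my $got = sysread($fh, my $buf, 8192);
        # $got == 0 is EOF - and a descriptor at EOF is always
        # considered readable, so select() returns immediately
        # on every pass instead of sleeping
    }
}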

Looking at prior art: in File::Tail, the author does lots of clever stuff to try to predict when a file is likely to have more data ready to read, and reads from it only then; in the implementation of tail(1) for the ppt project (here), the author chooses not to retain open file handles at all, but instead repeatedly stat()s the file(s) to determine whether something has changed.

So, is there any reasonable workaround to this, to avoid repeatedly reading from filehandles that have no new data to supply? The intended use for this code will typically have one or more instances each tailing around 60 file descriptors on a production box, and if they can't spend most of their time sleeping in the kernel under select() I fear it will have a noticeable impact on performance of the machine as a whole.

Hugo

Replies are listed 'Best First'.
Re: nonblocking I/O - testing whether file has more data to read
by kvale (Monsignor) on Apr 08, 2004 at 17:02 UTC
    The GNU tail -f program simply loops through the file list, performing an fstat on each file to determine if anything has changed.

    If you want to reduce the load, consider how often you need a screen update. Every second, every 10 seconds? Then sleep for that interval before doing another round of polling.
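    A minimal sketch of that style of loop (assuming the filehandles in @handles are already open, and keeping a per-file offset so each round reads only what is new):

    use Time::HiRes qw(sleep);

    my %pos;                          # bytes already consumed, per fileno
    while (1) {
        for my $fh (@handles) {
            my $size = (stat $fh)[7]; # fstat(2) via the open handle
            my $seen = $pos{fileno $fh} || 0;
            if ($size > $seen) {
                sysread($fh, my $buf, $size - $seen);
                $pos{fileno $fh} = $size;
                print $buf;           # or whatever the filter does
            }
        }
        sleep 1;                      # the polling interval traded off above
    }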

    -Mark

      Hmm, interesting. An fstat/sleep loop might well be sufficient. Part of the purpose of this script is to provide support similar to the standard sysadmin tools at the point where we replace each webserver's single error log with lots of little logs, one per script, so I'd hope to achieve responsiveness similar to tail -f error_log. But a second or two's delay is unlikely to be critical, and could certainly help to make the process more cooperative.

      There are situations in which I'd expect large amounts of input to be filtered down to small amounts of output, but if I calculate the time to sleep from the start of the fstat/read cycle (rather than doing a fixed sleep each time) I can minimise the danger of falling behind.

      Hugo

      I went for this approach, and it seems to work fine, giving good responsiveness without causing a detectable load on the system.

      The basic loop is something like this:

      my @closed = list_of_files();
      my @open = ();
      my $count = 0;
      while (1) {
          my $time = Time::HiRes::time();
          check_moved(\@open)  if ($count % $CHECKMOVED) == 0;
          check_new(\@closed)  if ($count % $CHECKNEW) == 0;
          check_read(\@open)   if ($count % $CHECKREAD) == 0;
          ++$count;
          my $delay = $TICK - (Time::HiRes::time() - $time);
          Time::HiRes::sleep($delay) if $delay > 0;
      }

      I'm currently using constants $TICK=0.1, $CHECKREAD=1, $CHECKNEW=20, $CHECKMOVED=50, but later I plan to make these more dynamic based on the size of the file list and other details.

      Checking for new files involves trying to open each file in @closed and, if the open succeeds, moving it from @closed to @open and recording its filehandle, the current read position, the file id (device and inode), and the blocksize.

      Checking for moved files involves doing a stat by name for each file in @open, and marking the file as close_pending if either the stat fails or the device/inode has changed.

      Checking for reads involves doing a stat by filehandle for each file in @open, and reading new data if the file size is greater than my current read position. Additionally, if the file is marked as close_pending, I try to lock it, and if that succeeds (which in this context means all writers to the file have finished) I close the file and move it from @open to @closed.
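      In code, those three checks might look roughly like this (a hypothetical sketch, not the production script: the per-file hashes and process() are illustrative, and @open/@closed are the file-scoped arrays driven by the main loop above):

      use Fcntl qw(O_RDONLY O_NONBLOCK :flock);

      sub check_new {
          my ($closed) = @_;
          for my $i (reverse 0 .. $#$closed) {
              my $name = $closed->[$i];
              sysopen(my $fh, $name, O_RDONLY | O_NONBLOCK) or next;
              my @st = stat $fh;
              push @open, {
                  name => $name, fh => $fh, pos => 0,
                  dev => $st[0], ino => $st[1], blksize => $st[11],
              };
              splice @$closed, $i, 1;
          }
      }

      sub check_moved {
          my ($open) = @_;
          for my $f (@$open) {
              my ($dev, $ino) = (stat $f->{name})[0, 1];
              $f->{close_pending} = 1
                  if !defined($dev) || $dev != $f->{dev} || $ino != $f->{ino};
          }
      }

      sub check_read {
          my ($open) = @_;
          for my $i (reverse 0 .. $#$open) {
              my $f = $open->[$i];
              my $size = (stat $f->{fh})[7];
              if ($size > $f->{pos}) {
                  sysread($f->{fh}, my $buf, $size - $f->{pos});
                  $f->{pos} = $size;
                  process($f->{name}, $buf);    # application-specific output
              }
              if ($f->{close_pending} && flock($f->{fh}, LOCK_EX | LOCK_NB)) {
                  # lock granted: all writers to the old file have finished
                  close $f->{fh};
                  push @closed, $f->{name};
                  splice @$open, $i, 1;
              }
          }
      }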

      Thanks to everyone for your help.

      Hugo

Re: nonblocking I/O - testing whether file has more data to read
by matija (Priest) on Apr 08, 2004 at 19:38 UTC
    Hi! I'm glad you looked at File::Tail - that's my contribution.

    Some thoughts:

    • Select for a normal file (as opposed to a socket or a pipe) will always report the file as ready for reading or writing.
    • Statistical prediction works fairly nicely - most files have a roughly constant rate of input over a span of minutes: while the rate changes over the day, it is likely to be much the same as it was 10 minutes ago. (Talking mostly about log files here.)
    • File::Tail provides a select-on-steroids: you give it the normal 4 arguments, as well as an array of File::Tail objects. It will handle a mix of sockets/pipes and File::Tail files, reading from multiple files fairly efficiently. (Note that this code has had the least use of all the code in the module, so there may still be some bugs in there).
    • The sleep-maximizing logic in the select of File::Tail is overkill if you have a bunch of files and they all get updated a lot (like tens of lines per second each).
    • If the files are fairly busy, I wouldn't want to keep closing and opening them (is that really what that tail is doing? Wow!). If they aren't very busy, statistical prediction works pretty well, and your monitoring script spends most of its time sleeping.
    • I haven't tested the select with 64 files - if they are very busy files, I would take a long look at select's code and consider whether I could optimize it. But it works quite well for 10 fairly busy files. Talk to me by email, and we might figure something out together.
    • Note that File::Tail contains logic meant to find out if the file has been rotated out from under it. For some files this logic might not be needed - I'm thinking of making it possible to turn it off...

      Hi matija, thanks for the comments.

      In this case, we're talking about a webserver application in which the logs are split out to separate files on a per-script and per-site basis, and I imagine typical usage would be to tail all logs for a site, for example just after an upgrade, to monitor for new problems that may have arisen.

      So for most of those files, I'd expect no output at all - and therefore also no data on which to base predictions about the next time the file may be updated.

      However I can imagine another use would be to turn on verbose debugging for all scripts, then run a filtering monitor over the log files until a particular problem shows itself - in that case I can imagine large quantities of data appearing in all the files.

      I do need to cope with the files moving from under me, but the best strategy to cope with that is likely to be quite dependent on the strategy for reading from the file. In this case it is also likely to be dependent on the tools doing the moving - it is possible that the files will temporarily be replaced with a pipe, in which case it is important that I don't end up reading from that. In fact that's the main reason I'm wary of simply opening a pipe from tail(1), which otherwise has all the options I need in its GNU incarnation.

      I think I shall try next with a simple fstat/sleep loop, and see what that does for performance under light and heavy loads.

      Hugo

        If File::Tail detects no activity in the file, it will check it rarely. You can configure any interval, but I think the default is once per minute.
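        For example (interval and maxinterval are the relevant File::Tail parameters; the values and path here are just illustrative):

        use File::Tail;

        my $ft = File::Tail->new(
            name        => '/var/log/site/error_log',  # hypothetical path
            interval    => 2,    # initial pause between checks of a quiet file
            maxinterval => 60,   # back off no further than this when idle
        );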

        In my opinion, for something like a whole bunch of rarely updated files, File::Tail is the right thing. The code would look somewhat like this:

        foreach (keys %files) {
            push(@tails, File::Tail->new(name => $_, tail => 0, reset_tail => 0));
        }
        while (1) {
            ($nfound, $timeleft, @pending) =
                File::Tail::select(undef, undef, undef, $timeleft, @tails);
            foreach (@pending) {
                my $line = $_->read;
                process($line);
            }
        }
        This is actually extracted from a script I use to monitor a whole bunch of logs on fairly active servers - the script calls one or more handlers which parse each particular type of log file, and the aggregate results are sent to a central monitoring machine. I ripped out the complications that deal with monitoring no files (because I collect other data, too), and the multiple handler dispatch.
Re: nonblocking I/O - testing whether file has more data to read
by Fletch (Bishop) on Apr 08, 2004 at 17:58 UTC

    If there's a version of FAM available for your target OS you might look into SGI::FAM, which sits on top of the fam(3X) routines and has shims in the kernel that notify a userland daemon when specified files are changed (and the daemon then passes that info back to your program).

      Interesting - I noticed these defines in /usr/include/bits/fcntl.h the other day (as one does), and wondered whether any of them might be useful:

      #ifdef __USE_GNU
      # define F_SETLEASE   1024    /* Set a lease.  */
      # define F_GETLEASE   1025    /* Enquire what lease is active.  */
      # define F_NOTIFY     1026    /* Request notifications on a directory.  */
      #endif

      Prompted by your suggestion, I found that these are actually documented in more recent versions of the fcntl(2) manpage than I have, such as in this HTML version.

      Unfortunately it notifies only when a process opens or truncates the file, and you have to be the owner of the file (or root), which means it isn't very useful for my current needs.

      I see also that there is a fam rpm for my O/S - but while I wasn't able to Google any docs on how to use it, I did find a number of threads about how insecure it was (due to its dependency on portmapper), which makes me reluctant to try it.

      Hugo

Re: nonblocking I/O - testing whether file has more data to read
by zentara (Cardinal) on Apr 09, 2004 at 16:54 UTC
    Oh, my head hurts thinking about all those low-level details, and I'm not sure I'm correctly interpreting your problem. But I thought you might like to read perldoc -q filehandle, which has some code for dealing with blocking filehandles. I used the following code to test for anything in the pipe before trying to read it. That way, you read if there is something there, else move on to the next.
    #see which filehandles have output, from perldoc -q filehandle
    $esize = pack("L", 0);
    ioctl(\*ERROR, FIONREAD(), $esize) or die "Couldn't call ioctl: $!\n";
    $esize = unpack("L", $esize);
    print "esize-> $esize\n" unless ($esize < 1);

    $rsize = pack("L", 0);
    ioctl(\*READ, FIONREAD(), $rsize) or die "Couldn't call ioctl: $!\n";
    $rsize = unpack("L", $rsize);
    print "rsize-> $rsize\n" unless ($rsize < 1);

    #get the output from bc
    if ($esize > 0) { sysread(ERROR, $error, $esize); $errortot = $errortot . $error }
    if ($rsize > 0) { sysread(READ, $answer, $rsize); $answertot = $answertot . $answer }
    } until (($esize > 0) or (($rsize > 0) and ($rsize < 4060)));

    I'm not really a human, but I play one on earth. flash japh

      Interesting stuff, if painfully low level. However, characteristics of individual ioctl calls are device-dependent, and that FAQ mentions lower down:

      FIONREAD requires a filehandle connected to a stream, meaning that sockets, pipes, and tty devices work, but not files.

      Since I am reading plain files, that isn't going to work for me.

      Hugo