genecutl has asked for the wisdom of the Perl Monks concerning the following question:

I'm working on a little server that tails a file and listens for socket connections, to which it relays info about that file. I've been playing around with doing non-blocking reads from both the filehandle and the socket, using IO::Select. Or rather, "semi-blocking", in that I don't want to block on any particular handle, but I do want to block when there's nothing on any handle, to keep CPU usage down.

So, I push my handles into the IO::Select object, and both IO::Select->can_read() and IO::Select->select($handles, undef, undef, 0) are supposed to block until any one of the handles has some data, e.g.:

can_read ( [ TIMEOUT ] )
    Return an array of handles that are ready for reading. "TIMEOUT" is the maximum amount of time to wait before returning an empty list, in seconds, possibly fractional. If "TIMEOUT" is not given and any handles are registered then the call will block.

But when I try this, it doesn't block on the filehandle, and keeps looping with no new data in the file. Here is my sample minimal code where I am only working with the file handle and no TCP sockets:

#!/usr/bin/perl
use IO::Select;

open $fh, '/tmp/test_file';
$sel = new IO::Select( $fh );

while (@ready = $sel->can_read) {
    print "looping\n";
    foreach $h (@ready) {
        if ($h == $fh) {
            print "reading from fh\n";
            while ($buf = <$h>) {
                print $loop++," $buf";
            }
        }
        else {
            print "handle is unknown\n";
        }
    }
}

This tails my file just fine, but it doesn't block when the file is not changing. Any suggestions?

(I do know about File::Tail, in case anyone was going to suggest that, but I couldn't see how to implement it non-blocking and integrate that with the socket listener.)

Thanks.

Replies are listed 'Best First'.
Re: blocking, non-blocking, and semi-blocking
by etcshadow (Priest) on Aug 29, 2004 at 05:37 UTC
    You're misunderstanding what can_read (or the analogous call to select) means. They don't exactly mean "which handles have waiting data"... they mean "for which handles will a read return immediately". For a file, a read will always return immediately, so a file will always be returned from can_read. Even if you have read to the end of the file, reading from it again will just return you no data, but it will do so without blocking.
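A minimal sketch of that behavior, assuming a POSIX-like system where select works on ordinary file handles. The scratch file and its contents are made up for illustration; the script reads the file to its end, then asks IO::Select whether the handle is readable and tries one more read.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Select;
use File::Temp qw(tempfile);

# Create a small scratch file (hypothetical content) to read from.
my ($out, $path) = tempfile(UNLINK => 1);
print {$out} "one line\n";
close $out or die "close: $!";

open my $fh, '<', $path or die "open $path: $!";

# Consume the whole file so the handle sits at EOF.
my $total = 0;
while (my $got = sysread($fh, my $chunk, 4096)) {
    $total += $got;
}

my $sel = IO::Select->new($fh);

# Timeout 0 means "poll, do not wait".
my @ready = $sel->can_read(0);

# Reading at EOF returns 0 bytes right away instead of blocking.
my $n = sysread($fh, my $buf, 1024);

printf "handles ready at EOF: %d, bytes read: %d\n", scalar @ready, $n;
```

On a typical Unix system this should report the handle as ready even though the read then yields nothing, which is why the loop in the question spins.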

    Select is more about direct inter-process communication (like sockets and pipes), not indirect inter-process communication (by writing to and reading from files).

    I know you don't want to use it, but what you really seem to be talking about is the same sort of thing that File::Tail does, which is to poll a file for changes in length every so often. To work that into what you're doing, just do your IO::Select call with a brief timeout (like 1 second or so), and check to see if the file has changed its length in the interim. If so, seek to its previous length, and read from there.
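The polling idea described above might be sketched roughly as follows. This is an illustration under assumptions, not code from any module: `read_new_data` and `run` are hypothetical names, and `$sel` is assumed to already hold at least one socket (an empty IO::Select returns from can_read immediately instead of waiting out the timeout).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Select;

# Return whatever was appended to $path since byte offset $pos,
# together with the new offset. Hypothetical helper.
sub read_new_data {
    my ($path, $pos) = @_;
    my $size = -s $path;
    return ('', $pos) unless defined $size;   # file missing: nothing new
    $pos = 0 if $size < $pos;                 # file was truncated or rotated
    return ('', $pos) if $size == $pos;       # length unchanged

    open my $fh, '<', $path or die "open $path: $!";
    seek $fh, $pos, 0 or die "seek: $!";      # jump to the previous length
    local $/;                                 # slurp the remainder in one go
    my $data = <$fh>;
    $data = '' unless defined $data;
    my $new_pos = tell $fh;
    close $fh;
    return ($data, $new_pos);
}

# Loop sketch: wait up to one second for socket activity, then check
# whether the file has grown. Runs until the process is killed.
sub run {
    my ($path, $sel) = @_;
    my $pos = (-s $path) || 0;                # start at the current end
    while (1) {
        my @ready = $sel->can_read(1);
        # ... handle the sockets in @ready here ...
        (my $data, $pos) = read_new_data($path, $pos);
        print $data if length $data;
    }
}
```

The timeout bounds how stale the file check can get; a shorter value reacts faster at the cost of more wakeups.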

    Update: added to the last paragraph a little so that it doesn't just say what is wrong but also says what you should do.

    ------------ :Wq Not an editor command: Wq
Re: blocking, non-blocking, and semi-blocking
by Zaxo (Archbishop) on Aug 29, 2004 at 06:25 UTC

    A handle to a disk file is always ready to read until eof. Writes to the file by other processes after you have opened it will not be seen on your handle. If the writers are trying to obtain exclusive locks on the file, holding your handle open is plain antisocial.

    You can watch and tail a file with seek, tell and the -s file test. You haven't shown what you're doing with your socket handles, so I'll skip that, too.

    use IO::File;
    use IO::Socket;
    use IO::Select;

    my $file = '/tmp/test_file';
    my $sel = IO::Select->new;
    my $pos = -s $file;
    my $fh = IO::File->new;

    # read the file a first time if you like
    # . . .
    # Add IO::Socket objects to $sel

    if ($pos < -s $file) {
        $fh->open($file, 'r') or die $!;
        $fh->seek($pos, 0);
        $sel->add($fh);
    }

    {
        my @ready = $sel->can_read;
        for (@ready) {
            if ($_ eq $fh) {
                print <$fh>;
                $sel->remove($fh);
                $pos = $fh->tell;
                $fh->close();
            }
            else {
                # do socket things
            }
        }
        if ($pos < -s $file) {
            $fh->open($file, 'r') or die $!;
            $fh->seek($pos, 0);
            $sel->add($fh);
        }
        last if end_condition(); # for some end condition
        redo;
    }
    Completely untested. I've altered your while loop to an unconditional one. Your code will quit listening the first time it catches up with traffic.

    After Compline,
    Zaxo

Re: blocking, non-blocking, and semi-blocking
by matija (Priest) on Aug 29, 2004 at 09:16 UTC
    (I do know about File::Tail, in case anyone was going to suggest that, but I couldn't see how to implement it non-blocking and integrate that with the socket listener.)

    Actually, File::Tail has a select for that very purpose. The way to use it is to pass all the socket filehandles to File::Tail's select exactly the way you'd pass them to the regular select.

    Then you take all the File::Tail objects, and you put them behind all the other parameters in the select. Select will return when any of the filehandles are ready for reading, and File::Tail's select will do exactly the same, except it will also return if any of the File::Tail objects have stuff to read.

    foreach (@ARGV) {
        push(@files,File::Tail->new(name=>"$_",debug=>$debug));
    }
    while (1) {
        ($nfound,$timeleft,@pending)=
            File::Tail::select(undef,undef,undef,60,@files);
        foreach (@pending) {
            print $_->{"input"}." (".localtime(time).") ".$_->read;
        }
    }
    So, to use File::Tail for this, just replace the undefs in the select call with the appropriate bit vectors for your socket filehandles.
Re: blocking, non-blocking, and semi-blocking
by sfink (Deacon) on Aug 29, 2004 at 19:03 UTC
    I generally cheat for things like this: if you're on Unix, just run tail -f filename through a pipe and select on the resulting fileno. Here's a snippet from wikimon, a script I wrote to IM me whenever someone edits a page on my work wiki:
    # Recent changes log
    my $RC_FILE = "$TOPDIR/wikidb/rclog";
    open(TAIL, "tail -f -n 0 $RC_FILE |");
    my $rin = '';
    vec($rin, fileno(TAIL), 1) = 1;

    while (1) {
        # Check for file input
        select(my $rout = $rin, undef, undef, undef);
        if (vec($rout, fileno(TAIL), 1)) {
            ...;
        }
    }
    Actually, the original uses a select timeout of zero, because I'm also waiting for IM commands to come in, so I do a nonblocking poll and then a timed wait on the IM stuff (it doesn't expose file descriptors to select on).
      Thanks for all your replies. I was able to get it implemented using sfink's /bin/tail trick:

      (No, not quite. See my follow up post below.)

      #!/usr/bin/perl
      use IO::Select;
      use IO::Socket;
      use POSIX;

      local $| = 1;
      $timeout = 0.1;

      #### setup the file....
      open $tail, "/usr/bin/tail -f -n 0 /tmp/test_file |";
      my $rin = '';
      vec($rin, fileno($tail), 1) = 1;
      fcntl($tail, F_SETFL(), O_NONBLOCK());

      ###### setup the socket....
      $main_sock = new IO::Socket::INET (
          LocalHost => 'localhost',
          LocalPort => 4321,
          Listen    => 5,
          Proto     => 'tcp',
          ReuseAddr => 1,
      );
      $readable_handles = new IO::Select();
      $readable_handles->add($main_sock);

      while (1) {
          ######### do tail stuff here...
          print "selecting\n";
          $nfound = select($rout = $rin, undef, undef, $timeout);
          while ($ret = sysread $tail, $buf, 1024) {
              print $count++," $buf\n";
          }

          ######### do TCP socket stuff....
          print "checking sockets\n";
          ($new_readable) = IO::Select->select($readable_handles, undef, undef, $timeout);
          foreach $sock (@$new_readable) {
              if ($sock == $main_sock) {
                  print "new connection\n";
                  $new_sock = $sock->accept();
                  $readable_handles->add($new_sock);
              }
              else {
                  print "reading socket\n";
                  while ($ret = sysread $sock, $buf, 1024) {
                      print $count++, " $buf\n";
                  }
                  print "closing socket\n";
                  $readable_handles->remove($sock);
                  close $sock;
              }
          }
      }
        Actually, it turns out that didn't work. The file tail pipe would stop working with `ps` showing "tail [defunct]". I tried closing that pipe and then opening a new one when the sysread returned undef, but the new pipe wouldn't give me any output. I found another file tailing trick in the Perl Cookbook which now verifiably works for me. Here is the code:
        use IO::Select;
        use IO::Socket;
        use POSIX;

        ##### setup the file....
        # open it
        open my $tail, $FILE or die $!;
        # set to non-blocking
        fcntl( $tail, F_SETFL(), O_NONBLOCK() );
        # [ fill up the file info buffer ]

        ###### setup the socket....
        $main_sock;
        $readable_handles;
        $main_sock = new IO::Socket::INET(
            LocalHost => 'localhost',
            LocalPort => $PORT,
            Listen    => 8,
            Proto     => 'tcp',
            ReuseAddr => 1,
        );
        $readable_handles = new IO::Select();
        $readable_handles->add($main_sock);

        while (1) {
            ######### do file tail stuff here...
            # unset the EOF (as seen in Perl Cookbook)
            $tail->clearerr();
            # read any new lines
            if ( @lines = (<$tail>) ) {
                # [ add @lines to buffer ]
                # [ discard old lines from buffer ]
                # [ other processing of data... ]
            }

            ######### do TCP socket stuff....
            ($new_readable) = IO::Select->select( $readable_handles, undef, undef, $TIMEOUT );
            foreach my $sock (@$new_readable) {
                if ( $sock == $main_sock ) {
                    $new_sock = $sock->accept();
                    $new_sock->autoflush(1);
                    $readable_handles->add($new_sock);
                }
                else {
                    if ( $ret = sysread $sock, $buf, 4 ) {
                        # [ process the received info, generate $reply ]
                        # reply to the client
                        print $sock $reply;
                    }
                    # close the client
                    $readable_handles->remove($sock);
                    close $sock;
                }
            }
        }
        I can't explain the behavior you're observing, but the code you posted is incorrect. You are separating out the two selects, which means it won't check for socket output until it gets tail output, and once it does, it won't check for tail output until it gets socket output. You need to combine them together, and when select returns you have to figure out whether it's the pipe from tail or the socket that has data available.

        As for the problem of your tail dying, I suggest you sprinkle salt on it once a week and be careful how you sit down. No, wait, that's wrong. I meant to say that I suspect the tail would fail even if you ran it from the command line -- either because that file doesn't exist, or because your version of tail doesn't support the command-line arguments you're passing it. If you want to tail -f a file that might not exist yet, use tail -F (for GNU tail, at least). If I'm wrong and the cause of death is more mysterious, try either changing "tail -f -n 0 /tmp/test_file |" to "tail -f -n 0 /tmp/test_file 2>&1 | tee /tmp/unhappy.log |", or to "strace -o /tmp/unhappier.log tail -f -n 0 /tmp/test_file". Examine the generated log file to diagnose the problem.

        Once you merge the two selects, you can dispense with the timeout too.
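One way the merged select might look, as a rough sketch rather than a drop-in fix. The function name `serve` and the two callbacks `$on_line` and `$on_client` are hypothetical, introduced only for illustration; `$tail` is assumed to be a pipe handle from something like `tail -f`, and `$listener` a listening IO::Socket::INET.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Select;
use IO::Socket::INET;

# Wait on the tail pipe and every socket with a single select call,
# then dispatch on which handle became readable.
sub serve {
    my ($tail, $listener, $on_line, $on_client) = @_;
    my $sel = IO::Select->new($tail, $listener);

    # No timeout: can_read sleeps until some handle needs attention.
    while (my @ready = $sel->can_read) {
        for my $h (@ready) {
            if ($h == $tail) {
                my $n = sysread($tail, my $buf, 4096);
                return unless $n;              # tail exited or read failed
                $on_line->($buf);
            }
            elsif ($h == $listener) {
                my $client = $listener->accept or next;
                $sel->add($client);            # watch the new connection too
            }
            else {
                my $n = sysread($h, my $buf, 4096);
                if ($n) { $on_client->($h, $buf) }
                else    { $sel->remove($h); close $h }   # client hung up
            }
        }
    }
}

# Possible usage (path and port taken from the posts above):
#   open my $tail, '-|', 'tail', '-f', '-n', '0', '/tmp/test_file'
#       or die "tail: $!";
#   my $listener = IO::Socket::INET->new(
#       LocalHost => 'localhost', LocalPort => 4321,
#       Listen => 5, Proto => 'tcp', ReuseAddr => 1,
#   ) or die "listen: $!";
#   serve($tail, $listener, sub { print $_[0] }, sub { });
```

Because every handle sits in the same IO::Select object, neither source can starve the other, and the loop uses no CPU while both are idle.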