in reply to Using pipe and reading from the stdin

I think this arrangement would "suffer from buffer"ing. Pipes aren't "character special" devices, so isatty(1) will be false and the C RTL will fully buffer tail's STDOUT. So even though tail promptly writes each line that gets appended to the file, the C RTL won't bother to flush that line to the pipe until several KB of data have accumulated. readStdin.pl will therefore only get lines in spurts. Most spurts will end in a partial line, because the buffer boundary usually won't line up with a line boundary, but Perl won't return the partial line until the rest of it arrives, because <STDIN> waits until it reads a $/ or EOF.
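The default described above can be sketched in a few lines of C. This is only an illustration of the rule most C runtimes follow (line buffered on a terminal, fully buffered otherwise); the function names default_stdout_mode and mode_name are made up for the example, and the code mirrors the rule rather than querying the runtime's real internal state.

```c
#include <stdio.h>
#include <unistd.h>

/* Mode a typical C runtime picks for stdout when the program never calls
 * setvbuf() itself: line buffered on a terminal, fully buffered on anything
 * else (pipe, file, socket). Hypothetical helper; it restates the usual
 * rule and does not inspect the runtime. */
int default_stdout_mode(int fd)
{
    return isatty(fd) ? _IOLBF : _IOFBF;
}

/* Human-readable name for a setvbuf() mode constant. */
const char *mode_name(int mode)
{
    switch (mode) {
    case _IOLBF: return "line buffered";
    case _IOFBF: return "fully buffered";
    case _IONBF: return "unbuffered";
    default:     return "unknown";
    }
}
```

Printing mode_name(default_stdout_mode(STDOUT_FILENO)) to stderr from a small program, once run directly in a terminal and once with its output piped into another command, should show the two cases.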

So that all adds up to what could be significant delays in processing added lines. If the log goes for hours without any new lines being appended, then some previously appended line(s) will likely not get processed until hours after they were appended.

This problem with buffering and pipes has been a thorn for Unix for many, many years. It is unfortunate that a near-trivial solution like defining an environment variable to override the default buffering used by a process has not been widely implemented. The nature of the problem leads to rather ugly "solutions" like writing your own replacement for tail or using a complex set-up with pseudo TTYs or such. You are lucky that you have File::Tail to use here.

Since this seems a particularly bad problem for something like "tail -f", I did check a local copy of "man tail" and found no mention of special handling of output buffering. Indeed, my testing on a nearby Linux box showed the problem is as I suspected. Interestingly, the same testing using cygwin's tail under Win32 did not have this problem, though I'm not sure why at this point[1]. Perhaps Win32's C RTL never fully buffers, only line buffering? If so, I like that feature but it is the first time I've noticed it.

[1] Just another way that Win32 is superior to Unix, I guess :)

- tye        

Re^2: Using pipe and reading from the stdin (buffering)
by quester (Vicar) on Oct 27, 2006 at 09:24 UTC
    I've run into that too, but I've never had to do too much to work around it...
    $| = 1;
    or more readably
    use English '-no_match_vars'; $OUTPUT_AUTOFLUSH = 1;
    will solve the problem for a pipe that has only Perl scripts in it. It needs to be set for the stdout of each script in the pipe. (My experience so far has always been that the stdin side of a pipe will read the next line when the line feed arrives, but you may want to check that with your operating system...and your Perl version, and your C libraries...)
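For a filter written in C rather than Perl, the rough counterpart of $| = 1 is an explicit fflush() after each line (or a call to setvbuf(stdout, NULL, _IOLBF, 0) before any output). A minimal sketch, with copy_lines_flushed as a made-up name for the example:

```c
#include <stdio.h>
#include <string.h>

/* Copy `in` to `out`, flushing after every complete line so that a
 * downstream reader on a pipe sees each line as soon as it is written.
 * Returns the number of complete lines copied, or -1 on a write error.
 * Hypothetical helper for illustration only. */
long copy_lines_flushed(FILE *in, FILE *out)
{
    char buf[4096];
    long lines = 0;

    while (fgets(buf, sizeof buf, in) != NULL) {
        size_t len = strlen(buf);

        if (fputs(buf, out) == EOF)
            return -1;

        /* fgets() may return only part of a very long line; flush and
         * count only once the terminating newline has been written. */
        if (len > 0 && buf[len - 1] == '\n') {
            if (fflush(out) == EOF)
                return -1;
            lines++;
        }
    }

    /* Push out any unterminated final line. */
    if (fflush(out) == EOF)
        return -1;
    return lines;
}
```

A filter's main() would simply call copy_lines_flushed(stdin, stdout). As with the Perl version, this only helps for programs you can modify yourself.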

    Some other common utilities will work in pipelines, but some insist on buffering their output. Which utilities are which? It's probably simpler to find out by experiment than to find the documentation, if it exists.

    If you find that grep or cut on your system are holding their output in buffers, just rewrite them as perl scripts.

      Of course, $|= 1; won't do the slightest good here. There is nothing Perl can do to unbuffer the output of the tail command that it is reading from (just to be clear).

      We are lucky that Perl often makes "writing your own replacement" relatively easy. You are lucky that you've "never had to do too much to work around it". Rewriting every command in the pipeline in Perl can be a lot of work or may not be possible.

      Most of the time we are lucky in that the output flows fairly continuously until EOF and so the buffering at most introduces a slight delay (and, in theory, makes the process go faster, though I'm suspicious of such pre-optimization thinking and suspect you'd be hard pressed to notice an efficiency difference between line buffering and full buffering).

      It is still rather silly and sad that this hasn't been addressed: it is a simple feature, adding a tiny bit of external control to it would be trivial, and it has caused so much grief over the previous decades. If I already had my fingers in GNU source code, I'd certainly submit a patch to change the trivial:

      setvbuf( stdout, NULL, isatty(1) ? _IOLBF : _IOFBF, BUFSIZ );

      To do something more like:

      {
          char* bufPref= getenv( "CRTL_STDOUT_BUF_MODE" );
          int bufMode= NULL == bufPref ? ( isatty(1) ? _IOLBF : _IOFBF )
              : 'L' == *bufPref ? _IOLBF
              : 'F' == *bufPref ? _IOFBF : _IONBF;
          setvbuf( stdout, NULL, bufMode, BUFSIZ );
      }
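The same idea can be written as a small compilable sketch. CRTL_STDOUT_BUF_MODE is the hypothetical variable name from the proposal above, not something any existing C runtime is known to honour, and choose_buf_mode and apply_stdout_buf_pref are names invented for this example. Splitting the decision from the setvbuf() call keeps the policy easy to check on its own.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Pick a setvbuf() mode from a preference string.
 * NULL  -> the usual default (line buffered on a tty, else fully buffered)
 * "L.." -> line buffered, "F.." -> fully buffered, anything else -> unbuffered.
 * Hypothetical helper illustrating the proposal. */
int choose_buf_mode(const char *pref, int is_tty)
{
    if (pref == NULL)
        return is_tty ? _IOLBF : _IOFBF;

    switch (*pref) {
    case 'L': return _IOLBF;
    case 'F': return _IOFBF;
    default:  return _IONBF;
    }
}

/* Apply the preference from the (hypothetical) CRTL_STDOUT_BUF_MODE
 * environment variable to stdout. Call this before any output is written.
 * Returns setvbuf()'s result: 0 on success, nonzero on failure. */
int apply_stdout_buf_pref(void)
{
    int mode = choose_buf_mode(getenv("CRTL_STDOUT_BUF_MODE"),
                               isatty(STDOUT_FILENO));
    return setvbuf(stdout, NULL, mode, BUFSIZ);
}
```

A program that calls apply_stdout_buf_pref() first thing in main() could then be run with CRTL_STDOUT_BUF_MODE=L in its environment to get line-buffered output into a pipe.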

      - tye