mellin has asked for the wisdom of the Perl Monks concerning the following question:

I have a question, mostly out of curiosity: are there any drawbacks to monitoring a text file on a server (e.g. an Apache log file) by using tail and a pipe to feed the lines to a Perl script? I want to do it this way so I can control the formatting with the Perl script exactly as I want it.

tail -f -n 0 /etc/httpd/logs/access_log | ./readStdin.pl

The above is the command I run on my Linux web server. The tail command keeps reading the access_log in real time, and since readStdin.pl takes its input with while (<STDIN>), it waits "indefinitely" for new lines to format and display.

I'm just curious whether there is some major problem with this kind of monitoring that I haven't even thought about.

Please, comment freely.

Re: Using pipe and reading from the stdin
by jasonk (Parson) on Oct 26, 2006 at 14:18 UTC

    The big problem I would have with something like that is that it requires you to remember (and type, every time you want to use it) which arguments you need to provide to tail. You can do the same thing in Perl fairly easily using something like File::Tail, File::Tail::App, or just plain seek.

    For example, I would guess your readStdin.pl looks something like this:

    while ( my $line = <STDIN> ) {
        # do some stuff with $line
    }

    Using File::Tail::App, you can very easily build the tail part into your application, like so:

    use File::Tail::App;

    tail_app({
        new          => shift,
        line_handler => sub {
            my ( $line ) = @_;
            # do some stuff with $line
        },
    });
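
    Plain File::Tail works much the same way without the wrapper; a minimal sketch along the lines of its documented interface (check the module's docs for the exact options) might look like:

    use File::Tail;

    # follow the file named on the command line, blocking until new lines arrive
    my $tail = File::Tail->new( name => shift );
    while ( defined( my $line = $tail->read ) ) {
        # do some stuff with $line
    }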

    We're not surrounded, we're in a target-rich environment!

      Ok, I'll look into that File::Tail. When I started, I was thinking of building the tail functionality into the script itself, but decided otherwise for no good reason. As usual, though, I shouldn't reinvent the wheel, so I'll take a look at that existing package. Thanks

Re: Using pipe and reading from the stdin
by merlyn (Sage) on Oct 26, 2006 at 14:15 UTC
Re: Using pipe and reading from the stdin
by talexb (Chancellor) on Oct 26, 2006 at 14:25 UTC

    I use something like that every day -- I have a script called notch.pl that outputs a blank line and the current time when a new line of output arrives in a different minute than the previous line.

    Here's the code:

    #!/usr/bin/perl -w
    # Notch out time when lines from 'tail -f' arrive so that lines
    # from different minutes are separated by a timestamp.
    use strict;

    {
        my $lastMinute = (localtime)[1];

        while (<>) {
            my $thisLine   = $_;
            my $thisMinute = (localtime)[1];

            if ( $thisMinute != $lastMinute ) {
                $lastMinute = $thisMinute;
                print "\n";
                print scalar localtime;
                print "\n";
            }
            print $thisLine;
        }
    }
    I probably could use File::Tail as merlyn suggests, but I just threw this together because trying to read the tail of a log file is hard enough -- this just breaks it up a little.

    Based on the amount of traffic that I see, I don't worry too much about it using an extra pipe or process.

    Alex / talexb / Toronto

    "Groklaw is the open-source mentality applied to legal research" ~ Linus Torvalds

    Update: Corrected two typos .. the first of which I made again, and had to go back and fix again.

Re: Using pipe and reading from the stdin (buffering)
by tye (Sage) on Oct 26, 2006 at 15:29 UTC

    I think this arrangement would "suffer from buffer"ing. Pipes aren't "character special" devices, so isatty(1) will be false, and so the C RTL will fully buffer tail's STDOUT. So even though tail quite quickly writes each newly appended line to the pipe, the C RTL won't bother to flush that output until several KB of data have been written. readStdin.pl will therefore only get lines in spurts, usually including a partial line at the end of each spurt, because the buffer boundary usually won't line up with a line boundary (though Perl won't bother to return a partial line until the rest of the line arrives, because <STDIN> waits until it reads a $/ or EOF).

    So that all adds up to what could be significant delays in processing added lines. If the log goes for hours without any new lines being appended, then some previously appended line(s) will likely not get processed until hours after they were appended.
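
    You can see the effect for yourself with two throw-away writers on the slow end of a pipe (a quick sketch, adjust to taste). The first flushes every line, so cat shows one line per second; the second leaves STDOUT fully buffered, so cat shows nothing until the buffer fills or the writer exits:

    perl -e '$| = 1; for (1..100) { print "line $_\n"; sleep 1 }' | cat
    perl -e '$| = 0; for (1..100) { print "line $_\n"; sleep 1 }' | cat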

    This problem with buffering and pipes has been a thorn in Unix's side for many, many years. It is unfortunate that a near-trivial solution, like defining an environment variable to override the default buffering used by a process, has not been widely implemented. The nature of the problem leads to rather ugly "solutions" like writing your own replacement for tail or using a complex set-up with pseudo-TTYs or such. You are lucky that you have File::Tail to use here.

    Since this seems like a particularly bad problem for something like "tail -f", I did check a local copy of "man tail" and found no mention of special handling of output buffering. Indeed, my testing on a nearby Linux box showed the problem is as I suspected. Interestingly, the same testing using cygwin's tail under Win32 did not have this problem, though I'm not sure why at this point.[1] Perhaps Win32's C RTL never fully buffers and only line buffers? If so, I like that feature, but it is the first time I've noticed it.

    [1] Just another way that Win32 is superior to Unix, I guess :)

    - tye        

      I've run into that too, but I've never had to do too much to work around it...
      $| = 1;
      or more readably
      use English '-no_match_vars';
      $OUTPUT_AUTOFLUSH = 1;
      will solve the problem for a pipe that has only Perl scripts in it. It needs to go on the stdout of each script in the pipe. (My experience so far has always been that the stdin side of a pipe will read the next line as soon as the line feed arrives, but you may want to check that with your operating system... and your Perl version, and your C libraries...)
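
      Or, if you prefer the object interface, IO::Handle should do the same thing for whichever handle you pick (a sketch; I believe it is equivalent to $| = 1 for STDOUT):

      use IO::Handle;
      STDOUT->autoflush(1);    # flush STDOUT after every print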

      Some other common utilities will work in pipelines, but some insist on buffering their output. Which utilities are which? It's probably simpler to find out by experiment than to find the documentation, if it exists.

      If you find that grep or cut on your system are holding their output in buffers, just rewrite them as perl scripts.
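
      For instance, a line-buffered stand-in for a simple grep might look something like this (just a sketch -- it obviously doesn't cover grep's options):

      #!/usr/bin/perl
      use strict;
      use warnings;

      $| = 1;    # flush after every print so the next stage in the pipe sees each line right away
      my $pattern = shift or die "usage: $0 PATTERN\n";

      while ( my $line = <STDIN> ) {
          print $line if $line =~ /$pattern/;
      }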

        Of course, $|= 1; won't do the slightest good here. There is nothing Perl can do to unbuffer the output of the tail command that it is reading from (just to be clear).

        We are lucky that Perl often makes "writing your own replacement" relatively easy. You are lucky that you've "never had to do too much to work around it". Rewriting every command in the pipeline in Perl can be a lot of work or may not be possible.

        Most of the time we are lucky in that the output flows fairly continuously until EOF and so the buffering at most introduces a slight delay (and, in theory, makes the process go faster, though I'm suspicious of such pre-optimization thinking and suspect you'd be hard pressed to notice an efficiency difference between line buffering and full buffering).

        It is still rather silly and sad that such a simple feature, one that would be trivial to give a tiny bit of external control over and that has caused so much grief over the past decades, hasn't been addressed. If I already had my fingers in the GNU source code, I'd certainly submit a patch to change the trivial:

        setvbuf( stdout, NULL, isatty(1) ? _IOLBF : _IOFBF, BUFSIZ );

        To do something more like:

        {
            char* bufPref = getenv( "CRTL_STDOUT_BUF_MODE" );
            int bufMode = NULL == bufPref ? ( isatty(1) ? _IOLBF : _IOFBF )
                        : 'L' == *bufPref ? _IOLBF
                        : 'F' == *bufPref ? _IOFBF
                        :                   _IONBF;

            setvbuf( stdout, NULL, bufMode, BUFSIZ );
        }
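
        With a patch like that in place, the original pipeline could presumably be line-buffered just by setting the (hypothetical) variable, something like:

        CRTL_STDOUT_BUF_MODE=L tail -f -n 0 /etc/httpd/logs/access_log | ./readStdin.pl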

        - tye        

Re: Using pipe and reading from the stdin
by doowah2004 (Monk) on Oct 26, 2006 at 14:32 UTC
    I know that when I was using a combination of Win32::GUI and Win32::OLE to do lots of file crunching and number crunching, opening and closing lots of Excel spreadsheets and so on, it seemed like I had memory leaks and general instabilities that I never fully tracked down.

    I think this is the question you are asking: is there a problem with having a Perl script that runs indefinitely, and will there be any pitfalls to doing it -- memory leaks, zombie processes, etc.? For the task you are doing, my inclination is no, you should be fine, but I don't have nearly the knowledge base or experience of many of the others here. So hopefully someone else can concur or, if there are any, point out the potential problems.

      Yes, that's exactly what I mean. Is there something happening that I am unaware of?

      As far as I understand, keeping tail in "-f" mode doesn't keep the lines in memory, nor does Perl's while (<STDIN>), so it should only be a matter of printing a line through the external program and then checking whether there is another line to print.