mellin has asked for the wisdom of the Perl Monks concerning the following question:

I have a question, mostly out of curiosity: are there any drawbacks to monitoring a text file on a server (e.g. an Apache log file) by using tail and a pipe to feed the lines to a Perl script? I want to do it this way so I can control the formatting with the Perl script exactly as I want it.

tail -f -n 0 /etc/httpd/logs/access_log | ./readStdin.pl

The above is the command I run on my Linux web server. The tail command keeps reading the access_log in real time, and since readStdin.pl takes its input with while (<STDIN>), it waits "indefinitely" for new lines to format and display.

I'm just curious whether there is some major problem with this kind of monitoring that I haven't even thought about.

Please, comment freely.

Re: Using pipe and reading from the stdin
by jasonk (Parson) on Oct 26, 2006 at 14:18 UTC

    The big problem I would have with something like that is that it requires you to remember (and type, every time you want to use it) which arguments you need to provide to tail. You can do the same thing in Perl fairly easily using something like File::Tail, File::Tail::App, or just plain seek.

    For example, I would guess your readStdin.pl looks something like this:

    while ( my $line = <STDIN> ) {
        # do some stuff with $line
    }

    Using File::Tail::App, you can very easily build the tail part into your application, like so:

    use File::Tail::App;

    tail_app({
        new          => shift,
        line_handler => sub {
            my ( $line ) = @_;
            # do some stuff with $line
        },
    });
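
    Plain File::Tail works much the same way without the wrapper; a minimal sketch along the lines of its documented interface (check the module's docs for the exact options) might look like:

    use File::Tail;

    # follow the file named on the command line, blocking until new lines arrive
    my $tail = File::Tail->new( name => shift );
    while ( defined( my $line = $tail->read ) ) {
        # do some stuff with $line
    }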

    We're not surrounded, we're in a target-rich environment!

      Ok, I'll look into that File::Tail. When I started, I was thinking of building the tail functionality into the script itself, but decided otherwise for no good reason. As usual, though, I shouldn't reinvent the wheel, so I'll take a look at that existing package. Thanks

Re: Using pipe and reading from the stdin
by merlyn (Sage) on Oct 26, 2006 at 14:15 UTC
Re: Using pipe and reading from the stdin
by talexb (Chancellor) on Oct 26, 2006 at 14:25 UTC

    I use something like that every day -- I have a script called notch.pl that outputs a blank line and the current time when a new line of output arrives in a different minute than the previous line.

    Here's the code:

    #!/usr/bin/perl -w
    # Notch out time when lines from 'tail -f' arrive so that lines
    # from different minutes are separated by a timestamp.
    use strict;

    {
        my $lastMinute = (localtime)[1];

        while (<>) {
            my $thisLine   = $_;
            my $thisMinute = (localtime)[1];

            if ( $thisMinute != $lastMinute ) {
                $lastMinute = $thisMinute;
                print "\n";
                print scalar localtime;
                print "\n";
            }
            print $thisLine;
        }
    }
    I probably could use File::Tail as merlyn suggests, but I just threw this together because trying to read the tail of a log file is hard enough -- this just breaks it up a little.

    Based on the amount of traffic that I see, I don't worry too much about it using an extra pipe or process.

    Alex / talexb / Toronto

    "Groklaw is the open-source mentality applied to legal research" ~ Linus Torvalds

    Update: Corrected two typos .. the first of which I made again, and had to go back and fix again.

Re: Using pipe and reading from the stdin (buffering)
by tye (Sage) on Oct 26, 2006 at 15:29 UTC

    I think this arrangement would "suffer from buffer"ing. Pipes aren't "character special" devices, so isatty(1) will be false, and so the C RTL will fully buffer tail's STDOUT. So even though tail quite quickly writes each newly appended line to the pipe, the C RTL won't bother to flush that output until several KB of data have been written. readStdin.pl will therefore only get lines in spurts, usually including a partial line at the end of each spurt, because the buffer boundary usually won't line up with a line boundary (though Perl won't bother to return a partial line until the rest of the line arrives, because <STDIN> waits until it reads a $/ or EOF).

    So that all adds up to what could be significant delays in processing added lines. If the log goes for hours without any new lines being appended, then some previously appended line(s) will likely not get processed until hours after they were appended.
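
    You can see the effect for yourself with two throw-away writers on the slow end of a pipe (a quick sketch, adjust to taste). The first flushes every line, so cat shows one line per second; the second leaves STDOUT fully buffered, so cat shows nothing until the buffer fills or the writer exits:

    perl -e '$| = 1; for (1..100) { print "line $_\n"; sleep 1 }' | cat
    perl -e '$| = 0; for (1..100) { print "line $_\n"; sleep 1 }' | cat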

    This problem with buffering and pipes has been a thorn in Unix's side for many, many years. It is unfortunate that a near-trivial solution, like defining an environment variable to override the default buffering used by a process, has not been widely implemented. The nature of the problem leads to rather ugly "solutions" like writing your own replacement for tail or using a complex set-up with pseudo-TTYs or such. You are lucky that you have File::Tail to use here.

    Since this seems like a particularly bad problem for something like "tail -f", I did check a local copy of "man tail" and found no mention of special handling of output buffering. Indeed, my testing on a nearby Linux box showed the problem is as I suspected. Interestingly, the same testing using cygwin's tail under Win32 did not have this problem, though I'm not sure why at this point.[1] Perhaps Win32's C RTL never fully buffers and only line buffers? If so, I like that feature, but it is the first time I've noticed it.

    [1] Just another way that Win32 is superior to Unix, I guess :)

    - tye        

      I've run into that too, but I've never had to do too much to work around it...
      $| = 1;
      or more readably
      use English '-no_match_vars';
      $OUTPUT_AUTOFLUSH = 1;
      will solve the problem for a pipe that has only Perl scripts in it. It needs to go on the stdout of each script in the pipe. (My experience so far has always been that the stdin side of a pipe will read the next line as soon as the line feed arrives, but you may want to check that with your operating system... and your Perl version, and your C libraries...)
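
      Or, if you prefer the object interface, IO::Handle should do the same thing for whichever handle you pick (a sketch; I believe it is equivalent to $| = 1 for STDOUT):

      use IO::Handle;
      STDOUT->autoflush(1);    # flush STDOUT after every print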

      Some other common utilities will work in pipelines, but some insist on buffering their output. Which utilities are which? It's probably simpler to find out by experiment than to find the documentation, if it exists.

      If you find that grep or cut on your system are holding their output in buffers, just rewrite them as perl scripts.
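
      For instance, a line-buffered stand-in for a simple grep might look something like this (just a sketch -- it obviously doesn't cover grep's options):

      #!/usr/bin/perl
      use strict;
      use warnings;

      $| = 1;    # flush after every print so the next stage in the pipe sees each line right away
      my $pattern = shift or die "usage: $0 PATTERN\n";

      while ( my $line = <STDIN> ) {
          print $line if $line =~ /$pattern/;
      }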

        Of course, $|= 1; won't do the slightest good here. There is nothing Perl can do to unbuffer the output of the tail command that it is reading from (just to be clear).

        We are lucky that Perl often makes "writing your own replacement" relatively easy. You are lucky that you've "never had to do too much to work around it". Rewriting every command in the pipeline in Perl can be a lot of work or may not be possible.

        Most of the time we are lucky in that the output flows fairly continuously until EOF and so the buffering at most introduces a slight delay (and, in theory, makes the process go faster, though I'm suspicious of such pre-optimization thinking and suspect you'd be hard pressed to notice an efficiency difference between line buffering and full buffering).

        It is still rather silly and sad that such a simple feature, one that would be trivial to give a tiny bit of external control over and that has caused so much grief over the past decades, hasn't been addressed. If I already had my fingers in the GNU source code, I'd certainly submit a patch to change the trivial:

        setvbuf( stdout, NULL, isatty(1) ? _IOLBF : _IOFBF, BUFSIZ );

        To do something more like:

        {
            char* bufPref = getenv( "CRTL_STDOUT_BUF_MODE" );
            int bufMode = NULL == bufPref ? ( isatty(1) ? _IOLBF : _IOFBF )
                        : 'L' == *bufPref ? _IOLBF
                        : 'F' == *bufPref ? _IOFBF
                        :                   _IONBF;

            setvbuf( stdout, NULL, bufMode, BUFSIZ );
        }
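
        With a patch like that in place, the original pipeline could presumably be line-buffered just by setting the (hypothetical) variable, something like:

        CRTL_STDOUT_BUF_MODE=L tail -f -n 0 /etc/httpd/logs/access_log | ./readStdin.pl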

        - tye        

Re: Using pipe and reading from the stdin
by doowah2004 (Monk) on Oct 26, 2006 at 14:32 UTC
    I know that when I was using a combination of Win32::GUI and Win32::OLE to do lots of file crunching and number crunching, opening and closing lots of Excel spreadsheets and so on, it seemed like I had memory leaks and general instabilities that I never fully tracked down.

    I think this is the question you are asking: is there a problem with having a Perl script that runs indefinitely, and will there be any pitfalls to doing it -- memory leaks, zombie processes, etc.? For the task you are doing, my inclination is no, you should be fine, but I don't have nearly the knowledge base or experience of many of the others here. So hopefully someone else can concur or, if there are any, point out the potential problems.

      Yes, that's exactly what I mean. Is there something happening that I am unaware of?

      As far as I understand, keeping tail in "-f" mode doesn't keep the lines in memory, nor does Perl's while (<STDIN>), so it should only be a matter of printing a line through the external program and then checking whether there is another line to print.