woodstea has asked for the wisdom of the Perl Monks concerning the following question:

Earlier today I was searching the site for a way to emulate the UN*X tail command in Perl; specifically, tailing just the last line of a huge log file. I spent some time reading Simulating UNIX's "tail" in core Perl, and after some thought, I realize that for my specific application (and given who else might be maintaining the program now and in the future), simply running the system tail command is probably the best way to go. Easy, fast; it's fine, really.

I can't help wondering, though - just for educational reasons - what the most efficient way to get that last line is. I've got to think that while'ing through every line can't be right, especially for very large files. I'd think using seek might be better, so I put together a little program to seek backwards from the next-to-last character of the file to the first newline encountered, and print everything following. Here's what I've got:

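use strict;

open(FILE, $ARGV[0]) or die "Can't open $ARGV[0]: $!";
my ($i, $char);

# Start at the next-to-last byte (the last one is usually the final
# newline) and step backwards one byte at a time until a newline is found.
for ($i = -2; ; $i--) {
    # Running past the start of the file means it has only one line:
    # rewind to the beginning so the whole file is printed.
    seek(FILE, $i, 2) or do { seek(FILE, 0, 0); last };
    read(FILE, $char, 1);
    last if $char eq "\n";
}

# The file position is now just past that newline: print the last line.
while (<FILE>) {
    print;
}
close FILE;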

This is my first encounter with seek (and my first post here), so I'm looking for feedback. Does this make sense? More efficient ways to do this? Pitfalls?

Thanks, Rob

Re: tail -1 emulation efficiency
by BrowserUk (Patriarch) on May 06, 2004 at 03:52 UTC

    Rather than seeking to the end and then reading backwards a byte at a time, it might be quicker to read a reasonably sized chunk (longer than the longest line you expect) from the end of the file and then let the regex engine locate the last line.

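    #! perl -slw
    use strict;

    # Chunk size; thanks to -s it can be overridden with -MAX=nnn.
    our $MAX ||= 512;

    open TAIL, '< :raw', $ARGV[ 0 ] or die "$ARGV[ 0 ] : $!";

    # Jump to $MAX bytes before the end and read that chunk in one go.
    # (If the file is shorter than $MAX, the seek fails and the read
    # simply starts at the beginning of the freshly opened file.)
    sysseek( TAIL, -$MAX, 2 );
    sysread( TAIL, my $buffer, $MAX );

    # Capture the text after the newline that precedes the last line;
    # -l makes print append the newline again.
    my( $line ) = $buffer =~ m[\n(.*?$)];
    print $line;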
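    If the last line is longer than $MAX bytes, the chunk holds no newline before it and $line comes back undefined. One way around that, sketched here on the assumption of a plain seekable file, is to double the chunk and retry until a newline turns up or the whole file has been read. The sub name last_line and the 512-byte default are illustrative choices, not part of the original post:

```perl
use strict;
use warnings;

# Return the last line of $path without its line terminator.
# Reads a chunk from the end; if it holds no newline before the last
# line, doubles the chunk and retries until it covers the whole file.
sub last_line {
    my ($path, $chunk) = @_;
    $chunk ||= 512;
    open my $fh, '<:raw', $path or die "$path: $!";
    my $size = -s $fh;
    return undef unless $size;        # empty file: no last line
    while (1) {
        $chunk = $size if $chunk > $size;
        sysseek($fh, $size - $chunk, 0) or die "sysseek: $!";
        my $got = sysread($fh, my $buf, $chunk);
        die "sysread: $!" unless defined $got;
        $buf =~ s/\r?\n\z//;          # drop the final line's terminator
        # Everything after the last remaining newline is the last line.
        return $1 if $buf =~ /\n([^\n]*)\z/;
        # No newline left: if we read from byte 0, the buffer is the line.
        return $buf if $chunk == $size;
        $chunk *= 2;
    }
}
```

    Called as print last_line($ARGV[0]), "\n";, it also copes with files shorter than the chunk and with a last line that lacks a trailing newline.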

    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail
Re: tail -1 emulation efficiency
by coec (Chaplain) on May 06, 2004 at 02:33 UTC
      Well, File::Tail seemed to be mostly about an emulation of the "tail -f" idea, and I couldn't see in my first perusal how to just do the last line. Perhaps I'll have to look at the module's code itself. It doesn't seem like the POD docs are very clear on that point.
        and I couldn't see in my first perusal how to just do the last line.

        Quite simply: you can tell File::Tail to start reading N lines from the end, then just let the object go out of scope:

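        use File::Tail;

        # Return the last line of the named file: tail => 1 makes
        # File::Tail start one line from the end; the object is
        # discarded when the sub returns.
        sub get_last_line {
            my $name = shift @_;
            my $file = File::Tail->new(name => $name, tail => 1);
            return $file->read;
        }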
        Finding the tail is fairly efficient: File::Tail grabs a chunk from the end of the file and counts the newlines in it. If there are enough to satisfy the request, it returns the data. If not, it calculates the average line length in what it already has, multiplies that average by the number of lines it still needs, and tries again. This repeats until enough lines are in the buffer.

        The effect is that it should get the required number of lines with very few reads even if the line length distribution is strange.
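        The strategy described above can be sketched in plain Perl as follows. This is an illustration of the idea, not File::Tail's actual code; it assumes a regular, seekable file, and the sub name tail_lines and the 1024-byte first guess are hypothetical:

```perl
use strict;
use warnings;

# Return the last $n lines of $path. When a chunk holds too few lines,
# grow it by (average line length so far) x (lines still missing).
sub tail_lines {
    my ($path, $n) = @_;
    open my $fh, '<:raw', $path or die "$path: $!";
    my $size  = -s $fh || 0;
    my $chunk = 1024;                       # first guess
    while (1) {
        $chunk = $size if $chunk > $size;
        seek($fh, $size - $chunk, 0) or die "seek: $!";
        my $got = read($fh, my $buf, $chunk);
        die "read: $!" unless defined $got;
        my @lines = split /^/, $buf;        # each piece keeps its newline
        # The first piece may be a partial line unless we read from byte 0.
        shift @lines if $chunk < $size;
        return @lines[-$n .. -1] if @lines >= $n;
        return @lines if $chunk == $size;   # file has fewer than $n lines
        # Estimate the extra bytes needed from the average line length.
        my $avg = @lines ? int($chunk / @lines) + 1 : $chunk;
        $chunk += $avg * ($n - @lines);
    }
}
```

        For example, print tail_lines($ARGV[0], 10); prints the last ten lines. Discarding the first (possibly partial) line of each chunk keeps the count honest, at the cost of occasionally needing one extra read.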

        I see I will have to rewrite the POD for File::Tail though - this is the second case I heard of someone wanting just a few lines from the end, and not finding that described in the docs.

Re: tail -1 emulation efficiency
by gsiems (Deacon) on May 06, 2004 at 02:58 UTC
    Or using a while loop ;-)
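    use strict;

    my $char = '';
    open(FILE, $ARGV[0]) or die "Can't open $ARGV[0]: $!";

    # Start at end of file; each pass steps back two bytes and reads one,
    # so the first byte examined is the next-to-last one.
    seek(FILE, 0, 2) or exit;
    while ($char ne "\n") {
        # Past the start of the file means a single-line file: print it all.
        seek(FILE, -2, 1) or do { seek(FILE, 0, 0); last };
        read(FILE, $char, 1);
    }
    print <FILE>;
    close FILE;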
Re: tail -1 emulation efficiency
by eserte (Deacon) on May 06, 2004 at 09:25 UTC
Re: tail -1 emulation efficiency (use File::ReadBackwards)
by grinder (Bishop) on May 06, 2004 at 10:17 UTC

    I needed more or less the same thing some time ago. I settled on File::ReadBackwards, which did the job nicely. If you're happy using a module for the job, it's just the ticket.

Re: tail -1 emulation efficiency
by zentara (Cardinal) on May 06, 2004 at 16:13 UTC
    You might be interested in this benchmark. It doesn't do exactly what you want (it fetches the last 10 lines rather than just the last one), but it shows how large the speed difference between these two approaches can be.
    #!/usr/bin/perl
    use strict;
    use Benchmark;
    use File::ReadBackwards;

    my $numlines = 10;
    my $filename = 'talz.dat';    # some big file

    timethese(1000, {
        ####################################################
        filereadbackwards => sub {
            my @lines;
            my $line;
            my $count = 0;
            my $bw = File::ReadBackwards->new($filename)
                or die "can't read $filename: $!";
            # Read lines from the end until we have $numlines of them.
            while (defined($line = $bw->readline)) {
                push @lines, $line;
                last if ++$count >= $numlines;
            }
            @lines = reverse @lines;    # restore file order
        },
        ####################################################
        tailz1 => sub {
            my $chunk = 400 * $numlines;    # assume lines of <= 400 chars (generous)
            # Open the file in read mode
            open FILE, "<$filename" or die "Couldn't open $filename: $!";
            my $filesize = -s FILE;
            if ($chunk >= $filesize) { $chunk = $filesize }
            seek FILE, -$chunk, 2;    # get last chunk of bytes
            my @tail = <FILE>;
            close FILE;
            # Keep only the last $numlines lines, using a local count so the
            # shared $numlines is not shrunk between iterations.
            my $n = $numlines > @tail ? @tail : $numlines;
            splice @tail, 0, @tail - $n;
        },
    });
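    File::ReadBackwards is not a core module. If it isn't available, a similar comparison between the byte-at-a-time seek and the chunk read discussed in this thread can be sketched with core modules only. The 50,000-line temporary file is made-up test data, and the sub names by_seek and by_chunk are hypothetical:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use File::Temp qw(tempfile);

# Throwaway test data: 50,000 short lines.
my ($tmp, $file) = tempfile(UNLINK => 1);
print $tmp "line $_\n" for 1 .. 50_000;
close $tmp;

# Byte-at-a-time backwards scan, in the style of the original question.
sub by_seek {
    open my $fh, '<', $file or die "$file: $!";
    for (my $i = -2; ; $i--) {
        seek($fh, $i, 2) or do { seek($fh, 0, 0); last };
        read($fh, my $c, 1);
        last if $c eq "\n";
    }
    local $/;                       # slurp whatever follows
    return scalar <$fh>;
}

# One read of up to 512 bytes from the end, then a regex.
sub by_chunk {
    open my $fh, '<:raw', $file or die "$file: $!";
    my $size = -s $fh;
    my $len  = $size < 512 ? $size : 512;
    sysseek($fh, $size - $len, 0);
    sysread($fh, my $buf, $len);
    # Text after the newline that precedes the last line, terminator kept.
    my ($line) = $buf =~ /\n([^\n]*\n?)\z/;
    return $line;
}

by_seek() eq by_chunk() or die "approaches disagree";
cmpthese(-1, { seek_bytewise => \&by_seek, chunk_read => \&by_chunk });
```

    cmpthese(-1, ...) runs each sub for at least one CPU second and prints a rate table; the numbers depend entirely on your machine and file, so none are quoted here.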

    I'm not really a human, but I play one on earth.
Re: tail -1 emulation efficiency
by blue_cowdawg (Monsignor) on May 06, 2004 at 02:48 UTC

    How about this:

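    use strict;
    use Tie::File;

    # Tie the file to an array: each element is one line (record).
    my @ry = ();
    tie @ry, "Tie::File", "myfile" or die "Cannot tie myfile: $!";
    my $line = $ry[$#ry];    # the last record
    untie @ry;

    # Tie::File strips the record separator by default, so add it back.
    print "$line\n";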
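    Tie::File also takes a mode option (using Fcntl constants), so the file can be tied read-only, and the last record can be fetched as $rec[-1]. A minimal sketch, where the sub name last_record is hypothetical; note that, as far as I know, Tie::File still has to scan the file for record separators to know how many records there are, so this does not make it as fast as a seek from the end:

```perl
use strict;
use warnings;
use Fcntl 'O_RDONLY';
use Tie::File;

# Return the last record of $path, or undef for an empty file.
# Records come back without their separator (Tie::File autochomps).
sub last_record {
    my ($path) = @_;
    tie my @rec, 'Tie::File', $path, mode => O_RDONLY
        or die "Cannot tie $path: $!";
    my $last = @rec ? $rec[-1] : undef;
    untie @rec;
    return $last;
}
```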

      I'm afraid neither of the tie examples (this one or the one above) compares very well in terms of performance with the seek solution:
      Approach                   user CPU   system CPU   elapsed
      Using tail module w/ tie   0.200u     0.050s       0:00.32
      First example w/ tie       0.270u     0.030s       0:00.38
      My example using seek      0.030u     0.010s       0:00.07
      
      I'm happy to have the examples, though; I just learned about tie tonight. Thanks.