tail -1 emulation efficiency

woodstea has asked for the wisdom of the Perl Monks concerning the following question:

Earlier today I was searching the site for a way to emulate the UN*X tail command in Perl, specifically, tailing just the last line of a huge log file. I spent some time reading Simulating UNIX's "tail" in core Perl, and after some thought, I realized that for my specific application (and given who else might be maintaining the program now and in the future), simply running the system tail command is probably the best way to go. Easy, fast - it's fine really.

I can't help wondering, though - just for educational reasons - what the most efficient way to get that last line is. I've got to think that while'ing through every line can't be right, especially for very large files. I'd think using seek might be better, so I put together a little program to seek backwards from the next-to-last character of the file to the first newline encountered, and print everything following. Here's what I've got:

use strict;

open(FILE, $ARGV[0]) or die;
my($i,$char);
for ($i=-2;;$i--) {
    seek(FILE,$i,2) or exit;
    read(FILE,$char,1);
    $char eq "\n" and last;
}
while (<FILE>) {
    print;
}
close FILE;

This is my first encounter with seek (and my first post here), so I'm looking for feedback. Does this make sense? More efficient ways to do this? Pitfalls?

Thanks, Rob


Replies are listed 'Best First'.
Re: tail -1 emulation efficiency by BrowserUk (Patriarch) on May 06, 2004 at 03:52 UTC
Rather than seeking to the end and then reading backwards a byte at a time, it might be quicker to read a reasonable (greater than line length) sized chunk from the end of the file and then let the regex engine locate the last line.

    #! perl -slw
    use strict;

    our $MAX ||= 512;

    open TAIL, '< :raw', $ARGV[ 0 ] or die "$ARGV[ 0 ] : $!";
    sysseek( TAIL, -$MAX, 2 );
    sysread( TAIL, my $buffer, $MAX );
    my( $line ) = $buffer =~ m[\n(.*?$)];
    print $line;

Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail
Re: tail -1 emulation efficiency by coec (Chaplain) on May 06, 2004 at 02:33 UTC
Have you looked at File::Tail? http://search.cpan.org/~mgrabnar/File-Tail-0.98/Tail.pm

Update:

    use File::Tail;
    my $ref=tie *FH,"File::Tail",(name=>$name, tail=>1);
    print scalar <FH>;   # read a single line from the tied handle

I don't have this module installed so this is untested.

CC
Re: Re: tail -1 emulation efficiency by woodstea (Sexton) on May 06, 2004 at 02:50 UTC
Well, File::Tail seemed to be mostly about an emulation of the "tail -f" idea, and I couldn't see in my first perusal how to just do the last line. Perhaps I'll have to look at the module's code itself. It doesn't seem like the POD docs are very clear on that point.
Re: Re: Re: tail -1 emulation efficiency by matija (Priest) on May 06, 2004 at 04:43 UTC
"and I couldn't see in my first perusal how to just do the last line."

Quite simply: You can tell File::Tail to start reading at N lines from the end, then just let the object pass out of the scope:

    sub get_last_line {
        my $name=shift @_;
        my $file=File::Tail->new(name=>$name,tail=>1);
        return $file->read;
    }

The finding of the tail is fairly efficient: File::Tail grabs a chunk from the end of file, and counts the newlines in the chunk. If it has enough to satisfy the request, it returns the data. If it doesn't have enough, it calculates the average length of line in what it already has, then multiplies the average with the number of lines it still needs, and tries again. This repeats until enough lines are in the buffer. The effect is that it should get the required number of lines with very few reads even if the line length distribution is strange.

I see I will have to rewrite the POD for File::Tail though - this is the second case I heard of someone wanting just a few lines from the end, and not finding that described in the docs.
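The chunk-and-estimate strategy described above can be sketched in core Perl. This is only an illustration, not File::Tail's actual code: the helper name `last_lines`, its third argument, and the 512-byte starting chunk are assumptions made for this example. It reads a chunk from the end, discards the possibly partial first line, and grows the chunk by the average line length times the number of lines still missing.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of File::Tail): return the last $want lines
# of $path, reading progressively larger chunks from the end of the file.
sub last_lines {
    my ($path, $want, $guess) = @_;
    my $chunk = $guess || 512;          # assumed initial chunk size in bytes

    open my $fh, '<:raw', $path or die "$path: $!";
    my $size = -s $fh;
    return () unless $size;             # empty file: nothing to return

    while (1) {
        $chunk = $size if $chunk > $size;           # never seek before the start
        seek $fh, $size - $chunk, 0 or die "seek: $!";
        defined( read $fh, my $buf, $chunk ) or die "read: $!";

        my @lines = split /^/, $buf;                # split into lines, keeping newlines
        # Unless the chunk covers the whole file, its first piece may be
        # the tail end of a longer line, so it cannot be trusted.
        shift @lines if $chunk < $size;

        if (@lines >= $want or $chunk == $size) {
            my $from = @lines > $want ? @lines - $want : 0;
            return @lines[ $from .. $#lines ];
        }

        # Too few complete lines: estimate how many more bytes are needed
        # from the average line length seen so far, then try again.
        my $avg     = @lines ? length($buf) / (@lines + 1) : $chunk;
        my $missing = $want - @lines;
        $chunk += int($avg * $missing) + 1;
    }
}

# Usage: perl lastline.pl some.log
print last_lines($ARGV[0], 1) if @ARGV;
```

Because the chunk is clamped to the file size, this also copes with files shorter than the initial chunk and with a last line longer than it, at the cost of an extra read or two.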
Re: tail -1 emulation efficiency by gsiems (Deacon) on May 06, 2004 at 02:58 UTC
Or using a while loop ;-)

    use strict;

    my $char = '';
    open(FILE, $ARGV[0]) or die;
    seek(FILE, -1, 2) or exit;
    while ($char ne "\n") {
        seek(FILE, -2, 1) or exit;
        read(FILE, $char, 1);
    }
    print <FILE>;
    close FILE;
Re: tail -1 emulation efficiency by eserte (Deacon) on May 06, 2004 at 09:25 UTC
Try the File::ReadBackwards module.
Re: tail -1 emulation efficiency (use File::ReadBackwards) by grinder (Bishop) on May 06, 2004 at 10:17 UTC
I needed more or less the same thing some time ago. I settled on using File::ReadBackwards which did the job nicely. If you're happy with using a module for the job it's just the ticket.
Re: tail -1 emulation efficiency by zentara (Cardinal) on May 06, 2004 at 16:13 UTC
You might be interested in this benchmark. It doesn't do exactly what you want, but it demonstrates the huge speed differences.

    #!/usr/bin/perl
    use Benchmark;
    use File::ReadBackwards;
    use strict;

    my $numlines = 10;
    my $filename = 'talz.dat';   #some big file

    timethese(1000, {
    ####################################################
        filereadbackwards => sub {
            my @lines;
            my $line;
            my $count=0;
            my $bw = File::ReadBackwards->new($filename) or
                die "can't read filename $!" ;
            while(defined($line = $bw->readline)){
                push @lines,$line ;
                last if ++$count >= $numlines;
            }
            @lines= reverse @lines;
        },
    #####################################################
        tailz1 => sub {
            my $chunk = 400 * $numlines; #assume a <= 400 char line(generous)
            # Open the file in read mode
            open FILE, "<$filename" or die "Couldn't open $filename: $!";
            my $filesize = -s FILE;
            if($chunk >= $filesize){$chunk = $filesize}
            seek FILE,-$chunk,2; #get last chunk of bytes
            my @tail = <FILE>;
            if($numlines >= $#tail +1){$numlines = $#tail +1}
            splice @tail, 0, @tail - $numlines;
        },
    });

I'm not really a human, but I play one on earth. flash japh
Re: tail -1 emulation efficiency by blue_cowdawg (Monsignor) on May 06, 2004 at 02:48 UTC
How about this:

    use Tie::File;
    use strict;

    my @ry=();
    tie @ry,"Tie::File","myfile" or die "$!";
    my $line=$ry[$#ry];
    untie @ry;
    print $line;
Re: Re: tail -1 emulation efficiency by woodstea (Sexton) on May 06, 2004 at 03:11 UTC
I'm afraid neither of the tie examples, this one or the one above, compares very well in terms of performance with the seek solution:

    Using tail module w/ tie: 0.200u 0.050s 0:00.32
    First example w/ tie:     0.270u 0.030s 0:00.38
    My example using seek:    0.030u 0.010s 0:00.07

I'm happy to have the examples though, just learned about tie tonight -- thanks.