in reply to Last N lines from file (tail)

File::ReadBackwards has similar, but not quite the same, functionality. Rather than seeking and reading one character at a time, it's more efficient in general to read larger chunks, as File::ReadBackwards does. Perhaps like so. Autovivified filehandles also require Perl 5.6+, so I changed that as well to stay backward compatible:
sub lastn {
    my ($file, $lines, $bufsiz) = @_;
    $bufsiz ||= 1024;
    # Pre-5.6 compatible anonymous filehandle; the glob is named
    # STDOUT only to avoid a "used once" warning
    my $fh = \do { local *STDOUT };
    $lines++;
    if (!open($fh, $file)) {
        print "Can't open $file: $!\n";
        return;
    }
    binmode($fh);
    my $pos = sysseek($fh, 0, 2);    # Seek to end
    my $nlcount = 0;
    while ($nlcount <= $lines) {
        $bufsiz = $pos if $bufsiz > $pos;
        $pos = sysseek($fh, -$bufsiz, 1);
        die "Bad seek: $!" unless defined $pos;
        my $bytes = sysread($fh, $_, $bufsiz, 0);
        die "Bad read: $!" unless defined $bytes;
        $nlcount += tr/\n//;
        $pos = sysseek($fh, -$bufsiz, 1);
        die "Bad seek: $!" unless defined $pos;
        last if $pos == 0;
    }
    # Sync the stdio position with the sysseek position
    seek($fh, sysseek($fh, 0, 1), 0) || warn;
    <$fh> for $lines .. $nlcount;    # Skip the extra lines read
    $fh;
}
Update: It does work when requesting more lines than the file contains, though I fixed it to work with various buffer sizes. I don't think a 20000% speed improvement (for tailing 400 10-byte lines) is 'too much' optimization :-) It does miscount if the last line lacks a line feed, though do you actually count that as a line or not? Besides, yours 'miscounts' in that situation too.
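For reference, the chunk-backwards technique amounts to: seek to the end, read fixed-size blocks moving toward the start, and stop once enough newlines have been counted. A minimal sketch in Python (a stand-in for illustration, not a port of either Perl version; the function name and default buffer size are made up):

```python
import os

def last_n_lines(path, n, bufsiz=1024):
    """Last n lines of a file, reading fixed-size chunks backwards
    from the end until enough newlines have been seen."""
    if n <= 0:
        return []
    with open(path, "rb") as fh:
        fh.seek(0, os.SEEK_END)
        pos = fh.tell()
        chunks, newlines = [], 0
        while pos > 0 and newlines <= n:
            step = min(bufsiz, pos)   # clamp at start of file
            pos -= step
            fh.seek(pos)
            chunk = fh.read(step)
            newlines += chunk.count(b"\n")
            chunks.append(chunk)
        data = b"".join(reversed(chunks))
        # splitlines() copes with a missing trailing newline
        return [line.decode() for line in data.splitlines()[-n:]]
```

Reading until *more than* n newlines are seen (or the start of the file is hit) guarantees the accumulated suffix contains the last n lines in full, so the final slice is always correct regardless of buffer size.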

Re: Re: Last N lines from file (tail)
by clintp (Curate) on Dec 19, 2001 at 01:49 UTC
    Except that this doesn't work. Try reading the last N lines from a file with N-5 lines, or a file with < $bufsiz bytes. :)

    I had a version that used buffers and was a virtual clone of the algorithm in tail.c, except that I got lost and frustrated in the boundary conditions and really didn't care anymore. Laziness and impatience.

    If you want to take a stab at doing this right, be my guest. I just don't want to do the requisite testing, because the test conditions are yucky:

    • File of L lines reading:
      • L lines
      • L+l lines
      • L-l lines
      • 0 lines
    • Where bufsiz:
      • < size of the file
      • > size of the file
      • Some even multiple of the size of the file
  • Some even multiple of the size of the file less some portion of bufsiz
      • == size of the file
    Basically all of the combinations of these. I got all but the last two coded with nice buffering action.
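The matrix above is tedious by hand but mechanical to automate: generate each file size, try each request size and buffer size, and compare against a naive read-everything oracle. A rough Python sketch (the chunked reader here is an illustrative stand-in, not either Perl version):

```python
import os
import tempfile

def tail_chunked(path, n, bufsiz):
    """Stand-in chunked backward reader to exercise the matrix."""
    if n <= 0:
        return []
    with open(path, "rb") as fh:
        fh.seek(0, os.SEEK_END)
        pos, newlines, chunks = fh.tell(), 0, []
        while pos > 0 and newlines <= n:
            step = min(bufsiz, pos)
            pos -= step
            fh.seek(pos)
            chunk = fh.read(step)
            newlines += chunk.count(b"\n")
            chunks.append(chunk)
        return b"".join(reversed(chunks)).splitlines()[-n:]

def oracle(path, n):
    """Naive read-it-all reference implementation."""
    if n <= 0:
        return []
    with open(path, "rb") as fh:
        return fh.read().splitlines()[-n:]

def run_matrix():
    """Files of L lines x requests of L, L+1, L-1, 0 lines
    x bufsiz below / at / above / straddling the file size."""
    failures = []
    for nlines in (0, 1, 5, 50):
        body = b"".join(b"line %d\n" % i for i in range(nlines))
        with tempfile.NamedTemporaryFile(delete=False) as f:
            f.write(body)
            path = f.name
        size = max(len(body), 1)
        for n in (nlines, nlines + 1, max(nlines - 1, 0), 0):
            for bufsiz in (1, max(size // 2, 1), size, size + 7, size * 2):
                if tail_chunked(path, n, bufsiz) != oracle(path, n):
                    failures.append((nlines, n, bufsiz))
        os.unlink(path)
    return failures
```

An empty list back from run_matrix() means every combination agreed with the oracle; any tuple in it pinpoints exactly which (file size, request, bufsiz) boundary case broke.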

    After consideration, I figured I'd let the OS worry about buffering and JFDI. As a matter of fact, if you use getc() instead of sysread() (and seek instead of sysseek, etc.) the stdio package would take care of most of this buffering nonsense anyway.

    sub lastn {
        my ($file, $lines) = @_;
        my $fh;
        $lines++;
        if (!open($fh, $file)) {
            print "Can't open $file: $!";
            return;
        }
        binmode($fh);
        seek($fh, 0, 2);    # Seek to end
        my $nlcount = 0;
        while ($nlcount < $lines) {
            last unless seek($fh, -1, 1);
            $_ = getc($fh);
            die unless defined $_;
            $nlcount++ if $_ eq "\n";
            last if $nlcount == $lines;
            last unless seek($fh, -1, 1);
        }
        $fh;
    }
    There is such a thing as too much optimizing. :)

    Update: with example.

      One character at a time is still slow, as my benchmarks below showed. This solution still benchmarked at about 4 cps, and my buffered solution gave about 1100 cps.
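The gap between one-character seeks and chunked reads is easy to reproduce. A rough Python timing sketch of the two strategies (illustrative stand-ins, not the original benchmark; absolute numbers will differ from the figures above):

```python
import os
import tempfile
import timeit

def tail_charwise(path, n):
    """Seek back one byte at a time, like the getc() version."""
    with open(path, "rb") as fh:
        fh.seek(0, os.SEEK_END)
        pos, newlines = fh.tell(), 0
        while pos > 0:
            pos -= 1
            fh.seek(pos)
            if fh.read(1) == b"\n":
                newlines += 1
                if newlines > n:
                    pos += 1    # step past the extra newline
                    break
        fh.seek(pos)
        return fh.read().splitlines()[-n:]

def tail_chunked(path, n, bufsiz=4096):
    """Read fixed-size chunks backwards from the end."""
    with open(path, "rb") as fh:
        fh.seek(0, os.SEEK_END)
        pos, newlines, chunks = fh.tell(), 0, []
        while pos > 0 and newlines <= n:
            step = min(bufsiz, pos)
            pos -= step
            fh.seek(pos)
            chunk = fh.read(step)
            newlines += chunk.count(b"\n")
            chunks.append(chunk)
        return b"".join(reversed(chunks)).splitlines()[-n:]

with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"0123456789\n" * 20000)   # short lines, like the test above
    path = f.name

slow = timeit.timeit(lambda: tail_charwise(path, 400), number=5)
fast = timeit.timeit(lambda: tail_chunked(path, 400), number=5)
print("charwise: %.3fs  chunked: %.3fs" % (slow, fast))
os.unlink(path)
```

The character-at-a-time version pays for a seek plus a one-byte read per character, so its cost grows with every byte scanned; the chunked version amortizes that over thousands of bytes per system call.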

      Update: BTW, have you looked at File::Tail? It searches from the end of the file also, and if you don't want 'tail -f' behavior (i.e. a blocking read), then you can do:

      my $fh = File::Tail->new(name => $filename, tail => $lines);
      $fh->nowait(1);
      print $line while $line = $fh->read;
      The performance is not horrible on large files, though a bit worse than my function's, probably due to the overhead of all the bells and whistles that aren't being used.