in reply to Last N lines from file (tail)

File::ReadBackwards has similar, but not quite the same, functionality. Rather than seeking and reading one character at a time, it's more efficient in general to read larger chunks, as File::ReadBackwards does. Perhaps like so. Autovivified filehandles also require Perl 5.6+, so I changed that as well to stay backward compatible:
sub lastn {
    my ($file, $lines, $bufsiz) = @_;
    $bufsiz ||= 1024;
    # Pre-5.6 compatible anonymous filehandle; the glob is named
    # STDOUT only to avoid a "used once" warning
    my $fh = \do { local *STDOUT };
    $lines++;
    if (!open($fh, $file)) {
        print "Can't open $file: $!\n";
        return;
    }
    binmode($fh);
    my $pos = sysseek($fh, 0, 2);    # Seek to end
    my $nlcount = 0;
    while ($nlcount <= $lines) {
        $bufsiz = $pos if $bufsiz > $pos;
        $pos = sysseek($fh, -$bufsiz, 1);
        die "Bad seek: $!" unless defined $pos;
        my $bytes = sysread($fh, $_, $bufsiz, 0);
        die "Bad read: $!" unless defined $bytes;
        $nlcount += tr/\n//;
        $pos = sysseek($fh, -$bufsiz, 1);
        die "Bad seek: $!" unless defined $pos;
        last if $pos == 0;
    }
    # Sync the stdio position with the sysseek position
    seek($fh, sysseek($fh, 0, 1), 0) || warn;
    <$fh> for $lines .. $nlcount;    # Skip the extra lines read
    $fh;
}
Update: It does work when requesting more lines than the file contains, though I fixed it to work with various buffer sizes. I don't think a 20000% speed improvement (for tailing 400 10-byte lines) is 'too much' optimization :-) It does miscount if the last line lacks a line feed, though do you actually count that as a line or not? Besides, yours 'miscounts' in that situation too.
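For reference, the chunk-backwards technique amounts to: seek to the end, read fixed-size blocks moving toward the start, and stop once enough newlines have been counted. A minimal sketch in Python (a stand-in for illustration, not a port of either Perl version; the function name and default buffer size are made up):

```python
import os

def last_n_lines(path, n, bufsiz=1024):
    """Last n lines of a file, reading fixed-size chunks backwards
    from the end until enough newlines have been seen."""
    if n <= 0:
        return []
    with open(path, "rb") as fh:
        fh.seek(0, os.SEEK_END)
        pos = fh.tell()
        chunks, newlines = [], 0
        while pos > 0 and newlines <= n:
            step = min(bufsiz, pos)   # clamp at start of file
            pos -= step
            fh.seek(pos)
            chunk = fh.read(step)
            newlines += chunk.count(b"\n")
            chunks.append(chunk)
        data = b"".join(reversed(chunks))
        # splitlines() copes with a missing trailing newline
        return [line.decode() for line in data.splitlines()[-n:]]
```

Reading until *more than* n newlines are seen (or the start of the file is hit) guarantees the accumulated suffix contains the last n lines in full, so the final slice is always correct regardless of buffer size.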

Re: Re: Last N lines from file (tail)
by clintp (Curate) on Dec 19, 2001 at 01:49 UTC
    Except that this doesn't work. Try reading the last N lines from a file with N-5 lines, or a file with < $bufsiz bytes. :)

    I had a version that used buffers and was a virtual clone of the algorithm in tail.c, except that I got lost and frustrated in the boundary conditions and really didn't care anymore. Laziness and impatience.

    If you want to take a stab at doing this right, be my guest. I just don't want to do the requisite testing, because the test conditions are yucky:

    • File of L lines reading:
      • L lines
      • L+l lines
      • L-l lines
      • 0 lines
    • Where bufsiz:
      • < size of the file
      • > size of the file
      • Some even multiple of the size of the file
  • Some even multiple of the size of the file less some portion of bufsiz
      • == size of the file
    Basically all of the combinations of these. I got all but the last two coded with nice buffering action.
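The matrix above is tedious by hand but mechanical to automate: generate each file size, try each request size and buffer size, and compare against a naive read-everything oracle. A rough Python sketch (the chunked reader here is an illustrative stand-in, not either Perl version):

```python
import os
import tempfile

def tail_chunked(path, n, bufsiz):
    """Stand-in chunked backward reader to exercise the matrix."""
    if n <= 0:
        return []
    with open(path, "rb") as fh:
        fh.seek(0, os.SEEK_END)
        pos, newlines, chunks = fh.tell(), 0, []
        while pos > 0 and newlines <= n:
            step = min(bufsiz, pos)
            pos -= step
            fh.seek(pos)
            chunk = fh.read(step)
            newlines += chunk.count(b"\n")
            chunks.append(chunk)
        return b"".join(reversed(chunks)).splitlines()[-n:]

def oracle(path, n):
    """Naive read-it-all reference implementation."""
    if n <= 0:
        return []
    with open(path, "rb") as fh:
        return fh.read().splitlines()[-n:]

def run_matrix():
    """Files of L lines x requests of L, L+1, L-1, 0 lines
    x bufsiz below / at / above / straddling the file size."""
    failures = []
    for nlines in (0, 1, 5, 50):
        body = b"".join(b"line %d\n" % i for i in range(nlines))
        with tempfile.NamedTemporaryFile(delete=False) as f:
            f.write(body)
            path = f.name
        size = max(len(body), 1)
        for n in (nlines, nlines + 1, max(nlines - 1, 0), 0):
            for bufsiz in (1, max(size // 2, 1), size, size + 7, size * 2):
                if tail_chunked(path, n, bufsiz) != oracle(path, n):
                    failures.append((nlines, n, bufsiz))
        os.unlink(path)
    return failures
```

An empty list back from run_matrix() means every combination agreed with the oracle; any tuple in it pinpoints exactly which (file size, request, bufsiz) boundary case broke.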

    After consideration, I figured I'd let the OS worry about buffering and JFDI. As a matter of fact, if you use getc() instead of sysread() (and seek instead of sysseek, etc.) the stdio package would take care of most of this buffering nonsense anyway.

    sub lastn {
        my ($file, $lines) = @_;
        my $fh;
        $lines++;
        if (!open($fh, $file)) {
            print "Can't open $file: $!";
            return;
        }
        binmode($fh);
        seek($fh, 0, 2);    # Seek to end
        my $nlcount = 0;
        while ($nlcount < $lines) {
            last unless seek($fh, -1, 1);
            $_ = getc($fh);
            die unless defined $_;
            $nlcount++ if $_ eq "\n";
            last if $nlcount == $lines;
            last unless seek($fh, -1, 1);
        }
        $fh;
    }
    There is such a thing as too much optimizing. :)

    Update: with example.

      One character at a time is still slow, as my benchmarks below showed. This solution still benchmarked at about 4 cps, and my buffered solution gave about 1100 cps.
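The gap between one-character seeks and chunked reads is easy to reproduce. A rough Python timing sketch of the two strategies (illustrative stand-ins, not the original benchmark; absolute numbers will differ from the figures above):

```python
import os
import tempfile
import timeit

def tail_charwise(path, n):
    """Seek back one byte at a time, like the getc() version."""
    with open(path, "rb") as fh:
        fh.seek(0, os.SEEK_END)
        pos, newlines = fh.tell(), 0
        while pos > 0:
            pos -= 1
            fh.seek(pos)
            if fh.read(1) == b"\n":
                newlines += 1
                if newlines > n:
                    pos += 1    # step past the extra newline
                    break
        fh.seek(pos)
        return fh.read().splitlines()[-n:]

def tail_chunked(path, n, bufsiz=4096):
    """Read fixed-size chunks backwards from the end."""
    with open(path, "rb") as fh:
        fh.seek(0, os.SEEK_END)
        pos, newlines, chunks = fh.tell(), 0, []
        while pos > 0 and newlines <= n:
            step = min(bufsiz, pos)
            pos -= step
            fh.seek(pos)
            chunk = fh.read(step)
            newlines += chunk.count(b"\n")
            chunks.append(chunk)
        return b"".join(reversed(chunks)).splitlines()[-n:]

with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"0123456789\n" * 20000)   # short lines, like the test above
    path = f.name

slow = timeit.timeit(lambda: tail_charwise(path, 400), number=5)
fast = timeit.timeit(lambda: tail_chunked(path, 400), number=5)
print("charwise: %.3fs  chunked: %.3fs" % (slow, fast))
os.unlink(path)
```

The character-at-a-time version pays for a seek plus a one-byte read per character, so its cost grows with every byte scanned; the chunked version amortizes that over thousands of bytes per system call.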

      Update: BTW, have you looked at File::Tail? It searches from the end of the file also, and if you don't want 'tail -f' behavior (i.e. a blocking read), then you can do:

      my $fh = File::Tail->new(name => $filename, tail => $lines);
      $fh->nowait(1);
      print $line while $line = $fh->read;
      The performance is not horrible on large files, though a bit worse than my function's, probably due to the overhead of all the bells and whistles that aren't being used.