antichef has asked for the wisdom of the Perl Monks concerning the following question:

I'm working with an application that has recently started generating very large log files (probably due to some yucky unchecked recursion in one or more places in the code - it's not Perl :) ). The app runs in a Windows environment, and the files get so big (>3GB) that the various Windows ports of 'tail' that we use refuse to open them -- e.g.:
tail: bigfile.log: Invalid argument Assertion failed: valid_file_spec (f), file tail.c, line 700

So I stepped in and slogged out some Perl to get the last 1000 lines of the file:

open (FILE, "<bigfile.log");
while ($line = <FILE>) { $count++; }
$numlines = $count;
$count = 0;
close FILE;

open (FILE, "<bigfile.log");
open (SMALLERFILE, ">smallerfile.log");
while ($line = <FILE>) {
    $count++;
    if ($count > $numlines - 1000) {
        print SMALLERFILE $line;
    }
}
close FILE;
close SMALLERFILE;
It works (though it takes forever), but there's got to be a better way... :) I looked through the docs but haven't had much luck. Any suggestions?

Replies are listed 'Best First'.
Re: breaking up very large text files - windows
by dga (Hermit) on Jul 28, 2003 at 20:32 UTC

    One possibility is to seek to the end of the file and work back from there.

    If you don't need exactly 1000 lines, another possibility is to seek to some distance from the end, say 1k bytes, read and discard one line (probably a partial line), then output the rest of the file to your small file. If the log messages are a fixed length, the math to get exactly 1000 lines from the end is easy. But since you just want to tail the file, seeking back some number of kilobytes and then reading forward to the next line break to find the start of a line should work, and quickly to boot.

    open(BIG, "<name_here") or die $!;
    open(SMALL, ">other_name_here") or die $!;  # open SMALL for writing
    seek(BIG, -1024, 2);    # whence 2 = SEEK_END: start 1k from the end
    my $junk = <BIG>;       # pitch to end of current (probably partial) line
    while (<BIG>) {
        print SMALL $_;
    }
    close BIG;
    close SMALL;
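
    Building on that, here's an untested sketch that keeps doubling the seek distance until at least 1000 complete lines turn up ('bigfile.log', the 1000-line count, and the 64k initial guess are all placeholders):

    use strict;
    use warnings;

    my ($file, $want) = ('bigfile.log', 1000);
    my $guess = 64 * 1024;              # initial guess: 64k from the end

    open(my $big, '<', $file) or die "can't open $file: $!";
    my @lines;
    while (1) {
        seek($big, -$guess, 2) or seek($big, 0, 0);  # clamp at start of file
        my $at_start = (tell($big) == 0);
        <$big> unless $at_start;        # discard a probably-partial line
        @lines = <$big>;
        last if @lines >= $want or $at_start;
        $guess *= 2;                    # not enough lines yet; back up further
    }
    splice(@lines, 0, @lines - $want) if @lines > $want;
    print @lines;
    close $big;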
Re: breaking up very large text files - windows
by fglock (Vicar) on Jul 28, 2003 at 20:44 UTC
Re: breaking up very large text files - windows
by CountZero (Bishop) on Jul 28, 2003 at 20:55 UTC

    Another way of doing it is going through your big file and filling an array of (say) 1000 lines (or as many as you want to keep), replacing the oldest lines with newer ones in round-robin fashion. At the end of the big file, the array holds your last 1000 lines.

    If you don't know beforehand how long the lines are, I think this is the best and fastest way of doing it (you don't have to read your big file twice, as you did), with the possible exception of the modules already suggested.

    Update: As a matter of fact, this is how the Perl Power Tools tail works too (though it has a lot more options, which you may not need). The relevant code from the Power Tools follows (slightly adapted):

    while (<$fh>) {
        $i++;
        $buf[ $i % $p ] = $_;           # overwrite the oldest slot
    }
    my @tail = ( @buf[ ($i % $p) + 1 .. $#buf ],   # oldest surviving lines first
                 @buf[ 0 .. $i % $p ] );
    for (@tail) { print if $_; }
    $fh is the filehandle to your big file and $p is the number of lines you need
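
    For example, wired up as a complete script (an untested sketch; 'bigfile.log' and the 1000-line count are placeholders):

    use strict;
    use warnings;

    my $p = 1000;                       # number of lines to keep
    my $i = 0;
    my @buf;

    open(my $fh, '<', 'bigfile.log') or die "can't open bigfile.log: $!";
    while (<$fh>) {
        $i++;
        $buf[ $i % $p ] = $_;           # overwrite the oldest slot
    }
    close $fh;

    my @tail = ( @buf[ ($i % $p) + 1 .. $#buf ], @buf[ 0 .. $i % $p ] );
    print grep { defined } @tail;       # 'defined' copes with files shorter than $p lines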

    CountZero

    "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

Re: breaking up very large text files - windows
by skyknight (Hermit) on Jul 28, 2003 at 21:36 UTC

    I just came up with this solution on the fly; I'd never had to do this in Perl before now. It could probably be optimized a bit, and might not handle some cosmically weird special case, like a boundary falling right on a newline, or seeking past the beginning of the file (that shouldn't be a problem with a huge file, but would be good to handle all the same), but it works on some decent base cases I tried out. I picked an arbitrary block size, but you can override it...

    sub print_trailing_lines {
        my $file       = shift();         # file to read
        my $num_lines  = shift();         # number of lines to grab
        my $block_size = shift() || 1024; # size of blocks to slurp
        my @lines = ();                   # accumulated lines, oldest first

        open(FILE, "<$file") or die "could not open $file for reading: $!";
        seek(FILE, -$block_size, 2);      # go to the end, minus a block

        while ($num_lines) {              # while we've got more lines to grab...
            my $block = undef;
            read(FILE, $block, $block_size, 0);  # suck in a block
            seek(FILE, -2 * $block_size, 1);     # back up in file by two blocks
            my @chunks = split /\n/, $block;     # split block into lines

            # glue the last (partial) line of this block onto the first
            # line of the later block we read previously
            unshift(@lines, pop(@chunks) . shift(@lines)) if @lines;

            # deal with fact that current block
            # might have more lines than we want
            shift(@chunks) while (@chunks > $num_lines);

            # subsume this block's lines
            unshift(@lines, @chunks);

            # make note of how many lines we grabbed
            $num_lines -= scalar(@chunks);
        }
        close(FILE);
        print join("\n", @lines), "\n";
    }
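
    Called like so, for example ('bigfile.log' is a placeholder):

    print_trailing_lines('bigfile.log', 1000);          # default 1k blocks
    print_trailing_lines('bigfile.log', 1000, 65536);   # bigger blocks if lines are long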

    Probably not a perfect solution, but it should give you the basis for solving the problem. I think both this and the above-mentioned "round robin" solution have their advantages. With this solution you need to know roughly how long the lines are if you want a sane block size, but you can override that as necessary. The real advantage is that you avoid the horribly wasteful expenditure of reading through the whole file. The "round robin" solution, I think, will only cut your wait time in half, since it still reads the whole file once instead of twice. My solution's execution time, however, should not vary with file size.

Re: breaking up very large text files - windows
by Cody Pendant (Prior) on Jul 28, 2003 at 23:41 UTC
    Just one other suggestion that nobody's come up with -- is it possible to use some kind of Tie module and treat the lines of the file as an array?
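
    Something like Tie::File would be the obvious candidate (an untested sketch; 'bigfile.log' is a placeholder, and note that the module still has to scan the file for line boundaries, so the caveat below applies):

    use strict;
    use warnings;
    use Fcntl 'O_RDONLY';
    use Tie::File;

    # Treat the log as a read-only array of lines.
    tie my @log, 'Tie::File', 'bigfile.log', mode => O_RDONLY
        or die "can't tie bigfile.log: $!";
    print "$_\n" for @log[ -1000 .. -1 ];   # last 1000 lines (Tie::File strips the newlines)
    untie @log;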

    ($_='kkvvttuubbooppuuiiffssqqffssmmiibbddllffss') =~y~b-v~a-z~s; print

      Good idea, but if the big file is really BIG, one perhaps runs into memory problems (I guess it will depend on how the tie is done), and one still somehow has to run through the file to see where the individual lines start before you can tie your array to them.

      In the same vein, I was thinking of some DBI/DBD solution (there are DBD drivers for flat-file database files), but it also needs to work through the file to find the individual records, unless you have fixed-length records, in which case the matter is trivial to solve in any case.

      CountZero

      "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

Re: breaking up very large text files - windows
by derby (Abbot) on Jul 28, 2003 at 20:31 UTC
    The first step in having a usable Windows box is to install Cygwin.
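
    Then, assuming your Cygwin tail build has large-file support, it's just:

    tail -n 1000 bigfile.log > smallerfile.log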

    -derby