I just came up with this solution on the fly; I've never had to do this in Perl before. It could probably be optimized a bit, and it might not handle some cosmically weird special case, like a block boundary falling right on a newline, or seeking past the beginning of the file (which shouldn't come up with a huge file, but would be good to handle all the same). It does work on the decent base cases I tried out. I picked an arbitrary block size, but you can override it with the third argument...

sub print_trailing_lines {
    my $file       = shift;           # file to read
    my $num_lines  = shift;           # number of lines to grab
    my $block_size = shift || 1024;   # size of blocks to slurp
    my @lines      = ();              # array of lines; $lines[0] may be partial

    open(my $fh, '<', $file) or die "could not open $file for reading: $!";
    binmode($fh);
    my $pos = -s $fh;                 # everything from $pos onward has been read

    # keep two spare entries: $lines[0] may be only the tail of a line,
    # and the last entry is empty when the file ends with a newline
    while ($pos > 0 && @lines < $num_lines + 2) {
        # never seek past the beginning of the file
        my $len = $pos < $block_size ? $pos : $block_size;
        $pos -= $len;
        seek($fh, $pos, 0) or die "could not seek in $file: $!";
        defined(read($fh, my $block, $len)) or die "could not read $file: $!";

        # split block into lines; the -1 keeps trailing empty fields, so a
        # boundary falling right on a newline is not lost
        my @chunks = split /\n/, $block, -1;

        # cat last piece of this block with first piece of the following block
        $chunks[-1] .= shift(@lines) if @lines;

        # subsume this block's lines
        unshift(@lines, @chunks);
    }
    close($fh);

    # drop the empty entry left by a final newline
    pop(@lines) if @lines && $lines[-1] eq '';

    # deal with the fact that we might have more lines than we want
    splice(@lines, 0, @lines - $num_lines) if @lines > $num_lines;

    print join("\n", @lines), "\n" if @lines;
}

Probably not a perfect solution, but it should give you a basis for solving the problem. I think both this and the above-mentioned "round robin" solution have their advantages. With this solution, you need to know roughly how long the lines are if you want a sane block size, but you can override the default as necessary. The real advantage is that you avoid the horribly wasteful expense of reading through the whole file. The "round robin" solution, I think, will only cut your wait time in half. My solution's execution time, however, should depend on how many lines you ask for and how long they are, not on the size of the file.
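If you don't know the line lengths up front, one option is to sample the start of the file and guess. The following is only a rough sketch: guess_block_size is a hypothetical helper name, the 4096-byte sample and the 64-byte floor are arbitrary assumptions, and it will guess badly if the first few KB aren't representative of the end of the file.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the original post: guess a block size
# from the average line length in the first $sample_size bytes of the file.
sub guess_block_size {
    my ($file, $num_lines, $sample_size) = @_;
    $sample_size ||= 4096;    # assumed sample size, tune to taste

    open(my $fh, '<', $file) or die "could not open $file for reading: $!";
    binmode($fh);
    my $got = read($fh, my $sample, $sample_size);
    defined($got) or die "could not read $file: $!";
    close($fh);

    return 1024 unless $got;    # empty file: fall back to the default

    my $newlines = ($sample =~ tr/\n//);              # newlines in the sample
    my $avg = $newlines ? $got / $newlines : $got;    # average bytes per line

    # aim to cover the requested lines, plus one spare, in a single block
    my $size = int($avg * ($num_lines + 1)) + 1;
    return $size < 64 ? 64 : $size;    # floor, so tiny lines don't mean tiny reads
}
```

You would then call something like print_trailing_lines($file, 10, guess_block_size($file, 10)). A bad guess only costs a few extra reads, since the loop keeps backing up until it has enough lines.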


In reply to Re: breaking up very large text files - windows by skyknight
in thread breaking up very large text files - windows by antichef
