
I came up with this solution on the fly, and I've never had to do this in Perl before. It could probably be optimized a bit, and it might not handle some cosmically weird special case, such as a block boundary falling right on a newline, or seeking past the beginning of the file (unlikely to matter with a huge file, but good to handle all the same). It does work on the decent base cases I tried out. I picked an arbitrary default block size of 1024 bytes, but you can override it with the optional third argument...

sub print_trailing_lines {
    my $file       = shift();         # file to read
    my $num_lines  = shift();         # number of lines to grab
    my $block_size = shift() || 1024; # size of blocks to slurp

    open(my $fh, '<', $file)
        or die "could not open $file for reading: $!";
    binmode($fh); # offsets below are in bytes, so no newline translation

    my $pos = -s $fh; # offset where the part we've already read begins
    my $buf = '';     # tail of the file read so far

    # read blocks backwards until the buffer holds more newlines than
    # lines wanted (so the first wanted line is complete), or until
    # we reach the beginning of the file
    while ($pos > 0 && ($buf =~ tr/\n//) <= $num_lines) {
        # never back up past the beginning of the file
        my $size = $pos < $block_size ? $pos : $block_size;
        $pos -= $size;

        seek($fh, $pos, 0)
            or die "could not seek in $file: $!";

        my $block = '';
        defined(read($fh, $block, $size)) # suck in a block
            or die "could not read $file: $!";

        # prepend, so a line split across two blocks rejoins itself
        $buf = $block . $buf;
    }

    close($fh);

    # split buffer into lines, tolerating Windows line endings
    my @lines = split /\r?\n/, $buf, -1;

    # a final newline leaves one empty field at the end; drop it
    pop(@lines) if @lines && $lines[-1] eq '';

    # deal with the fact that the buffer might have more lines than we want
    splice(@lines, 0, @lines - $num_lines) if @lines > $num_lines;

    print join("\n", @lines), "\n" if @lines;
}
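
On the worry about seeking past the beginning of the file: as far as I know, asking seek to back up further than the file is long simply fails (it returns false), so the offset needs clamping to the file size first. Here is a minimal sketch of that one step. The helper name `seek_back_from_end` is made up for illustration, and it assumes the handle is a regular file so that `-s` can report its size.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_END);

# Hypothetical helper: position $fh $want bytes before the end of the
# file, or at the very beginning if the file is shorter than that.
# Returns the resulting byte offset.
sub seek_back_from_end {
    my ($fh, $want) = @_;
    my $size = -s $fh;                        # file size in bytes
    my $back = $want > $size ? $size : $want; # clamp to the file size
    seek($fh, -$back, SEEK_END)
        or die "could not seek: $!";
    return tell($fh);
}
```

Calling `seek_back_from_end($fh, 1024)` on a file smaller than 1024 bytes then just leaves you at offset 0, and a following read picks up the whole file.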

Probably not a perfect solution, but it should give you a basis for solving the problem. I think both this and the above-mentioned "round robin" solution have their advantages. With this solution, you need to know the rough line length to pick a sane block size, but you can override the default as necessary. The real advantage is that you avoid the horribly wasteful expense of reading through the whole file. The "round robin" solution, I think, will only cut your wait time in half. My solution's execution time, however, should depend only on how many lines you ask for and how long they are, not on the size of the file.
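
For comparison, here is roughly what I understand the "round robin" approach to be. This is my guess at it, not the code from the other post: I'm assuming it reads the file front to back and keeps only the most recent lines in a rolling buffer. The name `trailing_lines_by_scan` is made up for this sketch.

```perl
use strict;
use warnings;

# Hypothetical helper, not from the thread: return the last $num_lines
# lines of $file by reading it front to back with a rolling buffer.
sub trailing_lines_by_scan {
    my ($file, $num_lines) = @_;
    return () if $num_lines <= 0;

    open(my $fh, '<', $file)
        or die "could not open $file for reading: $!";

    my @ring; # holds at most $num_lines lines at any time
    while (my $line = <$fh>) {
        push(@ring, $line);
        shift(@ring) if @ring > $num_lines; # drop the oldest line
    }

    close($fh);
    return @ring; # lines keep their trailing newlines
}

# usage: print the last 10 lines of the file named on the command line
print trailing_lines_by_scan($ARGV[0], 10) if @ARGV;
```

Memory use stays small either way, but this version still has to read every line of the file, which is the cost the seek-from-the-end version avoids.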

In reply to Re: breaking up very large text files - windows by skyknight
in thread breaking up very large text files - windows by antichef
