in reply to Out of memory problem when copying contents of file into array

Here are the benchmarks on an 8.6MB, 156,896-line file:
                      Rate TedYoung   OP's RandomWalk `tail -100` IO::All->backwards File::ReadBackwards
TedYoung            1.37/s       --   -14%       -39%        -99%               -99%               -100%
OP's                1.60/s      17%     --       -29%        -99%               -99%               -100%
RandomWalk          2.26/s      64%    41%         --        -99%               -99%               -100%
`tail -100`          157/s   11309%  9679%      6841%          --               -37%                -82%
IO::All->backwards   251/s   18145% 15538%     10999%         60%                 --                -71%
File::ReadBackwards  868/s   63105% 54075%     38349%        454%               246%                  --
Update: It's interesting to note that on a much smaller file (47KB in my case), tail occasionally wins over File::ReadBackwards and IO::All, but those three consistently outperform the others.

Update (again): I didn't look at the OP's code closely enough when I made the benchmark. I've fixed it so that it works comparably to the other examples. I also removed the split() from the `tail -100` entry, which (much to my surprise) isn't necessary. Neither of these changes appears to have affected the speed comparisons in any measurable way...

The code I used to do the benchmark:
use strict;
use Benchmark qw(cmpthese);
use lib '/home/mason/devel/lib';
use IO::All;
use File::ReadBackwards;

my $f = "/usr/local/apache/logs/error_log";

cmpthese( -3, {
    'OP\'s' => sub {
        my $lineno = 100;
        open( FILE, "$f" ) or die "Can't find $f\n";
        my @lines = <FILE>;
        my $num   = @lines;
        my @last;
        for ( ; $lineno > 0 ; $lineno-- ) {
            push( @last, $lines[ $num - $lineno ] );
        }
        close(FILE);
    },
    'IO::All->backwards' => sub {
        my $io    = io($f)->backwards;
        my @lines = reverse map { $io->getline } 1 .. 100;
    },
    'File::ReadBackwards' => sub {
        my $bw    = File::ReadBackwards->new($f);
        my @lines = reverse map { $bw->readline } 1 .. 100;
    },
    'TedYoung' => sub {
        my $lines = 100;
        open F, $f or die $!;
        my @lines;
        while (<F>) {
            push @lines, $_;
            shift @lines if @lines > $lines;   # keep only the last 100 lines
        }
        close F;
    },
    '`tail -100`' => sub {
        my @lines = `tail -100 $f`;
    },
    'RandomWalk' => sub {
        my ( $file, $lines ) = ( $f, 100 );
        open F, $file or die $!;
        my @lines;
        $#lines = $lines - 1;   # preallocate a 100-slot ring buffer
        my $i = 0;
        while (<F>) {
            $lines[ $i++ ] = $_;
            $i = 0 if $i == $lines;
        }
        close F;
        # $i now marks the oldest slot; splice the ring back into file order
        return @lines[ $i-- .. $#lines, 0 .. $i ];
    },
} );
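Since File::ReadBackwards wins by such a wide margin, here's a minimal standalone sketch of pulling the last 100 lines with it, outside the benchmark harness. The path is the same error_log as above; the variable names are my own:

use strict;
use warnings;
use File::ReadBackwards;

my $file = "/usr/local/apache/logs/error_log";
my $bw   = File::ReadBackwards->new($file)
    or die "Can't open $file: $!";

# readline() hands back lines newest-first, so unshift to restore file order
my @last;
while ( defined( my $line = $bw->readline ) ) {
    unshift @last, $line;
    last if @last == 100;
}
print @last;

Unlike the slurp-everything approaches, this only ever reads the tail of the file, which is what keeps memory flat and avoids the out-of-memory problem in the first place.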

Re^2: Out of memory problem when copying contents of file into array (Benchmarks)
by Anonymous Monk on Feb 18, 2005 at 17:08 UTC
    'OP\'s' => sub {
        my $lineno = 100;
        open( FILE, "$f" ) or die "Can't find $f\n";
        my @lines = <FILE>;
        my $num   = @lines;
        for ( ; $lineno > 0 ; $lineno-- ) {
            my @tail = @lines[ $num - $lineno ];
        }
        close(FILE);
    },
    This can't be correct. You never create an array with 100 lines. You do create 100 arrays with one line each, using a single element slice ('use warnings' complains).
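    For what it's worth, a single range slice presumably does what was intended, using those same variables:
        my @tail = @lines[ $num - $lineno .. $num - 1 ];
    That grabs the last 100 lines in one assignment, though it still slurps the whole file first.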
    '`tail -100`' => sub { my @lines = split( /\n/, `tail -100 $f` ); },
    Why the explicit splitting? Why not just
    my @lines = `tail -100 $f`;
    Not that it makes a huge difference speedwise.
      I didn't write it... I just copied it from the OP's post. Even a cursory look shows it's obviously broken, but I didn't give it one when I was setting up the benchmark. My mistake.