MelaOS has asked for the wisdom of the Perl Monks concerning the following question:

I have a script that reverses the order of a file's contents using two arrays: the first stores the lines and reverses them, and the second holds the reversed content. The problem with this approach is that it may break when the file gets too big, and the file could be over 100-200 MB.

I've searched online and found a way to print the file in reverse order, but I need to process it line by line in reverse order. Any idea how I can do this? Thanks!
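A minimal pure-Perl sketch of processing a file last line first without loading it into memory: read fixed-size blocks starting from the end of the file and hand out complete lines in reverse. The sub name `backwards_reader` and the 1 MiB default block size are illustrative assumptions; the sketch works on raw bytes and assumes "\n" line endings. The CPAN module File::ReadBackwards does this job in a tested, packaged form.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Returns an iterator (code ref) that yields the lines of $path from the
# last line to the first. Roughly one block plus one partial line is held
# in memory at a time. backwards_reader and the 1 MiB default are
# illustrative choices, not part of any module.
sub backwards_reader {
    my ($path, $block_size) = @_;
    $block_size ||= 1 << 20;                    # 1 MiB per read
    open(my $fh, '<:raw', $path) or die "Cannot open $path: $!\n";
    my $pos     = -s $fh || 0;                  # bytes not yet read
    my $partial = '';                           # line piece awaiting its start
    my @lines;                                  # complete lines, in file order

    return sub {
        while (!@lines) {
            return undef if $pos == 0;          # whole file consumed
            my $len = $pos < $block_size ? $pos : $block_size;
            $pos -= $len;
            seek($fh, $pos, 0) or die "seek failed: $!\n";
            my $got = read($fh, my $buf, $len);
            die "read failed: $!\n" unless defined $got && $got == $len;
            # Split after every "\n", keeping the newline on each line.
            my @chunk = split /(?<=\n)/, $buf . $partial;
            # The first piece may continue in the block before this one,
            # so hold it back until that block is read (unless at offset 0).
            $partial = $pos > 0 ? shift @chunk : '';
            @lines   = @chunk;
        }
        return pop @lines;
    };
}

# Usage: perl backwards.pl FILE...
for my $file (@ARGV) {
    my $next = backwards_reader($file);
    while (defined(my $line = $next->())) {
        # do your line processing here
        print $line;
    }
}
```

Memory use stays around one block plus the longest line regardless of file size; larger blocks mean fewer seek/read calls.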

Re: How to reverse a huge file in Perl?
by BrowserUk (Patriarch) on Mar 26, 2008 at 07:34 UTC
Re: How to reverse a huge file in Perl?
by andreas1234567 (Vicar) on Mar 26, 2008 at 08:20 UTC
    You can use tac if you are on a Unix-like operating system. It's much faster than using Perl and File::ReadBackwards:
    # ls -lh /var/lib/mysql/ibdata1
    -rw-rw---- 1 mysql mysql 66M Mar 19 11:05 /var/lib/mysql/ibdata1
    # wc -l /var/lib/mysql/ibdata1
    323833 /var/lib/mysql/ibdata1
    # time tac /var/lib/mysql/ibdata1 > /tmp/bar

    real    0m0.643s
    user    0m0.173s
    sys     0m0.322s
    # time /usr/bin/perl > /tmp/foo
    use warnings;
    use strict;
    use File::ReadBackwards;
    my $bw = File::ReadBackwards->new( '/var/lib/mysql/ibdata1' ) or die "$!" ;
    while( defined( my $log_line = $bw->readline ) ) {
        print $log_line ;
    }
    __END__

    real    0m29.076s
    user    0m11.910s
    sys     0m10.819s
    # md5sum /tmp/foo
    2b31d9f47525853842d5dbce584bd95c /tmp/foo
    # md5sum /tmp/bar
    2b31d9f47525853842d5dbce584bd95c /tmp/bar
    Update Wed Mar 26 10:05:03 CET 2008: Added the number of lines in the input file.

    Update Wed Mar 26 10:11:31 CET 2008: It seems that combining tac (about 0.6 s above) with forward processing in Perl is still much faster than processing the file with Perl and File::ReadBackwards:

    # time /usr/bin/perl > /tmp/foo
    use warnings;
    use strict;
    use File::ReadBackwards;
    my $bw = File::ReadBackwards->new( '/var/lib/mysql/ibdata1' ) or die "$!" ;
    while( defined( my $log_line = $bw->readline ) ) {
        $log_line =~ s/1/2/g;
        print $log_line;
    }
    __END__

    real    0m27.431s
    user    0m12.906s
    sys     0m10.701s
    [root@afflinux aff]# time /usr/bin/perl > /tmp/bar
    use warnings;
    use strict;
    my $FH = undef;
    open($FH, '/var/lib/mysql/ibdata1' ) or die "$!" ;
    while( my $log_line = <$FH> ) {
        $log_line =~ s/1/2/g;
        print $log_line;
    }
    __END__

    real    0m5.249s
    user    0m1.535s
    sys     0m0.293s
    --
    Andreas
      It's much faster than using Perl and File::ReadBackwards:

      True if the only aim is to reverse the file.

      If, however, the aim is to process the file's data in reverse order using Perl, you'd have to compare F::RB against running tac and then reading every record with Perl.
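      A minimal timing sketch of that comparison, assuming a Unix-like system with tac on the PATH; File::ReadBackwards must be installed from CPAN, otherwise that run is skipped with a warning. Both approaches read every record in Perl and apply the same per-line work (the s/1/2/g substitution from the benchmark above). The sub names `via_tac` and `via_frb` are made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Per-line work applied identically by both approaches (taken from the
# s/1/2/g used in the benchmark above). Modifies its argument in place.
my $work = sub { $_[0] =~ s/1/2/g };

# Approach 1: tac reverses the file; Perl still reads every record.
# List-form piped open runs tac directly, without a shell.
sub via_tac {
    my ($path, $cb) = @_;
    open(my $fh, '-|', 'tac', $path) or die "Cannot run tac: $!\n";
    my $n = 0;
    while (my $line = <$fh>) {
        $cb->($line);
        $n++;
    }
    close($fh) or die "tac failed (status $?)\n";
    return $n;
}

# Approach 2: File::ReadBackwards (from CPAN) reverses it inside Perl.
sub via_frb {
    my ($path, $cb) = @_;
    require File::ReadBackwards;
    my $bw = File::ReadBackwards->new($path) or die "Cannot open $path: $!\n";
    my $n = 0;
    while (defined(my $line = $bw->readline)) {
        $cb->($line);
        $n++;
    }
    return $n;
}

# Usage: perl compare.pl FILE...  (times both approaches on each file)
for my $file (@ARGV) {
    for my $run (['tac + Perl', \&via_tac], ['File::ReadBackwards', \&via_frb]) {
        my ($name, $code) = @$run;
        my $t0 = time();
        my $n  = eval { $code->($file, $work) };
        if (defined $n) {
            printf "%-20s %9d lines %8.3f s\n", $name, $n, time() - $t0;
        } else {
            warn "$name skipped: $@";
        }
    }
}
```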

      Also, how many lines are there in your test file?


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
      Wow. It might be interesting to take a peek at the tac source code to figure out what it's doing that's so much better than File::ReadBackwards. There must be some trick here; the gap is far larger than the usual C-versus-Perl difference.

      -sam

      tac is also available for Win32, e.g.:
      C:\mycode>perl -le "print $_ for qw[spam bacon eggs]" > things.txt

      C:\mycode>cat things.txt
      spam
      bacon
      eggs

      C:\mycode>tac things.txt
      eggs
      bacon
      spam
Re: How to reverse a huge file in Perl?
by zentara (Cardinal) on Mar 26, 2008 at 12:38 UTC
    See how tac through a piped open works speed-wise for you:
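    #!/usr/bin/perl
    use warnings;
    use strict;

    my $file = shift || $0;
    # List-form piped open runs tac directly, so file names containing
    # spaces or shell metacharacters are passed through safely.
    open(my $fh, '-|', 'tac', $file) or die "Cannot run tac: $!\n";
    my $count = 0;
    while (my $line = <$fh>) {
        # do your line processing here
        print $count++ . ' ' . $line;
    }
    close($fh) or die "tac failed (status $?)\n";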

    I'm not really a human, but I play one on earth. Cogito ergo sum a bum
Re: How to reverse a huge file in Perl?
by ShayShay (Acolyte) on Mar 26, 2008 at 14:30 UTC
        That's completely untested, right? Secondly, even if you fixed the most obvious mistakes, it would not do what you want it to (it reverses each line instead of printing the file in reverse order).

        Thirdly, a load-the-whole-file-into-an-array approach would most likely be very slow for large files.
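        If memory is the main worry, a lighter variant of the array approach keeps only the starting byte offset of each line (a few bytes per line, packed into one string) and then seeks to each line in reverse order. This is a sketch; `each_line_reversed` is a made-up name, and because it does one seek per line it is likely slower than block-based reading, so treat it as a memory-saving option rather than a fast one.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Calls $callback on each line of $path, last line first. Instead of the
# lines themselves, only each line's starting byte offset is kept in
# memory, packed into one string. each_line_reversed is illustrative.
sub each_line_reversed {
    my ($path, $callback) = @_;
    open(my $fh, '<:raw', $path) or die "Cannot open $path: $!\n";

    # Pass 1: read forwards, recording where every line starts.
    my $offsets = '';
    while (1) {
        my $start = tell($fh);
        last unless defined(scalar readline($fh));
        $offsets .= pack('J', $start);          # native unsigned integer
    }

    # Pass 2: walk the offsets backwards, seeking to each line.
    my $width = length pack('J', 0);
    for (my $i = length($offsets) - $width; $i >= 0; $i -= $width) {
        my ($start) = unpack('J', substr($offsets, $i, $width));
        seek($fh, $start, 0) or die "seek failed: $!\n";
        $callback->(scalar readline($fh));
    }
    close($fh);
    return length($offsets) / $width;           # number of lines
}

# Usage: perl reverse_lines.pl FILE...
for my $file (@ARGV) {
    each_line_reversed($file, sub { print $_[0] });
}
```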

        --
        Andreas