camelcom has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks - could someone please help? What's the fastest way of reversing the order of a flat file containing more than 1 million lines? (I have a Sybase bcp file which I would like to load in reverse order.)

i.e. I would like to transform a file with lines 1..1000000 to 1000000..1

thanks!

Replies are listed 'Best First'.
Re: Reversing A File
by marto (Cardinal) on Dec 12, 2007 at 12:59 UTC
Re: Reversing A File
by andreas1234567 (Vicar) on Dec 12, 2007 at 14:37 UTC
    You can use tac if you are on a Unix-like system:
    $ echo "one
    > two
    > three" > foo
    $ cat foo
    one
    two
    three
    $ tac foo
    three
    two
    one
    --
    Andreas
      If you don't have tac on your system (which some of our AIX boxes don't), you can also use this short script:
      #!/bin/sh
      file=$1; i=1
      max=`wc -l $file | cut -d' ' -f1,1`
      while [ $i -le $max ]; do
          tail -$i $file | head -1
          i=$((i+1))
      done
      Call it much like tac,
      $ ./reverse your.file > reverse.file
      It's not as fast as tac obviously, but if you don't have 'tac' and want to reverse a file it does work.
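      Another core-only fallback worth noting: wherever perl itself is installed, a one-liner does the reversal in a single pass. A minimal sketch (the sample lines stand in for a real slurp; note that, unlike tac, this holds the entire file in memory, so it only suits files that fit comfortably in RAM):

```perl
use strict;
use warnings;

# Core perl only: slurp every line, then print them last-to-first.
# Equivalent one-liner:  perl -e 'print reverse <>' your.file > reverse.file
my @lines = ("one\n", "two\n", "three\n");   # stands in for: my @lines = <$fh>;
print reverse @lines;                        # prints: three, two, one
```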

      ---
      s;;:<).>|\;\;_>?\\^0<|=!]=,|{\$/.'>|<?.|/"&?=#!>%\$|#/\$%{};;y;,'} -/:-@[-`{-};,'}`-{/" -;;s;;$_;see;
      Warning: Any code posted by tuxz0r is untested, unless otherwise stated, and is used at your own risk.

Re: Reversing A File
by Joost (Canon) on Dec 12, 2007 at 13:49 UTC
      As it's pure perl, I installed File::ReadBackwards locally and it's definitely the way to go - very impressive - only 90 seconds to reverse the 180MB file, and it uses less than 5MB of memory to do it.

      I diff'd the original .txt file against the .rev.rev file and found no differences, so it seems to work OK!

      problem solved - once again - many many thanks!

Re: Reversing A File
by citromatik (Curate) on Dec 12, 2007 at 13:02 UTC

    It's probably not the fastest way, but Tie::File should work:

    use Tie::File;

    my $file = shift @ARGV;

    tie my @data, 'Tie::File', $file or die $!;
    tie my @reversed, 'Tie::File', $file . "rev" or die $!;

    @reversed = reverse(@data);

    untie @data;
    untie @reversed;

    Tie::File doesn't load the file into memory; it indexes it, so this should work even for very big files.

    citromatik

      Your reverse puts all 1 million lines in memory.

      That's on top of the 2 million indexes you place in memory when only 1 million are needed.

      Fix:

      use Tie::File;

      tie(my @data, 'Tie::File', $fname_in)
          or die("Unable to open \"$fname_in\": $!\n");

      open(my $fh_out, '>', $fname_out)
          or die("Unable to create \"$fname_out\": $!\n");

      for (my $i = @data; $i--; ) {
          # Tie::File chomps the record separator by default,
          # so add the newline back when writing.
          print $fh_out $data[$i], "\n";
      }

      untie @data;

      Although I recommend File::ReadBackwards.

      use File::ReadBackwards qw( );

      my $fh_in = File::ReadBackwards->new($fname_in)
          or die("Unable to open \"$fname_in\": $!\n");

      open(my $fh_out, '>', $fname_out)
          or die("Unable to create \"$fname_out\": $!\n");

      while (defined(my $line = $fh_in->readline())) {
          print $fh_out $line;
      }
        Unfortunately, this firm doesn't have File::ReadBackwards and it's a real pain getting approval for non-core modules!

        I tried the Tie::File idea (thanks for that), but killed the job after 10 minutes of maxed-out CPU (it was about half way through) - and the file is only 180MB.

        There must be a faster way??
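        If non-core modules are off the table, the backwards, block-at-a-time technique that File::ReadBackwards uses can be sketched with nothing but core perl (seek and read): start at the end of the file, read fixed-size chunks toward the front, and emit complete lines in reverse. This is an illustrative sketch, not the module's actual code; the function name reverse_file is made up here, and it assumes Unix "\n" line endings:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Read $in_name backwards in 64KB chunks and write its lines,
# last-to-first, to $out_fh. Memory use stays around one chunk
# plus one (partial) line, regardless of file size.
sub reverse_file {
    my ($in_name, $out_fh) = @_;
    open my $in, '<', $in_name or die "Unable to open \"$in_name\": $!\n";
    binmode $in;

    my $block = 64 * 1024;
    my $pos   = -s $in;   # start at end of file
    my $tail  = '';       # partial line carried over between chunks

    while ($pos > 0) {
        my $len = $pos >= $block ? $block : $pos;
        $pos -= $len;
        seek $in, $pos, 0 or die "seek: $!\n";
        defined(read $in, my $buf, $len) or die "read: $!\n";

        # Split into lines, keeping each trailing "\n". The first
        # element may be a partial line continued in the next
        # (earlier) chunk, so hold it back.
        my @lines = split /(?<=\n)/, $buf . $tail;
        $tail = shift @lines;
        print {$out_fh} $_ for reverse @lines;
    }
    print {$out_fh} $tail if length $tail;
    close $in;
}

# Demo: reverse a small temporary file onto STDOUT.
my ($fh, $name) = tempfile(UNLINK => 1);
print {$fh} "$_\n" for 1 .. 5;
close $fh;
reverse_file($name, \*STDOUT);   # prints 5 4 3 2 1, one per line
```

        Being a single sequential backwards pass, it should behave much like File::ReadBackwards in both speed and memory, though the module is better tested and handles more edge cases (custom record separators, files not ending in a newline across chunk boundaries, etc.).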