in reply to Sorting a (very) large file

You probably do have enough memory to sort the file, but you need to be careful about how you handle it. Doing a Schwartzian transform on it like that is creating huge temporary copies, which is probably what's blowing through your memory. You might try something slower that uses less memory:

@lines = <INPUTFILE>;
@lines = sort { (split(/\t/, $a))[2] <=> (split(/\t/, $b))[2] } @lines;

I believe the most recent Perls have a special in-place sort() optimization when you're sorting an array and assigning it back to itself. (UPDATE: Yup, it was added in v5.8.4)

-sam

Re^2: Sorting a (very) large file (better*2)
by tye (Sage) on Nov 30, 2007 at 20:05 UTC

    Or you could try something twice as fast that uses much less memory. In this case I'd likely sort parallel arrays (though it isn't too complicated to do something even faster that uses even less memory, such as the "fast, flexible, stable sort" technique).

    my @size = map { ( split /\t/, $_ )[2] } @in;
    my @idx  = sort { $size[$a] <=> $size[$b] } 0..$#in;
    @in = @in[@idx];
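
    A rough sketch of the packed-key idea, as I understand it: build one string per line holding the sort key plus the line's index, sort those strings with the default string sort (no comparison block), then recover the indices. It assumes the keys are non-negative integers that fit in 32 bits, and the sample data in @in is made up for illustration.

    ```perl
    use strict;
    use warnings;

    # Hypothetical sample input: tab-separated lines, numeric size in field 3.
    my @in = (
        "a\tx\t300\tfoo\n",
        "b\ty\t5\tbar\n",
        "c\tz\t41\tbaz\n",
        "d\tw\t5\tqux\n",
    );

    # Pack key and index as big-endian 32-bit integers, so plain string
    # comparison orders them numerically: key first, then original index
    # (which keeps lines with equal keys in their input order).
    my @packed = map { pack "NN", ( split /\t/, $in[$_] )[2], $_ } 0 .. $#in;

    # Default sort: no Perl-level comparison block is called.
    @packed = sort @packed;

    # The index is the last four bytes of each packed string.
    my @idx = map { unpack "N", substr( $_, 4 ) } @packed;
    @in = @in[@idx];

    print @in;
    ```

    Each packed string is only eight bytes here, which is where the memory saving over an array of anonymous arrays would come from.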

    And I'd check how much paging space is configured. It sounds like that could be increased.

    - tye

      I really doubt that's going to work in 1GB of RAM on a 400MB input (and don't worry about paging space - if you start paging then any speedup you gained just went bye-bye). I guess it's possible, but this line looks pretty bad to me:

         @in = @in[@idx];

      Or do you happen to know that Perl does that in-place on @in? I guess the only way to really know would be to try it, but unfortunately I do have some real work to do...
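
      If the slice does build a full temporary copy, one alternative is to apply the permutation in place by following its cycles, which needs only one saved element at a time. This is a different technique from the slice, not something Perl is known to do on its own here. permute_in_place is a hypothetical helper, the sample data is invented, and the helper destroys the index array as it goes.

      ```perl
      use strict;
      use warnings;

      # Rearranges @$in so that new $in->[$i] is old $in->[ $idx->[$i] ].
      # Overwrites @$idx with -1 markers while working.
      sub permute_in_place {
          my ( $in, $idx ) = @_;
          for my $start ( 0 .. $#$idx ) {
              next if $idx->[$start] < 0;    # already placed by an earlier cycle
              my $saved = $in->[$start];     # the one element held aside
              my $i     = $start;
              while (1) {
                  my $src = $idx->[$i];
                  $idx->[$i] = -1;           # mark this slot as done
                  if ( $src == $start ) {    # cycle closed: drop in saved value
                      $in->[$i] = $saved;
                      last;
                  }
                  $in->[$i] = $in->[$src];
                  $i = $src;
              }
          }
          return;
      }

      # Hypothetical sample data, numeric size in the third field.
      my @in   = ( "a\tx\t300\tfoo\n", "b\ty\t5\tbar\n", "c\tz\t41\tbaz\n" );
      my @size = map { ( split /\t/, $_ )[2] } @in;
      my @idx  = sort { $size[$a] <=> $size[$b] } 0 .. $#in;

      permute_in_place( \@in, \@idx );
      print @in;
      ```

      Whether this actually beats the slice on a 400MB input would still need measuring.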

      -sam

        I believe there is this thing called "virtual memory"... Iterating over a list once doesn't have much "thrash" potential, unlike sorting.

        Update: Also note that the tons of anonymous arrays of the ST add a lot more memory than most people might think. I suspect that would be about as much or even more memory than required by the copies of each line.
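
        For reference, a Schwartzian transform for this job would presumably look something like the sketch below. The original poster's exact code isn't shown here, so the field index and the sample @lines are assumptions. Each input line gets its own two-element anonymous array, and the map/sort/map pipeline holds full lists of them while it runs.

        ```perl
        use strict;
        use warnings;

        # Hypothetical sample input: tab-separated, numeric size in field 3.
        my @lines = ( "a\tx\t300\tfoo\n", "b\ty\t5\tbar\n", "c\tz\t41\tbaz\n" );

        my @sorted =
            map  { $_->[1] }                          # 3. unwrap the original line
            sort { $a->[0] <=> $b->[0] }              # 2. compare the cached keys
            map  { [ ( split /\t/, $_ )[2], $_ ] }    # 1. one [key, line] pair per line
            @lines;

        print @sorted;
        ```

        Step 1 is where the extra memory goes: one array, one key scalar, and one copy of the line for every line of input.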

        - tye