in reply to Howto Avoid Memory Problem in List::MoreUtils

Just curious... would you get the "Out of memory!" result if you simply did this instead of the "uniq" call (other things being equal up to this point):
    my @uvery_big_array = @very_big_array;
In any case, since one way or another you'll need to sacrifice some runtime to economize on memory usage, a last resort might be to simply save @very_big_array to a disk file, run unix "sort -u" on that, and read the result back into the same array variable. (Obviously not an attractive option if portability is an issue, but if this is just an "in-shop" process, unix "sort" has been ported to every popular OS.)
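Something along these lines is what I have in mind; it is only an untested sketch, and it assumes the array holds plain strings with no embedded newlines and that a unix-style "sort" is on the PATH:
    use File::Temp qw(tempfile);

    # Dump the array to a temp file, one value per line.
    my ($fh, $tmpfile) = tempfile();
    print {$fh} "$_\n" for @very_big_array;
    close $fh or die "close failed: $!";

    # Free the in-memory copy before reading the deduplicated result back.
    @very_big_array = ();
    open my $sorted, "-|", "sort", "-u", $tmpfile
        or die "cannot run sort: $!";
    while ( my $line = <$sorted> ) {
        chomp $line;
        push @very_big_array, $line;
    }
    close $sorted;
    unlink $tmpfile;
The trade-off is extra I/O and the cost of the external sort, but at no point do two full copies of the array have to live in memory at once.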

Re^2: Howto Avoid Memory Problem in List::MoreUtils
by salva (Canon) on May 05, 2006 at 09:06 UTC
    Sort::External could be a good option for that also.
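    Assuming the feed/finish/fetch interface described in the Sort::External docs, a rough, untested sketch of that approach (the memory threshold and the duplicate-skipping loop are my own choices) might look like:
        use Sort::External;

        my $sortex = Sort::External->new( mem_threshold => 2**24 );  # keep roughly 16 MB in RAM
        $sortex->feed($_) for @very_big_array;
        $sortex->finish;

        # Release the unsorted copy, then read the values back in sorted
        # order, keeping only the first occurrence of each one.
        @very_big_array = ();
        my $prev;
        while ( defined( my $item = $sortex->fetch ) ) {
            next if defined $prev && $item eq $prev;
            push @very_big_array, $item;
            $prev = $item;
        }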

    Or, coming back to the hash approach used by List::MoreUtils::uniq, using an on-disk tree as provided by DB_File could be a faster solution, especially if the number of duplicates is high.
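    For illustration only (the file name and flags are arbitrary choices, and the code further down the thread shows the same idea with DB_File's default hash format), tying the dedup hash to an on-disk BTREE would look something like:
        use DB_File;
        use Fcntl;

        # Keep the "seen" hash on disk instead of in RAM.
        tie my %seen, "DB_File", "/tmp/uniq.db", O_RDWR|O_CREAT, 0666, $DB_BTREE
            or die "cannot tie: $!";
        $seen{$_} = 1 for @very_big_array;
        my @unique = keys %seen;    # a BTREE returns its keys in sorted order
        untie %seen;
        unlink "/tmp/uniq.db";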

Re^2: Howto Avoid Memory Problem in List::MoreUtils
by Anonymous Monk on May 26, 2006 at 02:55 UTC
    Hi graff,
    Sorry for coming back to you again. I guess I have to resort to your way of doing it. As for your suggestion below:
    save @very_big_array to a disk file
    How could one implement it efficiently directly from the existing non-uniq array (as @very_big_array)?
      use DB_File;
      tie my %uniq, "DB_File", "/tmp/uniq";  # you should use File::Temp here!
      $uniq{$_} = 1 for @data;
      my @uniq = keys %uniq;
        Thanks a lot, salva. I have read your latest posting with File::Temp.