in reply to Memory Restrictions

In addition to the other monks' advice, please take into consideration that hashes are not designed to be "conservative" with regard to memory consumption. They are designed to be fast. Perhaps the exception to this is a relatively recent optimization where, IIRC, the text of shared hash keys is stored only once.

A simple algorithm would be something along the lines of this untested example:

use Digest::MD5 qw(md5_hex);

my @uniques = ();
my $md5;
while (my $string = <FILE>) {
    $md5 = md5_hex($string);
    if (grep { $md5 eq $_ } @uniques) {
        warn "$string is not unique\n";
        # or push() into another list...
    } else {
        push @uniques, $md5;
    }
}
# Now @uniques holds one MD5 digest per unique string
This should use less memory than I imagine your solution does, since only a 16-byte digest is kept per line rather than the line itself (though the grep makes it slower, because each lookup scans the whole list). Note that showing us some of your code can help us give better answers.
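For illustration only (and in Python rather than Perl), here is a sketch of the same digest idea, but with a set instead of the linear grep, so each membership test is O(1) while still storing only 16-byte digests rather than full lines:

```python
import hashlib

def unique_lines(lines):
    """Yield each line whose MD5 digest has not been seen before.

    Storing 16-byte digests instead of the full strings keeps the
    memory footprint bounded even when the input lines are long.
    """
    seen = set()  # set of raw 16-byte digests
    for line in lines:
        digest = hashlib.md5(line.encode()).digest()
        if digest in seen:
            print(f"{line!r} is not unique")
        else:
            seen.add(digest)
            yield line
```

The same trade-off applies in Perl: a hash keyed by the digests (rather than by the full strings) gives constant-time lookups for a modest per-entry overhead on top of the digest list above.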

Update: Ok, ok. I added the obligatory MD5 :).

Regards.

Replies are listed 'Best First'.
Re: Re: Memory Restrictions
by derby (Abbot) on Oct 24, 2002 at 12:46 UTC
    please take into consideration that hashes are not designed to be "conservative" with regard to memory consumption

quite true (and ++). perl will "over allocate" memory on the assumption you're always going to need more. If you know the size ahead of time (or can calculate it at run time), you can prevent the over-allocation by preallocating memory (check out perldata):

my @array;
$#array = 512;

# or

my %hash;
keys(%hash) = 512;

    but I don't think this is an issue with the original post.

    -derby