in reply to How to remove duplicates from a large set of keys

I'd create a lookup table:
#pseudo: my %looupTable; if (!$lookupTable{$yourkeyname}) { $lookupTable{$yourkeyname} = 1; }

A million keys should run fine on a decent machine.

Replies are listed 'Best First'.
Re^2: How to remove duplicates from a large set of keys
by nite_man (Deacon) on Feb 10, 2005 at 08:46 UTC

    Unfortunately, lookup in the hash with more than million keys takes a lot of time. I suspect that there is a way to do it more efficient.

    ---
    Michael Stepanov aka nite_man

    It's only my opinion and it doesn't have pretensions of absoluteness!

      Lookup in a hash of 1 million keys is rougly the same as lookup in a hash of 10 keys. :-) (Assuming you are still inside of physical memory.)

      Its creating the hash thats the problem. It takes a long time, especially if you dont know how many records you are storing up front.

      ---
      demerphq

        Lookup in a hash of 1 million keys is rougly the same as lookup in a hash of 10 keys
        Can you explain, please

        ---
        Michael Stepanov aka nite_man

        It's only my opinion and it doesn't have pretensions of absoluteness!

      Is it?
      This script took 6 seconds to complete (run), with 2 million keys, on a Celeron 2.4 Ghz, 512 Mb, running X, Mozilla, Apache, john, etc. etc. and took at max. 27 % of the memory
      #!/usr/bin/perl -w use strict; my %lookupTable; for (my $i=0; $i<=2000000; $i++) { if (!$lookupTable{$i}) { $lookupTable{$i} = 1; } }

        I was curious so I took the same code, added a timer, and ran it on win32. It took 10 seconds. However, I'm running a 1.8 Pentium M, 1G of memory, WinXP, ActivePerl 5.8.6.

        I suspect that nite_man does not have as beefy of a box as you do. It might be acceptable if the look up doesn't occur frequently.

        In retrospect, it wasn't a good test since the system specs weren't the same and I'm not fimilar with the whole problem. But it was a good 30 second distraction. :)

        For the curious:

        #!perl -w use strict; my $startTime = time(); my %lookupTable; for (my $i=0; $i<=2000000; $i++) { if (!$lookupTable{$i}) { $lookupTable{$i} = 1; } } my $totalTime = time() - $startTime; print "Total Time: $totalTime\n";