in reply to Re^2: In-place sort with order assignment
in thread In-place sort with order assignment

Can you extract 10 million keys from a hash (without blowing memory), write them to a file, start an external process (without forcing the current process to be swapped out) that reads them from that file and writes them to other files several times, and then read them back into memory from the file, all in 108 seconds?
On my computer? No, I wouldn't even be able to read in 10 million keys without blowing memory.

On your computer? I've no idea - I don't have access. You can try it out (after writing the code, you'll have your answer within 2 minutes). Or you can just dismiss it out of hand, without knowing for sure. Your call - it's your problem after all.

Re^4: In-place sort with order assignment
by BrowserUk (Patriarch) on Sep 20, 2010 at 14:32 UTC
    On your computer?

    No!

    #! perl -slw
    use strict;

    our $N //= 10e6;

    my %hash;
    my $val = 'AAAAA';
    undef $hash{ $val++ } for 1 .. $N;

    my $start = time;

    open SORT, '| sort > sorted' or die $!;
    print SORT $_ while $_ = each %hash;
    close SORT;

    undef %hash;

    open IN, '<', 'sorted' or die $!;
    chomp(), $hash{ $_ } = $. while <IN>;
    close IN;

    my $elapsed = time - $start;
    printf "Took %.6f seconds for $N items (%.5f per second)\n",
        $elapsed, $elapsed / $N;

    __END__
    C:\test>junk41 -N=10e6
    Took 225.000000 seconds for 10e6 items (0.00002 per second)

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      Not that this seems to help you very much, but sort was my first instinct too. My rather dated Debian Squeeze box, an AMD 3800+ (64-bit dual core) with 2 GB of RAM and one 5-year-old IDE drive, can almost do it.

      $ perl 860849.pl
      Took 131.000000 seconds for 10000000 items (0.00001 per second)

      I think I could reach that 108 by putting the sorted file on a separate spindle with a thinner filesystem layer (e.g. no ext3 and LVM2).

      I'm really pointing this out because it's quite attainable with a reasonably recent hard drive. Maybe even just SATA/PATA would do the trick. SSD surely would.

        My next intended purchase is a cheap (sub £50), small capacity (32 or 64GB) SSD. (This is the closest I've seen so far.)

        An external sort would definitely become a viable option if I could use that for the temporary files.
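For what it's worth, here is a minimal sketch of how the external sort could be pointed at a fast drive for its temporary files. It assumes a Unix-like system with GNU sort on the PATH (its -T option names the temp-file directory and -o the output file), and keys that contain no newlines. The sub name rank_keys_externally and the directory argument are hypothetical, made up for illustration; they appear nowhere in the code posted earlier in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: replace each value in %$h with the 1-based rank of
# its key in sorted order, using an external sort whose temporary files
# (and output file) live in $tmpdir, e.g. a directory on the SSD.
# Assumes GNU sort and keys without embedded newlines.
sub rank_keys_externally {
    my ($h, $tmpdir) = @_;
    my $sorted = "$tmpdir/sorted";

    # Plain byte-order comparison, independent of the user's locale.
    local $ENV{LC_ALL} = 'C';

    # List-form pipe open: no shell involved, so no quoting worries.
    open my $out, '|-', 'sort', '-T', $tmpdir, '-o', $sorted
        or die "cannot start sort: $!";
    while (defined(my $key = each %$h)) {   # defined(): a key of '0' is fine
        print {$out} $key, "\n";
    }
    close $out or die "sort failed, status $?";

    %$h = ();    # release the old hash before reloading

    open my $in, '<', $sorted or die "cannot read $sorted: $!";
    while (my $key = <$in>) {
        chomp $key;
        $h->{$key} = $.;    # $. is the current line number, i.e. the rank
    }
    close $in;
    unlink $sorted;
    return $h;
}

# Small usage example; tempdir() stands in for the SSD directory.
my %hash = map { $_ => undef } qw(pear apple fig);
rank_keys_externally(\%hash, tempdir(CLEANUP => 1));
print "$_ => $hash{$_}\n" for sort keys %hash;
```

Whether this gets under the 108 second target depends entirely on the drive behind the directory passed in; the sketch only shows where that choice plugs in.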