in reply to Re: RFC: Abusing "virtual" memory
in thread RFC: Abusing "virtual" memory

But the real issue is: "Why use a disk-based hash store when you need to process the keys in sorted order?" (Do you need to process them in sorted order?)

For that, DB_File provides a disk-based hash store with sorted keys: DB_BTREE.
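A minimal sketch of what that looks like, assuming DB_File is installed and built against Berkeley DB. The file name, the sample keys, and the variable names are made up for illustration; the keys come back in the B-tree's default lexical order without any sort in the application.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DB_File;                      # exports $DB_BTREE
use Fcntl;                        # O_RDWR, O_CREAT
use File::Temp qw(tempdir);

# Hypothetical scratch location, removed when the script exits.
my $dir  = tempdir( CLEANUP => 1 );
my $file = "$dir/sorted.db";

# Tie a hash to an on-disk B-tree instead of the default on-disk hash.
tie my %h, 'DB_File', $file, O_RDWR | O_CREAT, 0666, $DB_BTREE
    or die "Cannot open $file: $!";

# Insert in arbitrary order.
$h{$_} = length $_ for qw(pear apple fig banana);

# keys() walks the B-tree, so the keys arrive already sorted (lexically).
my @keys_in_order = keys %h;
print "$_ => $h{$_}\n" for @keys_in_order;

untie %h;
```

The default comparison is a plain string comparison, so numeric keys would need a custom compare routine (DB_File lets you set one via `$DB_BTREE->{'compare'}`) or zero-padded keys to come out in numeric order.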

update: added link

--shmem

_($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                              /\_¯/(q    /
----------------------------  \__(m.====·.(_("always off the crowd"))."·
");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}

Re^3: RFC: Abusing "virtual" memory
by jbert (Priest) on Nov 27, 2007 at 17:47 UTC
    Ah, cool, thanks.

    In which case the question is simpler: "why sort your keys in the app when you can get the db to do it for you?" :-)

      “Why sort your keys externally when you can get the DB to do it for you?”

      Indeed...

      To seek to answer that question, I give you today's workload of 11,344,209 telephone call records. You have exactly 4 hours wall-time to process them. If you imagine that you have enough time to put all those records into a B-tree-indexed random file, I have a bridge to sell you. Instead, to solve this problem and to do so consistently on a daily basis, it will be necessary for you to accomplish the same workload -- with utter reliability and consistency -- much faster.

      It may come as an utter and complete shock to you to fathom that your grandfathers, armed with nothing more than punched-card tabulators and sorters, with nary a digital computer in sight, could do that. Every day. Under wartime conditions.

        #!/usr/bin/perl
        use strict;
        use warnings;
        use BerkeleyDB;

        my $db_file = '/home/snowhare/perl_monks/example.db';
        unlink $db_file;
        my $db = BerkeleyDB::Btree->new(
            -Filename  => $db_file,
            -Cachesize => 700_000_000,
            -Flags     => DB_CREATE,
        );
        srand;
        for (my $count = 0; $count < 12_000_000; $count++) {
            my $random_value = rand(16776216);
            my $status = $db->db_put( "$random_value" => "$count" );
        }
        undef $db;

        [snowhare@blue-bay perl_monks]$ time ./big_btree.pl
        real 3m47.121s
        user 3m7.978s
        sys  0m4.884s

        This was on a desktop-class machine with 1.5 GB of RAM and an AMD Athlon 64 3000+ processor running Fedora Core 6 Linux.