skazat has asked for the wisdom of the Perl Monks concerning the following question:

hello all you moonlight perl people and friends from across the old pond,

It seems I have a problem with tied hashes when using a package such as DB_File, ODBM_File, and the like. When I try to delete an entry in a tied hash, like so:

delete($TIED_HASH{$key});

One of two things may happen. In scenario #1, the key goes away, so I no longer see it and the entry appears to be gone. But if I look at a DB file that I've just deleted all entries from, the file is still about 3 megs, and a quick 'strings' from the terminal shows all the old information still lying around. It's almost as if the key was unlink()ed but the value wasn't erased. What's perl thinking? That if I run out of disk space, it'll start to shave the db file down? That doesn't seem likely.

In scenario #2, the key isn't deleted at all, even though I ran the code above through the interpreter.

I'm using the AnyDBM_File package, since this is a program I release unto the world, but I know scenario #1 is happening with DB_File.

My thought on this is that I'm doing something wrong. Do I have to undef() the value and then delete() the key? That would make sense, as undef() would get rid of the value and delete() would get rid of the key. I've been wandering through the docs on tie(), delete(), DB_File, etc. and haven't seen any special instructions.
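
For what it's worth, here is a minimal sketch of what delete() is expected to do on its own, assuming DB_File is installed. The temp path and the key names are made up for illustration. It dies when tie() fails instead of setting a flag, because a failed tie() leaves %ARCHIVE as an ordinary in-memory hash, and a delete() on that never reaches the file. That is one possible cause of a key that seems to survive, not a diagnosis.

```perl
use strict;
use warnings;
use Fcntl;                       # O_RDWR, O_CREAT
use DB_File;
use File::Temp qw(tempdir);

# Hypothetical scratch location; a real program would use its own path.
my $dir     = tempdir(CLEANUP => 1);
my $db_file = "$dir/archive.db";

my %ARCHIVE;
tie %ARCHIVE, 'DB_File', $db_file, O_RDWR|O_CREAT, 0644, $DB_HASH
    or die "Cannot tie $db_file: $!";

$ARCHIVE{message_1} = 'hello';
$ARCHIVE{message_2} = 'world';

# A plain delete, with no undef() of the value beforehand.
delete $ARCHIVE{message_1};
untie %ARCHIVE;

# Reopen the file to check what was actually stored on disk.
tie %ARCHIVE, 'DB_File', $db_file, O_RDWR, 0644, $DB_HASH
    or die "Cannot re-tie $db_file: $!";

my $deleted_is_gone = !exists $ARCHIVE{message_1};
my $other_survives  = exists $ARCHIVE{message_2};
my @remaining       = sort keys %ARCHIVE;
untie %ARCHIVE;

print "message_1 gone: ", ($deleted_is_gone ? "yes" : "no"), "\n";
print "remaining keys: @remaining\n";
```

If the deleted key is reported as gone after the reopen, no separate undef() of the value should be needed.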

I'm getting scenario #1 on a FreeBSD box, running suEXEC and loading the DB_File package.

I remember I used to copy the hash to memory, delete the key, and then copy that back to the tied hash, but this isn't something I want to do with something that could be 3 megs!:

tie %ARCHIVE, "AnyDBM_File", "$db_file", O_RDWR|O_CREAT, $file_chmod
    or $warning = 1;
my %copy = %ARCHIVE;
untie %ARCHIVE;
delete($copy{$key});
tie %ARCHIVE, "AnyDBM_File", "$db_file", O_RDWR|O_CREAT, $file_chmod
    or $warning = 1;
%ARCHIVE = %copy;
untie %ARCHIVE;

What is going awry?

-justin simoni
!skazat!

Replies are listed 'Best First'.
Re: tied hashes and deleting keys and their values
by jeroenes (Priest) on Jan 31, 2001 at 15:04 UTC
    Your problem with the data still being in the DB file is explained in the docs of BerkeleyDB (v3):
    Space freed by deleting key/data pairs from a Btree or Hash database is never returned to the filesystem, although it is reused where possible. This means that the Btree and Hash databases are grow-only. If enough keys are deleted from a database that shrinking the underlying file is desirable, you should create a new database and insert the records from the old one into it.
    So chances are that DB doesn't blank the items, but just removes the references.

    If deleting through the hash doesn't work, did you try the object interface? You can do $db = tie %hash, ... and later call $db->db_del or something similar.
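
A rough sketch of the object interface, assuming DB_File: tie() returns the underlying object, and DB_File calls the method del() (the BerkeleyDB module spells it db_del()). The file path and keys are hypothetical. Not every DBM module behind AnyDBM_File offers such methods, so treat this as DB_File-specific.

```perl
use strict;
use warnings;
use Fcntl;                       # O_RDWR, O_CREAT
use DB_File;
use File::Temp qw(tempdir);

# Hypothetical scratch location for the example database.
my $dir     = tempdir(CLEANUP => 1);
my $db_file = "$dir/archive.db";

my %hash;
my $db = tie %hash, 'DB_File', $db_file, O_RDWR|O_CREAT, 0644, $DB_HASH
    or die "Cannot tie $db_file: $!";

$hash{apple}  = 'red';
$hash{banana} = 'yellow';

# del() returns 0 on success and a non-zero status otherwise,
# for instance when the key is not in the database.
my $status_found   = $db->del('apple');
my $status_missing = $db->del('no_such_key');

# Flush pending changes to disk.
$db->sync;

my $apple_gone = !exists $hash{apple};

# Drop the object reference before untie to avoid the
# "inner references still exist" warning.
undef $db;
untie %hash;

print "del('apple') status: $status_found\n";
print "del('no_such_key') status: $status_missing\n";
```

Checking the return status is the main gain over the plain delete: it tells you whether the key was actually found.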

    Hope this helps,

    Jeroen
    "We are not alone"(FZ)

      This means that the Btree and Hash databases are grow-only

      wow, that's... crazy. what a bad system! I'll try the object interface... does that work with AnyDBM_File as well?

      what possessed someone to make a db package like that? I guess it's a tradeoff for being able to hold large amounts of info in the first place (grumble grumble)

      -justin simoni
      !skazat!

        Well, imagine you have a 2-gigabyte file: shifting the whole thing by, say, 10 bytes takes a lot of time. If every delete kept the database busy for about 15 minutes, the user wouldn't be happy.

        If you really want to save space, the copy isn't really a bad option.
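
A sketch of that copy done file-to-file, so the data never has to sit in memory all at once: tie the old file, tie a fresh one, copy pair by pair with each(), then rename the new file over the old. It assumes DB_File, a single-file hash database, and that nothing else has the database open during the rebuild (no locking is attempted). rebuild_db() and the paths are hypothetical names.

```perl
use strict;
use warnings;
use Fcntl;                       # O_RDONLY, O_RDWR, O_CREAT
use DB_File;
use File::Temp qw(tempdir);

# Copy every live record into a fresh database, then swap the files.
sub rebuild_db {
    my ($old_file) = @_;
    my $new_file = "$old_file.new";      # hypothetical naming scheme
    unlink $new_file if -e $new_file;    # start from an empty file

    my (%old, %new);
    tie %old, 'DB_File', $old_file, O_RDONLY, 0644, $DB_HASH
        or die "Cannot open $old_file: $!";
    tie %new, 'DB_File', $new_file, O_RDWR|O_CREAT, 0644, $DB_HASH
        or die "Cannot create $new_file: $!";

    # each() walks one pair at a time instead of loading the whole hash.
    while (my ($key, $value) = each %old) {
        $new{$key} = $value;
    }
    untie %old;
    untie %new;

    rename $new_file, $old_file
        or die "Cannot replace $old_file: $!";
    return;
}

# Demo: fill a database, delete most of it, then rebuild.
my $dir     = tempdir(CLEANUP => 1);
my $db_file = "$dir/archive.db";

my %archive;
tie %archive, 'DB_File', $db_file, O_RDWR|O_CREAT, 0644, $DB_HASH
    or die "Cannot tie $db_file: $!";
$archive{"key$_"} = 'x' x 200 for 1 .. 1000;
delete $archive{"key$_"} for 101 .. 1000;
untie %archive;

my $size_before = -s $db_file;
rebuild_db($db_file);
my $size_after = -s $db_file;

tie %archive, 'DB_File', $db_file, O_RDONLY, 0644, $DB_HASH
    or die "Cannot re-tie $db_file: $!";
my $remaining = scalar(keys %archive);
my $sample    = $archive{key42};
untie %archive;

print "before: $size_before bytes, after: $size_after bytes\n";
print "$remaining keys left\n";
```

Unlike copying the hash into memory and writing it back to the same file, this writes into a new file, which is what the BerkeleyDB docs quoted above suggest when the old file has grown too large.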

        Jeroen
        "We are not alone"(FZ)

        This is a relatively well-known and widely accepted drawback of using DB files. As another poster mentions, the overhead of rebuilding the DB file after every deleted row would make using it horribly slow and expensive. Space is reused where possible. In other words, if you come back later and insert a new row, it will try to reuse some of the space already allocated (but 'deleted'), but it's not going to make an effort to shift everything else around to accommodate it. This is precisely the same problem that leads to filesystem fragmentation.

        If you want efficient data storage, consider using a real database instead.

        Well, the speed vs. space trade off is the most common optimization there is. In this case it is a perfectly fine tradeoff that they made. How often do you think it is that DBs shrink significantly?

        --
        $you = new YOU;
        honk() if $you->love(perl)