traxlog has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,
It occurred to me that all that effort by DB_File to keep things in order in a BTREE db - constantly changing records, lengths of records and re-ordering - must take its toll on the db file.

Does it need defragmenting or is that already taken care of for us?

Is it safe just to let it do its thing for years to come, or is there some kind of maintenance necessary?

Many thanks,
traxlog

Re: Defragmentation of a BTREE DB_File
by perrin (Chancellor) on Dec 30, 2003 at 17:00 UTC
    Typically you will not lose speed, but you will waste space after long use of a single database with many deletions and changes. The way to fix it is to copy everything to a new database file record by record, and then replace the old file with the new one. I've seen a couple of utilities for this that you might be able to find if you Google for them.
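
The copy-and-replace approach described above might look roughly like the following. This is only a minimal sketch, not one of the utilities mentioned: the helper name `compact_btree` and the temporary file name are hypothetical, it assumes no other process has the database open while it runs, and it assumes the file was created with the default `$DB_BTREE` settings (a custom compare function or the `R_DUP` flag would have to be set the same way before tying).

```perl
use strict;
use warnings;
use DB_File;
use Fcntl qw(O_RDONLY O_RDWR O_CREAT O_TRUNC);

# Hypothetical helper: rebuild $file by copying every record into a fresh
# BTREE file, then renaming the new file over the old one.
# Returns the number of records copied.
sub compact_btree {
    my ($file) = @_;
    my $tmp = "$file.compact.$$";    # assumed temp name, same directory as $file

    my (%old, %new);
    my $src = tie %old, 'DB_File', $file, O_RDONLY, 0644, $DB_BTREE
        or die "Cannot open $file: $!";
    my $dst = tie %new, 'DB_File', $tmp, O_RDWR | O_CREAT | O_TRUNC, 0644, $DB_BTREE
        or die "Cannot create $tmp: $!";

    # Walk the source in key order with the cursor-style seq() method.
    # seq() returns 0 while there are records left and non-zero afterwards.
    my ($key, $value) = ('', '');
    my $count = 0;
    for (my $status = $src->seq($key, $value, R_FIRST);
         $status == 0;
         $status = $src->seq($key, $value, R_NEXT)) {
        $dst->put($key, $value) == 0
            or die "put failed for key '$key': $!";
        $count++;
    }

    # Flush the new file and release both handles before swapping files.
    $dst->sync;
    undef $src;
    undef $dst;
    untie %old;
    untie %new;

    rename $tmp, $file or die "Cannot replace $file with $tmp: $!";
    return $count;
}
```

Calling `compact_btree('/path/to/data.db')` (path hypothetical) would leave a freshly written file under the original name. Keeping a backup of the old file before the rename is probably wise until the result has been checked.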
Re: Defragmentation of a BTREE DB_File
by TomDLux (Vicar) on Dec 30, 2003 at 18:41 UTC

    In a database, you don't really delete a record; you simply mark it as deleted. Depending on the system you are using, that disk space might then be re-used for a new record, or it might sit there, wasting space. In that sense, you might need to re-pack a database on occasion.
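
One way to see whether deleted records give their space back is to compare the file size before and after a batch of deletions. The snippet below is a rough sketch for experimenting, not a benchmark: the helper name, the key format and the record size are made up for illustration, and the numbers you get will depend on the Berkeley DB version behind your DB_File.

```perl
use strict;
use warnings;
use DB_File;
use Fcntl qw(O_RDWR O_CREAT O_TRUNC);

# Hypothetical helper: fill a BTREE file with $n records, delete them all,
# and return the file size in bytes before and after the deletions.
sub size_before_and_after_delete {
    my ($file, $n) = @_;

    my %db;
    my $handle = tie %db, 'DB_File', $file, O_RDWR | O_CREAT | O_TRUNC, 0644, $DB_BTREE
        or die "Cannot open $file: $!";

    # Insert $n records of 100 bytes each (arbitrary size for the demo).
    $db{ sprintf('key%06d', $_) } = 'x' x 100 for 1 .. $n;
    $handle->sync;                  # flush so the size on disk is current
    my $before = -s $file;

    # Delete every record again and re-measure.
    delete $db{ sprintf('key%06d', $_) } for 1 .. $n;
    $handle->sync;
    my $after = -s $file;

    undef $handle;
    untie %db;
    return ($before, $after);
}

my $demo_file = "/tmp/size_demo.$$.db";    # assumed scratch location
my ($before, $after) = size_before_and_after_delete($demo_file, 5000);
print "before delete: $before bytes, after delete: $after bytes\n";
unlink $demo_file;
```

If the two sizes come out the same, the freed pages are still inside the file, and only a rebuild into a new file will return that space to the filesystem.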

    Conventional defragmentation, on the other hand, is not only a filesystem concern, but a Microsoft concern. It does not apply to Unix file systems. That's why you have to de-frag MS file systems, but there's no such utility for Unix.

    --
    TTTATCGGTCGTTATATAGATGTTTGCA

      Well, really, it's a concern for FAT filesystems, not even Microsoft filesystems in general. NTFS does tend more to fragmentation (and generally behaves a bit worse in most aspects) than the Unix filesystems it resembles, but it is hardly an issue even then.

      Makeshifts last the longest.

Re: Defragmentation of a BTREE DB_File
by oha (Friar) on Dec 30, 2003 at 16:24 UTC
    there are several kinds of tree: red/black, ordered, fibonacci, heap

    i think if you use an external library (gdbm or whatever) this is not in your power, and it does not need your attention. maybe, if you notice performance falling off with use, you can consider switching to a different kind of implementation.

    defragmentation, on the other hand, means taking the fragments of a "file" that are scattered around a storage device and joining them back together in their natural order: you can see it's not related to the btree but to the fs, and it is, i think, off the topic of your question.

    (i hope my english is at least understandable)