in reply to Key to use for files in db!

The problem is that searching large numbers of files becomes very inefficient if you have to search a large string or strings. The best way to do this I think is to create a short hash - no more than 3 hex characters - from the full path, then index that and use it as the first part of your search. You might possibly get 2 or 3 files that have the same 3-character key, but these will require very little processing time for checking the full path.
SELECT FROM mytable WHERE hashkey = '4ea' && path = '/my/path/to/file/ +isthis.dat';
With hashkey indexed.
use Digest::MD5 qw(md5 md5_hex md5_base64); my $path = '/my/path/to/file/isthis.dat'; my $hashkey = substr(md5_hex($path), 0, 3);
EDIT: I should also note here that variable-width fields can't be indexed in their entirety. You have to define a width for your key, which means either leaving out part of some variable-width fields or padding most of them (which defeats the whole purpose of a variable-width field). It's much better to use a CHAR(3) key (see above).

Replies are listed 'Best First'.
Re^2: Key to use for files in db!
by Ace128 (Hermit) on Dec 30, 2005 at 14:48 UTC
    Alrigthty. I actually had something like this in mind, but put black on white like this makes it seem even better. Thanks! As it is now I got just that $path as VARCHAR(255) and primary key. With your little addon, I suppose that using the $hashkey aswell as CHAR(3), will make the lookup really fast, since the calculation if the hash is faster (?)...
Re^2: Key to use for files in db!
by pajout (Curate) on Jan 02, 2006 at 13:05 UTC
    Imho, your idea is good, but, why code it into perl when database engine is able to support it technically? If you use all features of the specific implementation of indexes, you can force the same effect and your sql queries will stay clear...
    I know that this theme is off topic and I don't want flame, I only advice to observe and to use features of sql engine before modifying perl code.
      Em, how do you mean?
        I see following solutions in PostgreSQL:

        1/ supposing btree unique index on path column
        SELECT * FROM mytable WHERE path = '/my/path/to/file/isthis.dat';
        I guess that it could be OK when database finely tuned. But optimizer could decide that index is not selective.

        2/ supposing hash unique index on path column - the same as previous but index will be selective, i hope. This index type is not very recommended.

        3/ supposing (functional) unique btree index on md5(path):
        SELECT * FROM mytable WHERE md5(path) = md5('/my/path/to/file/isthis.dat');