Ace128 has asked for the wisdom of the Perl Monks concerning the following question:

Hey Monks,

I know, I know, this may not be 100% Perl related, but Im doing this in Perl though :) ...

Anyway, I have this db where I save files. That is, things like attributes, size, time and such. At the moment I have the "full path" as key (that is, c:/directory/directory2/file.ext). Now, this is natrually best, since its unique per file. However, it feels to me that this aint the best key to use. (specially if the path is long). So, I was wondering what you guys here had to say about this. Maybe it would be faster (and better) using something like, say, a hash made of the full path and that saved and used in db as key instead?

I need ofcource the full path there somehow, so I can find the file if needed...

Also, key should be VARCHAR and not TEXT?

Thanks,
Ace

Replies are listed 'Best First'.
Re: Key to use for files in db!
by rashley (Scribe) on Dec 29, 2005 at 14:00 UTC
    First of all, by "key", I assume you mean primary key.

    I'm also assuming that there's no guarantee of uniqueness in your filenames, or even directory names.

    If you were confident that the directory2/file.ext combination would be unique for each file, I'd parse those out into two fields, filename and directory, and use the combination as primary key. That way you wouldn't have to use TEXT, which is an understandable desire.

    If not, using the full path is probably your best bet, unless you wanted to just use an index, but that's usually cumbersome.

      Ah, yes, primary key :)
Re: Key to use for files in db!
by turo (Friar) on Dec 29, 2005 at 14:03 UTC

    Obfuscated code embedded on a database table entry?
    ... sounds good :-P

    If i've undestood the problem, i think you can put this:

    perl -MDumpvalue -e 'my $dumper=new Dumpvalue; %hash=eval qq(return (k +ey1,value1,key2,value2)); $dumper->dumpValue(\%hash)'
    In that case, you must store the string 'return (key1,value1,key2,value2)' and then make the eval; in fact something like this:
    use Dumpvalue; use DBI; my $dumper = new Dumpvalue; my $mysql = new DBI(...); my $mysql->conect(); my $hash_string = $mysql->do('select .. from table'); %hash=eval $hash_string; $dumper->dumpValue(\%hash)

    Good Luck!

    turo

    perl -Te 'print map { chr((ord)-((10,20,2,7)[$i++])) } split //,"turo"'
Re: Key to use for files in db!
by TedPride (Priest) on Dec 29, 2005 at 16:48 UTC
    The problem is that searching large numbers of files becomes very inefficient if you have to search a large string or strings. The best way to do this I think is to create a short hash - no more than 3 hex characters - from the full path, then index that and use it as the first part of your search. You might possibly get 2 or 3 files that have the same 3-character key, but these will require very little processing time for checking the full path.
    SELECT FROM mytable WHERE hashkey = '4ea' && path = '/my/path/to/file/ +isthis.dat';
    With hashkey indexed.
    use Digest::MD5 qw(md5 md5_hex md5_base64); my $path = '/my/path/to/file/isthis.dat'; my $hashkey = substr(md5_hex($path), 0, 3);
    EDIT: I should also note here that variable-width fields can't be indexed in their entirety. You have to define a width for your key, which means either leaving out part of some variable-width fields or padding most of them (which defeats the whole purpose of a variable-width field). It's much better to use a CHAR(3) key (see above).
      Alrigthty. I actually had something like this in mind, but put black on white like this makes it seem even better. Thanks! As it is now I got just that $path as VARCHAR(255) and primary key. With your little addon, I suppose that using the $hashkey aswell as CHAR(3), will make the lookup really fast, since the calculation if the hash is faster (?)...
      Imho, your idea is good, but, why code it into perl when database engine is able to support it technically? If you use all features of the specific implementation of indexes, you can force the same effect and your sql queries will stay clear...
      I know that this theme is off topic and I don't want flame, I only advice to observe and to use features of sql engine before modifying perl code.
        Em, how do you mean?
Re: Key to use for files in db!
by pajout (Curate) on Dec 29, 2005 at 15:30 UTC
    If you want to access the rows via full path (you need this column in the table), and need not to reference this table from another one, you can use the full path as primary key. The effectivity of the index and proper column type depends on its implementation. Which database did you mean? PostgreSQL?
      Well, mySQL, SQLite, PostgreSQL.. you name it :)
        No, I named PostgreSQL only :)
        Imho, in PostgreSQL, you should try to create the table with fullpath (columntype TEXT) as the primary key (which forces creating the most common type of index), fill it with real data (minimally 1000 rows), analyze the table and observe the optimizer compliancy to use it when typical queries happens. But, I personally have more conservative opinion, I typically create the integer column as primary key and the fullpath text column with NOT NULL UNIQUE constraint.
        And, after that, if you will not be satisfied, you can obtain more informations about performance tunning at
        http://www.postgresql.org/docs/
        or
        http://archives.postgresql.org/pgsql-performance/

        Good luck on this non-perl specific journey ;)
Re: Key to use for files in db!
by SamCG (Hermit) on Jan 18, 2006 at 20:02 UTC
    Hmmm, looking at your problem, are you sure you want to use the path in the primary key at all? What happens if you move a set of files? Multiple updates on primary keys can be a pain...

    I'd probably go with a unique number, something that I don't need to worry about updating, and then leave the path in a separate column(s).
      Yea, I have it like that now. The unique number is for other tables to use as a foreign key.